# LP-V v0 — design decisions

The non-obvious choices behind the LP-V v0 schema and `@context`, with citations, so the format can be argued with rather than reverse-engineered. **`README.md` is the specification; this is the rationale.** Where the README's prose and its printed example once disagreed, the prose won and the example was corrected (see §12).

## 1. One object, three models — GeoJSON + STAC + JSON-LD

An LP-V record is a single JSON object that is simultaneously a valid GeoJSON Feature, a STAC Item, and JSON-LD. The reconciliation rule, in one line:

> **STAC owns `properties.datetime`** (the coarse, derived index). **Linked Places / GeoJSON-T own the top-level `when`** and every feature attribute. **Every LP-V semantic field is a top-level member** (`when`, `status`, `placeRelation`, `viewpoint`, `media`, `provenance`, `confidence`, `rights`) — never inside `properties`, so nothing can collide with STAC's reserved slot.

This is legal because GeoJSON permits **foreign members** (RFC 7946 §6.1) and STAC declares that *"any JSON object that contains all the required fields is a valid STAC Item."* The README's instinct to keep semantics at the feature top level is therefore correct; what it had omitted was the STAC machinery (`stac_version`, `bbox`, `properties.datetime`, `links`, `assets`).
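
The placement rule can be checked mechanically. The sketch below is illustrative only and is not taken from the project's scripts; `misplacedMembers` is a hypothetical helper name, and the member list is the one given in the reconciliation rule above.

```javascript
// The eight LP-V semantic members named in the reconciliation rule.
const LPV_MEMBERS = [
  "when", "status", "placeRelation", "viewpoint",
  "media", "provenance", "confidence", "rights",
];

// Return the LP-V member names that wrongly appear inside `properties`,
// the slot reserved for STAC.
function misplacedMembers(record) {
  const props = record.properties ?? {};
  return LPV_MEMBERS.filter((name) => Object.hasOwn(props, name));
}
```

An empty result means nothing LP-V-specific has leaked into STAC's reserved slot; any name returned points at a member that should move to the feature top level.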

- RFC 7946 §6.1 — <https://datatracker.ietf.org/doc/html/rfc7946#section-6.1>
- STAC Item spec — <https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md>

## 2. `when` is the source of truth; `properties.datetime` is derived

`when` is *referent-time* (the moment depicted), uncertainty-bearing, per GeoJSON-T / Linked Places. STAC's `datetime` is a **lossy projection** of it, derived by rule (`scripts/lib/when.mjs`, used to both generate and verify):

- Expand each bound to an instant interval (`"1937"` → `1937-01-01T00:00:00Z … 1937-12-31T23:59:59Z`).
- If the record collapses to a single instant → set `datetime`.
- Otherwise (any interval, including year or month precision) → `datetime: null` **plus** `start_datetime` / `end_datetime` (STAC permits a null `datetime` only when both bracket fields are set).

CI re-derives and compares, so the coarse clock can never silently drift from the truth. An uncertain date can never masquerade as a precise instant.
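
The derivation rule can be sketched as follows. This is a minimal illustration, not the contents of `scripts/lib/when.mjs`: the function names and the bare-string bound arguments are assumptions, and only four-digit CE years from 0100 onward are handled.

```javascript
// Format epoch milliseconds as an ISO instant without fractional seconds.
const iso = (ms) => new Date(ms).toISOString().replace(/\.\d{3}Z$/, "Z");

// Expand a year, year-month, or date bound to the instant interval it covers.
// Any other string is treated as a full instant (a zero-length interval).
function expandBound(bound) {
  const m = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(bound);
  if (!m) {
    const t = Date.parse(bound);
    if (Number.isNaN(t)) throw new Error(`unparseable bound: ${bound}`);
    return { start: iso(t), end: iso(t) };
  }
  const y = Number(m[1]);
  // Date.UTC remaps years 0-99 to 1900-1999, so refuse them in this sketch.
  if (y < 100) throw new Error("years below 0100 are out of scope here");
  const mo = m[2] ? Number(m[2]) - 1 : 0;
  const d = m[3] ? Number(m[3]) : 1;
  const start = Date.UTC(y, mo, d);
  // First instant of the following day, month, or year, depending on precision.
  const next = m[3] ? Date.UTC(y, mo, d + 1)
    : m[2] ? Date.UTC(y, mo + 1, 1)
    : Date.UTC(y + 1, 0, 1);
  return { start: iso(start), end: iso(next - 1000) };
}

// Project a pair of bounds onto the STAC datetime fields.
function deriveStacDatetime(startBound, endBound = startBound) {
  const start = expandBound(startBound).start;
  const end = expandBound(endBound).end;
  if (start === end) return { datetime: start };
  return { datetime: null, start_datetime: start, end_datetime: end };
}
```

Under these assumptions, `deriveStacDatetime("1937")` yields a null `datetime` bracketed by `1937-01-01T00:00:00Z` and `1937-12-31T23:59:59Z`, while a full instant passes through as `datetime` unchanged.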

- STAC common metadata (datetime rules) — <https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md>
- GeoJSON-T `when` — <https://github.com/kgeographer/geojson-t>

## 3. The seam, enforced as schema conditionals

The project's one non-negotiable — *never let the prior masquerade as data* — is machine-checked, not trusted, via `allOf` if/then rules in the schema. The seam bites on **content, not just presence** (an adversarial review found that an empty string satisfied a bare `required`):

1. `trace` / `testimony` ⇒ `provenance` must name a **non-blank** `source` or `archive` (a `nonBlankString` `$ref`, so `""` and whitespace fail).
2. `derived` / `synthetic` ⇒ `provenance.prov` must carry a PROV-O chain whose `used` / `wasDerivedFrom` is **non-empty** — inference cannot show its working by pointing at nothing.
3. **Bidirectional:** any record carrying `provenance.prov` *must* be `derived`/`synthetic` — generated content cannot wear an evidence label.
4. `placeRelation: viewOf` ⇒ a `viewpoint` is required; conversely, a `viewpoint` may appear only on a `viewOf` record.
5. An In-Copyright (or unevaluated) `statement` ⇒ `license` is forbidden — an optimistic open licence can never sit over a true In-Copyright status.
6. `properties.datetime: null` ⇒ `start_datetime`+`end_datetime` required.

`additionalProperties: false` closes every LP-V object, so a typo is **rejected, not silently dropped** as an unmapped term would be in JSON-LD. The notable case is the British-English `licence`/`license` slip, which would otherwise silently disarm the bundling gate. Evidence names its origin; inference shows its working; nothing machine-made can read as a primary source. Each rule is proven by negative tests in `scripts/validate.mjs`.
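
To make the first four rules concrete, here is a plain-JavaScript restatement of what the schema conditionals are described as enforcing. It is a sketch for reading alongside the schema, not the schema itself: `seamViolations` is a hypothetical name, and it assumes `used` and `wasDerivedFrom` are arrays sitting directly on `provenance.prov`, which the real PROV-O chain may nest differently.

```javascript
const isNonBlank = (v) => typeof v === "string" && v.trim().length > 0;
const hasEntries = (v) => Array.isArray(v) && v.length > 0;

// Return a list of human-readable seam violations for one record.
function seamViolations(record) {
  const errors = [];
  const provenance = record.provenance ?? {};
  const chain = provenance.prov; // assumed shape: { used: [...], wasDerivedFrom: [...] }
  const isEvidence = record.status === "trace" || record.status === "testimony";
  const isInference = record.status === "derived" || record.status === "synthetic";

  // Rule 1: evidence names a non-blank source or archive.
  if (isEvidence && !isNonBlank(provenance.source) && !isNonBlank(provenance.archive)) {
    errors.push("rule 1: evidence needs a non-blank source or archive");
  }
  // Rule 2: inference carries a chain that points at something.
  if (isInference && !(hasEntries(chain?.used) || hasEntries(chain?.wasDerivedFrom))) {
    errors.push("rule 2: inference needs a non-empty used or wasDerivedFrom");
  }
  // Rule 3: a PROV-O chain is only allowed on inference.
  if (chain !== undefined && !isInference) {
    errors.push("rule 3: provenance.prov is only allowed on derived/synthetic");
  }
  // Rule 4: viewOf and viewpoint come as a pair.
  const isViewOf = record.placeRelation === "viewOf";
  const hasViewpoint = record.viewpoint !== undefined;
  if (isViewOf !== hasViewpoint) {
    errors.push("rule 4: viewOf and viewpoint must appear together");
  }
  return errors;
}
```

A whitespace-only `source` on a `trace` record trips rule 1 here, which is the content-not-presence behaviour the adversarial review asked for.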

## 4. `viewpoint` is an OGC MF-JSON `MovingFeature`

The photograph facet is a *pose trajectory*, not a static camera: `temporalGeometry` (a `MovingPoint` — position over time) + `temporalProperties` (orientation and intrinsics over time). A **still is the degenerate single-sample case**; film extends it unchanged. The same primitive also encodes a journey, a procession, a route.
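
As an illustration of the single-sample case, the sketch below shows a still as a one-element `MovingPoint` plus a check that the time and value arrays stay parallel. The overall layout follows the MF-JSON Prism encoding as commonly described; the `orientation` property name, the coordinates, and the `parallelArrayErrors` helper are placeholders rather than LP-V's actual field names or seed data.

```javascript
// A still photograph: every array has exactly one sample.
const still = {
  type: "MovingFeature",
  temporalGeometry: {
    type: "MovingPoint",
    datetimes: ["1937-01-01T00:00:00Z"],
    coordinates: [[0, 0]], // placeholder position, not a seed record
    interpolation: "Discrete",
  },
  temporalProperties: [
    {
      datetimes: ["1937-01-01T00:00:00Z"],
      orientation: { values: [[0.5, 0.5, 0.5, 0.5]] }, // hypothetical property name
    },
  ],
};

// Report any place where a values/coordinates array is not the same
// length as the datetimes array it is indexed against.
function parallelArrayErrors(viewpoint) {
  const errors = [];
  const geom = viewpoint.temporalGeometry;
  if (geom.datetimes.length !== geom.coordinates.length) {
    errors.push("temporalGeometry: datetimes and coordinates differ in length");
  }
  (viewpoint.temporalProperties ?? []).forEach((group, i) => {
    for (const [name, prop] of Object.entries(group)) {
      if (name === "datetimes") continue;
      if (!Array.isArray(prop?.values) || prop.values.length !== group.datetimes.length) {
        errors.push(`temporalProperties[${i}].${name}: values and datetimes differ in length`);
      }
    }
  });
  return errors;
}
```

A film clip would use the same structure with more samples per array, which is why the still needs no special case.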

- OGC Moving Features (MF-JSON) — <https://docs.ogc.org/is/19-045r3/19-045r3.html>

## 5. Orientation — quaternion in a local ENU frame, glTF camera basis

So an agent never has to guess (the README's own warning), the convention is pinned in the schema and `@context`:

- **Camera basis**: glTF / OpenGL — the camera looks down its local **−Z**, **+Y** up, **+X** right.
- **World frame**: a local **East-North-Up** tangent frame at the position (+X East, +Y North, +Z Up).
- **Component order**: `(x, y, z, w)`, scalar **w last**.
- Euler is a derived, human-only view; never canonical.

`scripts/lib/pose.mjs` converts a compass heading to this quaternion via a look-at construction. It is verified against a known case: a heading of 270° (camera facing due West) yields `[0.5, 0.5, 0.5, 0.5]`.
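
For a level camera (no pitch or roll) the same rotation can be written in closed form, which makes the convention easy to check by hand. The sketch below is not the look-at construction used in `scripts/lib/pose.mjs`; it composes a clockwise yaw about ENU Up with the fixed +90° rotation about East that points a glTF camera at North, and `headingToQuaternion` is a hypothetical name.

```javascript
// Heading in degrees clockwise from North -> quaternion (x, y, z, w),
// camera basis glTF (-Z forward, +Y up), world frame ENU.
function headingToQuaternion(headingDeg) {
  const half = (headingDeg * Math.PI) / 360; // half the heading, in radians
  // q = yaw(-heading about +Z) * tilt(+90 degrees about +X), multiplied out:
  const c = Math.cos(half) * Math.SQRT1_2;
  const s = -Math.sin(half) * Math.SQRT1_2;
  const q = [c, s, s, c];
  // q and -q encode the same rotation; keep the scalar part non-negative.
  return q[3] < 0 ? q.map((v) => -v) : q;
}
```

With this sign convention a heading of 270° comes out as approximately `[0.5, 0.5, 0.5, 0.5]`, matching the verified case above, and a heading of 0° gives the pure tilt `[√½, 0, 0, √½]`.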

- glTF 2.0 (camera basis, XYZW rotation) — <https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html>

## 6. `rights` is two fields — and why

`rights.statement` is the copyright **status** (rightsstatements.org); `rights.license` is the **licence** only when one truly applies (SPDX). They are deliberately separate because of what the seed taught us:

> Manchester Libraries stamps *"CC BY 4.0"* on images as an **orphan-works default over material it admits it may not own** — the visible label even hyperlinks to **CC BY-NC-SA 4.0**, with per-item ShareAlike/NonCommercial terms.

So an archive's optimistic licence label must never overwrite a true `InC` / `InC-RUU` status. The split lets `statement` stay honest (In Copyright) while `license` stays empty — and the **bundle-vs-reference gate reads `license`**: only a genuine open licence (e.g. the 2013 Geograph base, `CC-BY-SA-2.0`) clears bundling; everything else is referenced at source. At least one of `statement`/`license` is required (anyOf); seam rule 5 forbids a licence under an In-Copyright statement; and a cleanly-licensed work (the Geograph) carries `license` with **no** `statement` rather than a contradictory `InC`. CI enforces the gate over **every** asset (a reference-only record must point no asset at a bundled local path). `license` is deliberately a bare SPDX **literal token**, not a dereferenceable `LicenseDocument` resource, because it is the machine-gate key; CI validates it against the SPDX id list.
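
The gate logic described above might be sketched like this. The allowlist of open licences and the exact set of statements that forbid a licence are assumptions made for illustration (the project validates against the full SPDX list and its schema defines the real sets), and `bundlingDecision` is a hypothetical name.

```javascript
// Assumed for illustration: which SPDX ids count as open enough to bundle.
const OPEN_LICENSES = new Set(["CC0-1.0", "CC-BY-4.0", "CC-BY-SA-2.0"]);

// Assumed for illustration: statements under which a licence is forbidden
// (In Copyright, In Copyright - Rights-holder Unlocatable, Not Evaluated).
const FORBIDS_LICENSE = new Set([
  "http://rightsstatements.org/vocab/InC/1.0/",
  "http://rightsstatements.org/vocab/InC-RUU/1.0/",
  "http://rightsstatements.org/vocab/CNE/1.0/",
]);

// Decide whether a record's assets may be bundled or must be referenced at source.
function bundlingDecision(rights) {
  const { statement, license } = rights;
  if (!statement && !license) {
    return "invalid: one of statement or license is required";
  }
  if (license && FORBIDS_LICENSE.has(statement)) {
    return "invalid: license is forbidden under this statement";
  }
  // The gate reads `license` only; `statement` never clears bundling.
  return OPEN_LICENSES.has(license) ? "bundle" : "reference";
}
```

Note that the `http:` scheme in the statement URIs is intentional, since the rightsstatements.org identifiers are compared as literal strings.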

## 7. Controlled vocabularies

| Field | Vocabulary | Note |
|---|---|---|
| `rights.statement` | rightsstatements.org (12 URIs) | **`http:` scheme is canonical** — it is an identifier, not normalised to `https` |
| `rights.license` | SPDX identifiers (CI-validated) | `CC0-1.0`, `CC-BY-4.0`, `CC-BY-SA-2.0`, `Apache-2.0`, … |
| AI / TDM reservation | **TDMRep** simple form | `rights.tdmReservation` (`1` = reserved, `0` = not), optional `tdmPolicy` URL. *Not* an ODRL action in `prohibits` — `tdm:mine` is a TDMRep permission action, so a reservation is the flag, not a prohibition. The README's old `"share"`/`"train"` were never ODRL terms. |
| `rights.permits/prohibits` | ODRL 2.2 (optional) | full action URIs, e.g. the open Geograph base permits `odrl:distribute`, `odrl:reproduce` |
| `provenance.prov` | PROV-O | minimal chain: `wasGeneratedBy`+`used` (non-empty) or `wasDerivedFrom` |
| imagery | IIIF | `media.iiifManifest` (Presentation 3.0) / `media.iiifImageApi` (Image API). The Manchester platform exposes an Image API but no per-item manifest, so the seed leaves both empty for now. |

Units (QUDT `unit:M`/`unit:DEG`), OWL-Time grounding of `when`, and schema.org / CIDOC-CRM crosswalks are **declared intent for v1**, not yet wired in the v0 `@context` (which crosswalks to Dublin Core and maps the place geometry as GeoJSON-LD). The v0 context declares only prefixes it actually uses, so a reader is never misled that a crosswalk is live when it is not — the machine-readability prose in the README was narrowed to match.

- rightsstatements.org — <https://rightsstatements.org/page/1.0/> · SPDX — <https://spdx.org/licenses/> · ODRL — <https://www.w3.org/TR/odrl-vocab/> · TDMRep — <https://w3c-cg.github.io/tdm-reservation-protocol/spec/> · PROV-O — <https://www.w3.org/TR/prov-o/> · QUDT — <https://www.qudt.org/doc/DOC_VOCAB-UNITS.html> · OWL-Time — <https://www.w3.org/TR/owl-time/> · IIIF — <https://iiif.io/api/presentation/3.0/>

## 8. `@context` — JSON-LD 1.1 with scoped contexts

Two keys legitimately mean different things in different places, resolved with JSON-LD 1.1 **scoped contexts** rather than renamed:

- `coordinates` — GeoJSON `geometry.coordinates` vs MF-JSON `viewpoint…coordinates`; the latter is scoped under `viewpoint`.
- `source` — `provenance.source` (a string, `dcterms:source`) vs `confidence.source` (a number, the source-confidence axis); the latter is scoped under `confidence`.

STAC's `properties` wrapper is made transparent to JSON-LD with `"properties": "@nest"`, so `datetime`/`created` resolve at the feature level while STAC consumers still read the raw `properties` object. The place `geometry` is mapped to the GeoJSON-LD vocabulary (`geojson:geometry`/`geojson:coordinates`) so the spatial anchor — the most load-bearing claim in the format — actually survives into RDF rather than expanding to nothing.

The record's `@context` is an **array** `[Linked Places context, LP-V context]`, and both are **vendored locally** and served from `palimplace.com`: the Linked Places context is pinned in `spec/context/vendor/` rather than fetched at runtime from a moving upstream branch, so the corpus parses **offline with no third-party chokepoint** (the "runs from files" non-negotiable). CI resolves every context from disk and refuses network access.
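
A context fragment along these lines shows how the scoped contexts and `@nest` fit together. It is a sketch, not the shipped LP-V context: the `lpv:` prefix points at a placeholder namespace, and the `lpv:*` term IRIs are invented for illustration. Only `dcterms:source`, `geojson:geometry`, `geojson:coordinates`, and the `@nest` mapping come from the description above.

```javascript
const lpvContextSketch = {
  "@context": {
    "@version": 1.1,
    dcterms: "http://purl.org/dc/terms/",
    geojson: "https://purl.org/geojson/vocab#",
    xsd: "http://www.w3.org/2001/XMLSchema#",
    lpv: "https://example.org/lpv#", // placeholder, not the real namespace

    // STAC's wrapper becomes transparent: its members resolve at feature level.
    properties: "@nest",

    // Default meanings, used everywhere outside the scoped contexts below.
    geometry: "geojson:geometry",
    coordinates: { "@id": "geojson:coordinates", "@container": "@list" },
    source: "dcterms:source",

    // Inside `viewpoint`, `coordinates` means the MF-JSON trajectory samples.
    viewpoint: {
      "@id": "lpv:viewpoint",
      "@context": {
        coordinates: { "@id": "lpv:trajectoryCoordinates", "@container": "@list" },
      },
    },

    // Inside `confidence`, `source` is a number on the source-confidence axis.
    confidence: {
      "@id": "lpv:confidence",
      "@context": {
        source: { "@id": "lpv:sourceConfidence", "@type": "xsd:double" },
      },
    },
  },
};
```

The property-scoped `@context` entries take effect only for values under that property, so the same key can expand to different IRIs without renaming anything in the JSON.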

## 9. Namespace — a single swappable constant

For v0 the `@id`/`@context` namespace is `https://palimplace.com/…` — the project's own domain, served via GitHub Pages — kept as a single constant. It is the one thing baked into every record, so owning the domain means the identifier outlives any hosting choice, and a future move (e.g. a **`w3id.org/palimplace`** permanent-identifier redirect, for host-independent permanence) is a one-line change, not a record rewrite.

## 10. `geometry` is the place anchor; the vantage lives in `viewpoint`

For historical photographs whose camera position is unsurveyed, `geometry` is set to the **place** (the gardens centre) — an honest, high-confidence claim — while the *illustrative* vantage sits in `viewpoint` with low `poseConfidence` and low spatial confidence. The 2013 Geograph base, which carries real camera coordinates, sets `geometry` to those coordinates. So the seam holds spatially too: the renderer learns where the attestation is strong and where it is a guess.

## 11. Validation & CI

`scripts/validate.mjs` (run by CI) is the enforcement core. It runs:

- ajv against the schema;
- the seam negative tests;
- JSON-LD expansion of each record **as it ships** (vendored contexts, offline), asserting the place geometry survives into RDF;
- over **every `/data` record**: schema validity, `datetime`-derived-from-`when`, `start ≤ end`, the rights→bundling gate over every asset, SPDX-id validity, and the cross-field viewpoint invariants the schema cannot express (unit-norm quaternions, `viewpoint` datetimes within `when`, MF-JSON parallel-array lengths).

CI also runs the **official `stac-node-validator`** over `/data` so STAC conformance regressions are caught, and checks that the structural rel links are relative. Deep GeoJSON ring/winding validation remains the one piece deferred to a dedicated validator.

**Datetime is a documented lossy projection (§2).** A multi-period `when` (e.g. 1937 *and* 1965) and a bracketed-uncertainty `when` (e.g. earliest 1960 / latest 1969) both collapse to the convex-hull `start_datetime`..`end_datetime`; the precise structure and the `certainty` flag stay in `when`. The STAC index can therefore look like a definite span when the truth is uncertain — acceptable because `when` is the authority, but flagged so a contributor adding a disjoint or uncertain date knows the index is deliberately coarse.
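
The convex-hull collapse is simple enough to show directly. This sketch assumes each period has already been expanded to an instant interval with ISO `start` and `end` strings; `hullOfIntervals` is a hypothetical name, not a function from the project's scripts.

```javascript
// Collapse any number of instant intervals to the single span that covers them all.
function hullOfIntervals(intervals) {
  if (intervals.length === 0) throw new Error("need at least one interval");
  const fmt = (ms) => new Date(ms).toISOString().replace(/\.\d{3}Z$/, "Z");
  const starts = intervals.map((i) => Date.parse(i.start));
  const ends = intervals.map((i) => Date.parse(i.end));
  return {
    start_datetime: fmt(Math.min(...starts)),
    end_datetime: fmt(Math.max(...ends)),
  };
}

// Two disjoint periods, 1937 and 1965, become one 1937..1965 span.
const hull = hullOfIntervals([
  { start: "1937-01-01T00:00:00Z", end: "1937-12-31T23:59:59Z" },
  { start: "1965-01-01T00:00:00Z", end: "1965-12-31T23:59:59Z" },
]);
```

The gap between 1938 and 1964 is invisible in `hull`, which is exactly the loss the paragraph above warns about; only `when` still knows the periods are disjoint.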

## 12. Divergences from the README's original example (now reconciled)

The printed example was valid GeoJSON but **not** a valid STAC Item, contradicting the surrounding prose. The corrected example (and the real seed records) add: `stac_version`, `bbox`, a `properties` block with the derived `datetime`, `links`, `assets`; wrap `viewpoint` as MF-JSON; name `placeRelation`; make `@context` an array; and fix the rights block (`InC` not assumed `CC-BY`; the TDMRep `tdmReservation` flag not the non-vocabulary `"train"`/`"share"`). The example coordinates also moved from the illustrative Cross St / Market St point to the surveyed Piccadilly Gardens locality.

## 13. The inference half is exercised, not just asserted

The seed corpus is twelve records, and two are deliberately not photographs-as-evidence: a **`testimony`** record (a dated recollection — `placeRelation: about`, no `viewpoint`, confidence read as witness/reliability/corroboration) and a **`derived`** record (a PnP-solved pose for the 1937 photograph — `status: derived`, `poseMethod: pnp`, carrying a PROV-O chain naming the source photo and the present-day base it used). So both sides of the seam are present in real data a reader can render, not merely in unit-test mutations. Both are flagged illustrative pending real sourcing.

## 14. Deferred to roadmap (named, not hidden)

These are scoped out of v0 and recorded so the gap between prose and artefact is never silent:

- a SKOS/RDFS vocabulary document so each `status`/`placeRelation` *value* IRI dereferences to a definition (today the field terms map to the context, but the value IRIs resolve to the schema namespace, not an ontology);
- QUDT unit datatypes, OWL-Time grounding, and schema.org / CIDOC-CRM crosswalks (§7);
- a declared STAC extension JSON so the LP-V foreign members are an announced extension rather than tolerated extras;
- thumbnail assets and direct IIIF Image-API byte references for the cleared items;
- the richer geometry/`placeRelation`/media-type breadth (LineString journeys, `occurredAt` events, audio/document media) that the schema admits but the seed does not yet exercise.
