# Palimpsest

> Inhabit a place at a moment in the past, reconstructed from whatever evidence survives — and always shown plainly where the evidence ends and reconstruction begins.

**Status: early proof-of-concept.** The format is taking shape and the reference tools are nascent. Expect sharp edges and breaking changes.

*A palimpsest is a surface written over through time, where the earlier text shows imperfectly through the later — which is the whole concept in one word.*

Palimpsest is an open format and a set of reference tools for assembling a **viewpoint-anchored, time-navigable, provenance-transparent** representation of a place from sparse, heterogeneous historical evidence: photographs, footprints, maps, recollections. Photographs are the first and hardest type to register; the model beneath is neutral about modality.

## The idea

A web archive can snapshot a page because the page *is* the record. A place in 1993 was never recorded as a place you could stand in — what survives is scattered projections of it: a map surveyed three years earlier, an aerial frame two years later, a handful of photographs pointing in unknown directions. Palimpsest treats those fragments as **attestations** and reconstructs the place around them, instead of presenting a tidy archive that hides how little is actually known.

The forward problem — too much modern data, too little attention — is compression. The backward problem — too little historical data, too many gaps — is reconstruction. Both run through the same prior: a model of how place-and-time is structured. The one discipline that keeps either true is the rule the whole project is built around:

> **Never let the prior masquerade as data.** What was attested and what was inferred must never be confused, anywhere in the system.

## What it is trying to prove

That a navigable place-at-a-moment can be assembled from scattered attestations and rendered so that evidence and inference are never confused — extendable by anyone through pull requests, and compelling **without VR or generative fill**.

It will have failed if a faithful, viewpoint-anchored representation of one well-documented public square across a few decades does not feel like standing somewhere. That is a cheap thing to find out, and worth finding out before building the expensive layers above it.

## How it works

### Attestations, not pins

Every asset is an *attestation*: a record located in space and time, carrying its own uncertainty and provenance. The core is deliberately modality-neutral — the fields every attestation shares whatever it is: where and how it relates to a place (a photograph is a *view of* one, a recollection is *about* one, an event *occurred at* one, a document *describes* one), the time it refers to, its provenance, its confidence, its status, and a stable URI. Each relation also tells a renderer how to place the thing: a photograph hangs at its frustum, a recollection by the feature it concerns, audio from its source, an event across a region and a span. Everything specific to a kind of evidence lives in a per-type facet hung off that core.
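A minimal sketch of that shared core as a data structure, assuming Python. The class name `Attestation`, the `PLACEMENT` table and its labels are hypothetical illustrations; only the member names mirror the LP-V fields described here. The text above states no placement for `describes`, so the sketch leaves it unspecified rather than guessing.

```python
from dataclasses import dataclass, field

# Hypothetical renderer hints, one per place relation whose placement is stated above.
PLACEMENT = {
    "viewOf": "frustum",           # a photograph hangs at its frustum
    "about": "feature",            # a recollection sits by the feature it concerns
    "occurredAt": "region-span",   # an event covers a region and a span
}


@dataclass
class Attestation:
    """Modality-neutral core; anything type-specific lives in `facets`."""
    uri: str
    place_relation: str            # viewOf | about | occurredAt | describes
    when: dict                     # referent-time, with its own uncertainty
    provenance: dict
    confidence: dict               # three separate axes
    status: str                    # trace | testimony | derived | synthetic
    facets: dict = field(default_factory=dict)   # e.g. {"viewpoint": {...}}

    def placement(self) -> str:
        """How a renderer might hang this record; 'unspecified' if not stated."""
        return PLACEMENT.get(self.place_relation, "unspecified")


photo = Attestation(
    uri="piccadilly-gardens/1937/george-street",
    place_relation="viewOf",
    when={"timespans": [{"start": {"in": "1937-06-02"}, "end": {"in": "1937-06-02"}}]},
    provenance={"source": "Manchester City Council, City Engineers Department"},
    confidence={"temporal": 0.97, "spatial": 0.45, "source": 0.85},
    status="trace",
)
print(photo.placement())
```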

Photographs are the first type because they are the hardest to register, and their facet — `viewpoint` — carries the one new primitive immersion needs: *from where, and toward what, was this witnessed*. But `viewpoint` is the photograph facet, not the core. The format builds on the [Linked Places Format](https://github.com/LinkedPasts/linked-places-format) (JSON-LD and valid GeoJSON, built on GeoJSON-T's temporal `when`), which already carries temporally-scoped attributes with uncertainty, and adds the facets and the evidence discipline Linked Places leaves out.

Film and video extend that facet rather than replacing it. A `viewpoint` is really a *pose trajectory* — a path of positions and orientations through time, encoded with [OGC Moving Features](https://www.ogc.org/standard/movingfeatures/) (MF-JSON): a still photograph is the degenerate one-sample case, and a clip is the full case, a frustum that moves, pans and zooms along its own internal timeline nested inside its place on the historical one. Two things follow. Footage is the strongest input for automated registration, since motion parallax is what structure-from-motion feeds on — where archival film exists it can help build the 3D base rather than only populate it. And the same trajectory primitive is not only a camera: it also encodes a journey, a procession, a route, so an attestation's geometry generalises cleanly to a point, a region or a path — path-shaped memories, not just moving images.
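To make the "degenerate one-sample case" concrete, here is a hedged sketch of sampling a position from a `temporalGeometry`. The function name `position_at` is hypothetical, the `clip` trajectory is invented for illustration, and treating a missing `interpolation` as `Linear` is an assumption about the MF-JSON default.

```python
from datetime import datetime


def _parse(ts: str) -> datetime:
    # ISO 8601 with a trailing "Z"; spelled as an offset for older Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def position_at(temporal_geometry: dict, when: str):
    """Return the position at `when`, or None where the trajectory has no value."""
    times = [_parse(t) for t in temporal_geometry["datetimes"]]
    coords = temporal_geometry["coordinates"]
    mode = temporal_geometry.get("interpolation", "Linear")  # assumed default
    t = _parse(when)

    # An exact sample always answers, whatever the interpolation mode.
    for ti, c in zip(times, coords):
        if ti == t:
            return list(c)
    # A Discrete trajectory (a still) exists only at its samples.
    if mode == "Discrete" or not times[0] < t < times[-1]:
        return None
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 < t < t1:
            if mode == "Step":
                return list(coords[i])
            f = (t - t0) / (t1 - t0)  # fraction of the way along this segment
            return [a + f * (b - a) for a, b in zip(coords[i], coords[i + 1])]
    return None


# The still from the example record: one sample, Discrete.
still = {
    "type": "MovingPoint",
    "datetimes": ["1937-06-02T11:59:59Z"],
    "coordinates": [[-2.23735, 53.4806, 1.6]],
    "interpolation": "Discrete",
}
# A hypothetical ten-second clip, with made-up coordinates chosen for easy arithmetic.
clip = {
    "type": "MovingPoint",
    "datetimes": ["1937-06-02T12:00:00Z", "1937-06-02T12:00:10Z"],
    "coordinates": [[0.0, 0.0, 1.6], [10.0, 0.0, 1.6]],
    "interpolation": "Linear",
}
print(position_at(still, "1937-06-02T11:59:59Z"))
print(position_at(clip, "1937-06-02T12:00:05Z"))
```

The same function serves both cases, which is the point of treating a still as a trajectory with a single sample.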

### Linked Places + Viewpoint (LP-V)

LP-V adds a small modality-neutral set — `provenance`, a three-axis `confidence`, a typed `status`, a first-class `rights` block, and a typed place relation — plus the `viewpoint` facet, which is a pose trajectory: a single sample for a still, a path for moving images. Each record is expressed as a STAC Item that is also JSON-LD, so the same object is at once a crawlable catalogue entry and a self-describing semantic record:

```jsonc
{
  // @context is an ARRAY: the Linked Places context (vendored locally, so the
  // corpus parses offline with no third-party fetch), then the versioned LP-V context.
  "@context": [
    "https://palimplace.com/spec/context/vendor/linkedplaces-context-v1.1.jsonld",
    "https://palimplace.com/spec/context/lp-v/v0.jsonld"
  ],
  "type": "Feature",                           // a GeoJSON Feature that is also a STAC Item
  "stac_version": "1.0.0",
  "id": "piccadilly-gardens/1937/george-street",
  "collection": "piccadilly-gardens",
  "geometry": { "type": "Point", "coordinates": [-2.236917, 53.480959] },
  "bbox": [-2.236917, 53.480959, -2.236917, 53.480959],

  // STAC `properties` is the coarse, searchable index. `datetime` is DERIVED from `when`
  // (null for an interval, so the bracket fields carry the year). `created` is record-time.
  "properties": {
    "datetime": null,
    "start_datetime": "1937-06-02T00:00:00Z",
    "end_datetime": "1937-06-02T23:59:59Z",
    "created": "2026-06-21T16:00:00Z",
    "title": "Piccadilly Gardens from George Street"
  },

  // --- LP-V core: top-level members, foreign to STAC, never inside `properties` ---
  "when": { "timespans": [{ "start": { "in": "1937-06-02" }, "end": { "in": "1937-06-02" } }], "certainty": "certain" },  // referent-time
  "status": "trace",                           // trace | testimony | derived | synthetic
  "placeRelation": "viewOf",                   // viewOf | about | occurredAt | describes — tells a renderer how to hang it

  // viewpoint = the photograph facet, an OGC MF-JSON MovingFeature (a still is one sample).
  "viewpoint": {
    "type": "MovingFeature",
    "temporalGeometry": {                      // position over time
      "type": "MovingPoint",
      "datetimes": ["1937-06-02T11:59:59Z"],
      "coordinates": [[-2.23735, 53.4806, 1.6]],   // lng, lat, eye-height (m). Pose ILLUSTRATIVE pending survey.
      "interpolation": "Discrete"
    },
    "temporalProperties": [{                   // orientation + intrinsics over time
      "datetimes": ["1937-06-02T11:59:59Z"],
      "orientation": { "values": [[0.653281, -0.270598, -0.270598, 0.653281]], "interpolation": "Discrete" },  // quaternion (x,y,z,w); glTF camera basis in local ENU. Euler is a human-only view.
      "hFovDeg": { "values": [55.0], "interpolation": "Step" }
    }],
    "poseConfidence": 0.45,
    "poseMethod": "manual"                     // manual | pnp | sfm | ai-estimated
  },

  // media references the source RECORD page; `format` is what the uri returns (HTML),
  // since the bytes are referenced at the host, not bundled. Index points, host owns.
  "media": { "uri": "https://images.manchester.gov.uk/collections/246e9705-...", "type": "image", "format": "text/html" },
  "provenance": { "source": "Manchester City Council, City Engineers Department", "archive": "Manchester Local Image Collection", "accession": "m04003", "contributor": "github:handle" },
  "confidence": { "temporal": 0.97, "spatial": 0.45, "source": 0.85 },   // three axes, never collapsed

  // rights — one block, three jobs (search, bundle-vs-reference, AI-use). `statement` is the copyright
  // STATUS (rightsstatements.org); `license` is the LICENCE only if one truly applies (SPDX). The archive's
  // "CC BY 4.0" here is an orphan-works self-assertion, so this stays In Copyright and reference-only.
  // tdmReservation is the TDMRep simple form: 1 = text-and-data-mining (incl. AI training) reserved.
  "rights": {
    "statement": "http://rightsstatements.org/vocab/InC/1.0/",
    "holder": "Manchester City Council / Manchester Libraries",
    "credit": "Image courtesy of Manchester Libraries",
    "tdmReservation": 1
  },

  // STAC machinery that makes the catalogue crawlable with no server. Structural links are
  // RELATIVE, so a clone or fork crawls offline and relocates without a record rewrite.
  "links": [
    { "rel": "self", "href": "./george-street.json", "type": "application/geo+json" },
    // ... root "../../catalog.json", parent + collection "../collection.json"
    { "rel": "via", "href": "https://images.manchester.gov.uk/collections/246e9705-...", "type": "text/html" }
  ],
  "assets": {                                  // the source landing page, modelled honestly (a page, not bytes)
    "source": { "href": "https://images.manchester.gov.uk/collections/246e9705-...", "type": "text/html", "roles": ["overview"] }
  }
}
```

That is a real, CI-validated record — `data/piccadilly-gardens/1937/george-street.json`, shown here truncated and commented. The viewpoint facet follows the standard pinhole camera model and ingests EXIF where it exists — focal length, and sometimes GPS and orientation — so a photograph partly registers itself rather than starting from nothing.
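Under the pinhole model, a horizontal field of view follows from focal length and frame width as `hFov = 2 * atan(width / (2 * f))`. The sketch below assumes the focal length is a 35 mm-equivalent value (so the frame is 36 mm wide); the function names are hypothetical, and how the reference tools actually ingest EXIF is not specified here.

```python
import math


def hfov_deg(focal_length_mm: float, frame_width_mm: float = 36.0) -> float:
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(frame_width_mm / (2.0 * focal_length_mm)))


def focal_length_px(hfov: float, image_width_px: int) -> float:
    """Focal length in pixels for an image of the given width."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov) / 2.0)


# A 35 mm-equivalent lens lands close to the 55 degrees used in the example record.
print(round(hfov_deg(35.0), 2))
```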

The three confidence axes are kept separate on purpose: a sharply-dated photograph of uncertain location is a different epistemic object from a vaguely-dated one nailed to a known wall — and for testimony the axes shift toward the witness, reliability and corroboration, rather than toward pose.
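One consequence, sketched under assumptions: a consumer sets a floor per axis instead of thresholding a single blended number. `meets_floor` is a hypothetical helper, not part of LP-V.

```python
def meets_floor(record: dict, floor: dict) -> bool:
    """True if every axis named in `floor` reaches its minimum; missing axes fail."""
    confidence = record.get("confidence", {})
    return all(confidence.get(axis, 0.0) >= minimum for axis, minimum in floor.items())


record = {"confidence": {"temporal": 0.97, "spatial": 0.45, "source": 0.85}}

# Good enough to date a scene, not good enough to stand in it.
print(meets_floor(record, {"temporal": 0.9}))
print(meets_floor(record, {"temporal": 0.9, "spatial": 0.6}))
```

An average of the three values would sit near 0.76 and hide exactly the weakness that matters for placement.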

`rights` is a first-class block rather than a buried string, because it is both a primary filter for agents and the thing that drives hosting. It references controlled vocabularies ([rightsstatements.org](https://rightsstatements.org) for the full cultural-heritage range: in-copyright, orphan work, rights-holder unlocatable, public domain, and the rest; an SPDX or Creative Commons identifier where an actual licence applies) and may carry an [ODRL](https://www.w3.org/TR/odrl-model/) policy of permitted and prohibited actions. From that one block an agent filters to what it may use, CI bundles only what may be redistributed and references the rest, and an AI/text-and-data-mining reservation, carried in the [TDMRep](https://www.w3.org/community/tdmrep/) simple form (`tdmReservation`), is honoured at the record level. Search, the bundle-or-reference decision, and the AI-use symmetry all read the same field.
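A hedged sketch of how two of those decisions could read the one block. The allow-list `OPEN_LICENCES`, both function names, and the conservative handling of a missing `tdmReservation` are assumptions for illustration; the project's real CI rule may differ.

```python
# Hypothetical allow-list of SPDX identifiers that permit redistribution.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


def hosting_decision(rights: dict) -> str:
    """'bundle' only when an actual open licence is present; otherwise 'reference'."""
    return "bundle" if rights.get("license") in OPEN_LICENCES else "reference"


def tdm_allowed(rights: dict) -> bool:
    """ASSUMPTION: only an explicit 0 counts as 'not reserved'; absence is treated as reserved."""
    return rights.get("tdmReservation") == 0


# The rights block from the example record above: In Copyright, no licence, TDM reserved.
rights = {
    "statement": "http://rightsstatements.org/vocab/InC/1.0/",
    "holder": "Manchester City Council / Manchester Libraries",
    "credit": "Image courtesy of Manchester Libraries",
    "tdmReservation": 1,
}
print(hosting_decision(rights), tdm_allowed(rights))
```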

`when` is *referent-time*: the moment a record refers to, kept distinct from the *record-time* it was made, which STAC already carries as `created`. For a photograph the two coincide; a recollection set down decades later refers to one time and was recorded at another, and the distance between them is itself a confidence input — a memory fixed the same week is stronger evidence than one fixed fifty years on.

`status` names the record's relationship to its referent: `trace` for a direct physical record like a photograph or a scan, `testimony` for a primary account of a claim like a recollection or a caption, `derived` for anything computed such as a solved pose or a transcription, `synthetic` for anything generated. The first two are evidence and the last two inference; `derived` and `synthetic` records carry a PROV-O chain naming what produced them, so nothing machine-made can be read as a primary source. On disk each record is a STAC Item, which makes the whole corpus a crawlable catalogue (see [Machine readability](#machine-readability)).
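The evidence/inference split can be sketched as a small check. The function name `seam_errors` is hypothetical, and so is the location of the PROV chain: `wasGeneratedBy` is a real PROV-O term, but placing it inside `provenance` is an assumption made for this example.

```python
EVIDENCE = {"trace", "testimony"}
INFERENCE = {"derived", "synthetic"}


def seam_errors(record: dict) -> list[str]:
    """Return the reasons a record would blur evidence and inference (empty if none)."""
    errors = []
    status = record.get("status")
    provenance = record.get("provenance") or {}
    if status in EVIDENCE:
        if not provenance.get("source"):
            errors.append(f"{status} record must name a source")
    elif status in INFERENCE:
        # ASSUMPTION: the PROV chain is carried under provenance.wasGeneratedBy.
        if not provenance.get("wasGeneratedBy"):
            errors.append(f"{status} record must carry a PROV chain")
    else:
        errors.append(f"unknown status: {status!r}")
    return errors


print(seam_errors({"status": "trace", "provenance": {"source": "Manchester Local Image Collection"}}))
print(seam_errors({"status": "derived", "provenance": {"source": "a pipeline"}}))
```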

LP-V stays valid JSON-LD and valid GeoJSON, so it renders in ordinary web mapping out of the box and interoperates with the wider gazetteer ecosystem, and its `@context` is versioned and dereferenceable so every term resolves to a definition with declared units and reference frames. It extends existing standards rather than minting one.
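Declared frames are what let a machine turn the example's quaternion into something a person would recognise. The sketch below assumes the quaternion rotates camera-local vectors into a local ENU frame (x east, y north, z up) and uses the glTF convention that a camera looks down its local -Z axis; the function names are hypothetical.

```python
import math


def rotate(q, v):
    """Rotate vector v by unit quaternion q given as (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * (q_xyz cross v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + (q_xyz cross t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def bearing_and_pitch(q):
    """Compass bearing (degrees clockwise from north) and pitch (degrees above horizon)."""
    east, north, up = rotate(q, (0.0, 0.0, -1.0))  # glTF cameras look down -Z
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, up))))
    return bearing, pitch


# Orientation from the example record.
q = (0.653281, -0.270598, -0.270598, 0.653281)
bearing, pitch = bearing_and_pitch(q)
print(round(bearing, 2), round(pitch, 2))  # approximately 45.0 and 0.0
```

Under those assumptions the example camera is level and faces north-east, which is the kind of human-readable view the record's comment calls Euler.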

### The reference viewer

Given a `FeatureCollection` of LP-V attestations for one locality, the viewer:

- shows a present-day 3D base of the place;
- hangs each attested photograph as a textured frustum at its viewpoint, with a **"stand here"** footprint — the image is sharpest at the original optical centre and visibly shears as you step off it, so the optics themselves mark where the attestation is strong;
- renders everything not attested as **explicit fog plus known footprints** (extruded from OSM/OS data as plain matte volumes, explicitly not photographic). The v0 "reconstruction" involves no generation — only attested geometry shown in a distinct material register;
- provides a **non-uniform time scrubber** with evidence density baked into the axis, so moving through time feels like surfacing at islands of attestation and crossing fog between them. "Now" is simply the densest slice.
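One possible reading of "evidence density baked into the axis", sketched under assumptions: every year gets a small base width and each attestation adds to it, so documented years occupy more of the scrubber and empty decades compress. The function name and the `base` and `per_item` weights are hypothetical; the viewer's actual mapping is not specified here.

```python
from collections import Counter


def scrubber_positions(years, start, end, base=1.0, per_item=4.0):
    """Map each year in [start, end] to the axis position (0..1) where it begins."""
    counts = Counter(years)
    span = range(start, end + 1)
    widths = [base + per_item * counts[year] for year in span]
    total = sum(widths)
    positions, used = {}, 0.0
    for year, width in zip(span, widths):
        positions[year] = used / total
        used += width
    return positions


# Two attestations in 1937, one in 1993, on an axis from 1930 to 2000.
axis = scrubber_positions([1937, 1937, 1993], 1930, 2000)
print(axis[1938] - axis[1937] > axis[1936] - axis[1935])
```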

Web-first. The scene graph is kept portable for XR later, but a browser is where contribution happens.

### The contributor tool

Recovering a `viewpoint` for a historic photograph is the hard input. v0 does not attempt automated structure-from-motion. A contributor loads an old photo beside the present 3D scene and either aims a camera frustum until the image lines up, or clicks a few persistent correspondences (a window-corner then, the same corner now) and the tool solves Perspective-n-Point for pose. The output is a valid LP-V record. Automated and learned matching can arrive later against the same schema.
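Perspective-n-Point looks for the pose that makes known 3D points project onto their clicked pixels. The sketch below shows only that objective (pinhole projection and a reprojection residual), not a solver. It assumes points and camera are already in a local ENU frame in metres, a camera-to-ENU quaternion in `(x, y, z, w)` order, and the glTF camera basis; all function names are hypothetical.

```python
import math


def rotate(q, v):
    """Rotate vector v by unit quaternion q given as (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def project(point, cam, q, hfov_deg, width, height):
    """Pixel (u, v) of an ENU point, or None if it lies behind the camera."""
    x, y, z, w = q
    inverse = (-x, -y, -z, w)  # the conjugate inverts a unit quaternion
    offset = tuple(p - c for p, c in zip(point, cam))
    xc, yc, zc = rotate(inverse, offset)  # ENU offset expressed in camera axes
    if zc >= 0.0:
        return None  # the camera looks down -Z
    f_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    u = width / 2.0 + f_px * xc / -zc
    v = height / 2.0 - f_px * yc / -zc  # image rows grow downwards
    return (u, v)


def rms_residual(correspondences, cam, q, hfov_deg, width, height):
    """Root-mean-square pixel error over (enu_point, (u, v)) pairs."""
    total = 0.0
    for point, (u_obs, v_obs) in correspondences:
        projected = project(point, cam, q, hfov_deg, width, height)
        if projected is None:
            return math.inf
        total += (projected[0] - u_obs) ** 2 + (projected[1] - v_obs) ** 2
    return math.sqrt(total / len(correspondences))


q = (0.653281, -0.270598, -0.270598, 0.653281)  # orientation from the example record
cam = (0.0, 0.0, 1.6)
ahead = (7.0711, 7.0711, 1.6)                    # ten metres north-east at eye height
print(project(ahead, cam, q, 55.0, 800, 600))    # close to the image centre
```

A solver, manual or automated, would adjust `cam` and `q` until `rms_residual` over the clicked correspondences is small; that residual is also a natural input to `poseConfidence`.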

## User journeys

What the design enables, from the POC outward:

**Contribute a memory.** You have an old photograph of a place you knew. You load it into the contributor tool beside the present-day scene, line up a few persistent features, and the tool solves its vantage. You open a pull request, CI validates it, and it joins the corpus as an attestation anyone can stand inside.

**Stand somewhere and look back.** You open the viewer at the square and scrub the timeline back to the 1930s. The photograph hangs at the spot it was taken — sharp at the vantage, dissolving into fog at the edges of what was witnessed — and as you move forward through the decades you watch the gardens change.

**Share a place-in-time as a link.** Every vantage has a URI. You paste it into a chat — *this is where my dad's shop was, 1993* — and whoever opens it lands exactly there, at that moment, looking the way you were looking. The memory is a link: portable, permanent, resolvable by a person or an agent alike.

**Ask an agent to research a place.** You ask an agent what your grandmother's street looked like when she lived there in the 1970s. It queries the catalogue across that place and window, weighs the attestations by confidence and provenance, sets aside the thin and the contested, and returns a curated vantage or a short tour of URIs — grounded only in attested fragments, because the corpus tells it what is evidence and what is not. The agent does the deciding-what-matters; the design keeps it from inventing.

**Let an agent contribute at scale.** An automated pipeline ingests a batch of archive photographs, recovers their vantages by structure-from-motion, and proposes them as attestations marked `poseMethod: sfm` and `status: derived`. CI accepts them as inference, correctly labelled, never silently promoted to evidence.

## The dataset is a Git repository

The dataset is a Git repository of LP-V GeoJSON. There is no backend: the `FeatureCollection` *is* the dataset, contributions are pull requests, and validation is JSON-Schema in CI. This keeps the data version-controlled, forkable, reviewable and attributable — and it makes lock-in structurally impossible, because every contributor holds a complete copy and a fork is `git clone`. A real spatiotemporal backend belongs later, only once this substrate is outgrown.
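With no backend, "loading the dataset" can be as small as walking a clone. A minimal sketch, assuming records live as one JSON file each under a data directory and that STAC catalogue and collection files are told apart by their `type`; `load_corpus` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_corpus(root) -> dict:
    """Gather every GeoJSON Feature under `root` into one FeatureCollection."""
    features = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            document = json.load(handle)
        # Catalogue and collection files are JSON too, but are not Features.
        if isinstance(document, dict) and document.get("type") == "Feature":
            features.append(document)
    return {"type": "FeatureCollection", "features": features}
```

Usage would be `load_corpus("data")` at the root of a clone.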

## Machine readability

If the project works, most of its readers will be agents rather than people, so the corpus is built to be parsed without ever reading this page. Four decisions carry that, with one optional convenience on top:

- **Self-describing terms.** The JSON-LD `@context` is a versioned, dereferenceable vocabulary in which each field maps to a term, the place geometry carries through to RDF as GeoJSON-LD, and the reference frames are stated — EPSG:4326 for position, a glTF camera basis in a local ENU frame with `(x,y,z,w)` quaternions for orientation, the uncertainty-bearing `when` for time. It crosswalks to Dublin Core today; QUDT units (metre, degree), OWL-Time grounding, and schema.org / CIDOC-CRM (with CRMgeo) crosswalks are declared as the intended vocabulary and land in v1 — mapped rather than modelled in, to keep the core light. An agent never has to guess a convention a person would have filled in, and the ambiguity people resolve silently is exactly the kind that breaks machines.
- **A static STAC catalogue.** Each attestation is a STAC Item, so an agent crawls the whole corpus by following `rel` links — no server, no key, no chokepoint. STAC's datetime is the coarse index; Linked Places' `when` carries the precise, uncertainty-bearing temporal truth. Imagery rides on IIIF, lineage on PROV-O.
- **Dereferenceable URIs.** Every attestation, place and viewpoint resolves at a stable URI that content-negotiates — JSON-LD to an agent, a rendered view to a person, the same resource either way. A URI is therefore a portable, permanent reference to an exact place, time and vantage, and the unit a memory is shared in.
- **The seam enforced in CI.** The schema makes provenance, confidence and `status` mandatory: `trace` and `testimony` must name a source, `derived` and `synthetic` must carry a PROV chain naming what produced them, and a record claiming to be evidence without one is rejected. CI also checks that each STAC `datetime` is derived from the record's `when`, so the catalogue's coarse clock can never drift from the truth. Because agents will contribute too, *never let the prior masquerade as data* has to be a check, not a courtesy.

An optional **MCP server** can sit over the static files as an agent-native query-and-propose interface — a convenience anyone may run, never the source of truth, so the canonical files remain the thing no one can capture.

## Scope

v0 is deliberately narrow. The following are **not** in it:

- **VR / XR** — web-first; the renderer is kept portable, not built for headsets yet.
- **Generative reconstruction** — the synthetic register is explicit fog plus footprints; generative fill is specified as a pluggable module and left unbuilt, so the seam stays legible by default.
- **Automated pose recovery** — the manual contributor tool only.
- **A backend** — flat files and Git.
- **Multiple localities** — a single seed locality is the working reference: Piccadilly Gardens in Manchester city centre, across its eras (the Royal Infirmary, the sunken gardens that replaced it, the 1960s Piccadilly Plaza and Tadao Ando's 2002 redesign). Oxford Road and St Peter's Square are noted as later localities.

## Roadmap

In rough order of increasing cost and value:

- automated pose recovery (learned feature matching against the 3D base, with archival film a strong bootstrap);
- the generative-fill module, always rendered in the distinct synthetic register so the seam stays legible;
- further attestation types beyond the photograph facet: film and video as moving viewpoints, spatial audio as positioned sound, documents as readable planes;
- conflicting-attestation handling that holds contested memories side by side rather than resolving them;
- an endorsement and reputation layer (described below) that lets trust accrue without a central score;
- an XR renderer;
- a spatiotemporal backend once Git-as-database is outgrown;
- and, only if scale makes the integrity of provenance worth proving to strangers, an optional anchor of record hashes to a public timestamping ledger: tamper-evident provenance that needs no trust in the commons operator (Git already gives the hash chain; this makes it independently verifiable).

## Endorsement and reputation

The seam separates what is attested from what is inferred; the same discipline separates *who vouched for a record* from the record itself. A pose recovered today is rarely surveyed — it may be a human's aim, a Perspective-n-Point solve, or, as across the seed corpus, an AI's estimate reasoned from the photograph's own words. The format already names which — `viewpoint.poseMethod` carries `ai-estimated` alongside `manual`, `pnp` and `sfm`, with an honest `poseConfidence` and the reasoning kept in a `poseNote` — so an estimate can never quietly pass for a measurement, and the photograph stays evidence even while its vantage is owned as inference. Endorsement is the next ring out: a registration can be *reviewed*, confirmed or disputed, by another party — human or AI.

An endorsement is modelled the way provenance already is: a signed assertion — "agent X reviewed record Y on date Z; confirms" — attributed to a stable, self-owned identity (a signed commit, an ORCID-style handle, an AI agent's key) and carried in the record as a first-class, auditable claim. Confidence then stops being a number set by hand and becomes a *function* of method, endorsements, and the standing of those who gave them: an AI estimate starts low, independent confirmation raises it, a control-point solve stands high on its own.

The one rule that stops this becoming the very chokepoint the rest of the project forbids: **reputation is a derived view, never a stored score.** It is computed from the public history — how much of a contributor's work survived review, was independently confirmed, went unchallenged — and so is recomputable, and *re-weightable*, by anyone holding a copy of the commons. There is no karma database, no authority's number to capture, game, or have to trust; a fork that disputes the weighting recomputes its own from the same history. Identities are pseudonymous but stable, endorsements are public and signed, and the standing they build is visible working rather than a verdict — the seam, *never let the prior masquerade as data*, extended from claims to the people who make them.
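A sketch of what "derived view" could mean in practice, under heavy assumptions: the event shape, the outcome names and the weights below are all invented for illustration, since this layer is specified rather than built. The point is only that standing is a pure function of public history plus weights the caller chooses.

```python
def standing(history, weights):
    """Recompute each contributor's standing from review history; nothing is stored."""
    scores = {}
    for event in history:
        author = event["author"]
        scores[author] = scores.get(author, 0.0) + weights.get(event["outcome"], 0.0)
    return scores


# Hypothetical public history, as it might be read out of signed endorsements.
history = [
    {"author": "github:alice", "record": "r1", "outcome": "confirmed"},
    {"author": "github:alice", "record": "r2", "outcome": "disputed"},
    {"author": "github:bob", "record": "r3", "outcome": "unchallenged"},
]

# Two forks, one history, two weightings.
commons_view = standing(history, {"confirmed": 1.0, "unchallenged": 0.5, "disputed": -1.0})
sceptic_view = standing(history, {"confirmed": 1.0, "unchallenged": 0.0, "disputed": -2.0})
print(commons_view)
print(sceptic_view)
```

Because the inputs are the history and the weights, a fork that disagrees changes the weights and recomputes; there is no stored number to contest.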

This is past v0: it means nothing until there are many contributors and a review flow, and it is specified, not built. But the structural choice is made now, with the other irreversible ones — reputation that lives in the open history, computable by all and owned by none.

## Commitments and governance

The value and the trust both live in the contributed corpus, so one question governs everything: can a contributor be certain their material can never be taken from them? Openness today does not answer it — every enclosure began open. The commitments below are therefore structural, binding by construction rather than by promise:

- **Contributed data is irrevocably open** — CC0, or CC-BY where attribution is wanted. The corpus is already free even in the worst future, so enclosure is impossible by construction rather than forbidden by pledge.
- **No contributor agreement assigns rights to any private entity.** Inbound licence equals outbound licence. The company-owned CLA is the precise lever every relicensing has used; this project does not have one.
- **No proprietary account holds contributions hostage.** Attribution rides in the provenance field via standard identity — signed commits, an ORCID-style handle — not a login under anyone's control.
- **Use is symmetric, including for AI.** Anyone may use the open corpus; no one may exclusively enclose it. That is the only credible answer to the fear that contributed heritage quietly becomes one party's training data.

The single principled exception to maximal openness cuts the other way: the **name and the reference specification are held in trust** — a protected mark and a canonical spec — used solely to stop a bad actor shipping a malicious fork under the trusted name, never to extract.

No governing foundation exists yet, and standing one up prematurely would be theatre. The irreversible, cheap choices are made now — open data, no CLA, open format, name held for trust — so the project is foundation-compatible from the first commit, and a neutral steward can be formed only if there is ever something real to hold.

## Privacy and hosting

Palimpsest takes no position on access control, and does not need to. Because the format is the product and the public commons is only one deployment of it, privacy is a property of *where a record is hosted*, not of the record. A private memory is the same LP-V record kept in a store its owner controls — a private repository, a personal server — behind whatever authentication they choose, with URIs that resolve only for those with access. The design choice that makes the public commons un-enclosable is the same one that makes private memory free: run the format on a private host, and nothing changes but who can reach it.

**The commons is an index, not a repository.** It holds attestation records — geometry, time, provenance, confidence, status, and a URI *pointing at* the media — while the media itself stays at its source: an IIIF manifest, an archive URL. (Bytes enter the corpus only for content the `rights` block marks as cleared and openly licensed; everything else is referenced — a decision CI makes from the field, not by hand.) This is where the right to erasure is handled by architecture rather than by policy: a request to remove the *content* goes to the host that holds it, whose removal makes the URI dereference to nothing and the material leave every experience at once — without the commons ever being the controller of a single byte. The index points; the host owns.

Two edges remain, worth stating plainly.

The public commons is **permanent**. The commitments that stop anyone enclosing the corpus also stop anyone un-publishing it: a record contributed to the open commons is world-visible and already on every fork, and cannot be recalled. Contributing is publication, not storage — anything sensitive belongs on a private host, decided before the commit.

A memory is **not only about its contributor**, and this is the one place the carve-out has a limit: the *record itself* can be personal data even when no bytes are hosted — a recollection naming a living neighbour, a caption identifying someone. That data lives in the LP-V record, which sits in the immutable, forked commons and can be neither erased nor un-forked. So the residual privacy duty narrows to record-borne personal data — testimony and named living individuals — which the commons leans against ingesting in the first place, precisely because it cannot be undone. The format stays agnostic about enforcement but can still *carry* a sensitivity or consent signal, the same way it carries provenance and confidence, so hosts and agents can act on it without Palimpsest deciding for them.

## Licence

- **Code** (format tooling, reference viewer, contributor tool): Apache-2.0.
- **Data**: CC0, or CC-BY where attribution is requested.
- **Name and specification**: held in trust, as described above.

## Contributing

Contributions are pull requests against the dataset or the tools. To add a historic photograph, register its viewpoint with the contributor tool and open a PR with the resulting LP-V record; CI validates it against the schema. Issues proposing extensions to LP-V are welcome — the format is the heart of the project, and it is meant to be argued over.

---

*The format is the contribution. Everything else is a reference implementation that proves it works and invites better ones.*
