FHIR Provenance & Document Traceability: Steer Document

Date: 2026-04-10 (Session 35)
Status: Spec complete, not yet implemented
Companion spec: ../fhir-provenance-feature.md

1. Problem Statement

When a patient uploads a blood work PDF, the Clinical Context Agent extracts conditions, lab values, and medications as FHIR resources. The link between each FHIR resource and its source document is lost, however: document_id is available in document_processing.py but is never passed to the agent. The result:

Patient sees "Prediabetes" in their EHR but not "from: blood_work.pdf"
Providers receiving the EHR can't verify which document supports which finding
If a patient deletes a document, its derived FHIR resources aren't affected (GDPR cascade gap)
If the same PDF is re-processed, duplicate FHIR resources are created because nothing deduplicates by source document
Chat-extracted data ("I take metformin") has no message_id or timestamp attribution
When records are forwarded to providers, there's no audit trail of exactly what was sent

2. Decision: Dual Provenance (Column + FHIR Resource)

Decision: Add document_id and case_id columns to the FHIRResource model for fast internal queries, AND create FHIR R4 Provenance resources for standards-compliant export.

Why both

Need	Column solves	Provenance resource solves
"Which document produced this finding?"	`WHERE document_id = X`	`Provenance.entity.what = DocumentReference/X`
FHIR-standard export for providers	No — custom extension	Yes — any FHIR system reads it
Fast internal queries (EHR rebuild)	Yes — indexed column	No — requires join
GDPR cascade delete	`WHERE document_id = X` → delete	Provenance chain traversal
Audit trail	Event table logs	Provenance.recorded timestamp

Why not just one

Column-only: fast but non-standard. When we export to a hospital's FHIR server, they can't read our custom columns.
Provenance-only: standards-compliant but expensive to query internally. Every EHR rebuild would require joining through Provenance resources.
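
As a rough illustration of the export half of this decision, the sketch below builds a minimal FHIR R4 Provenance resource as a plain dict. The field names (target, recorded, agent.who, entity.role, entity.what) follow the R4 Provenance definition; the function name `build_provenance`, the id formats, and the agent display text are assumptions made for this example, not part of the spec.

```python
from datetime import datetime, timezone


def build_provenance(target_ref: str, document_id: str, recorded: datetime) -> dict:
    """Build a minimal FHIR R4 Provenance resource as a plain dict.

    target_ref  -- reference to the derived clinical resource, e.g. "Condition/cond-1"
    document_id -- id of the source document (hypothetical id format)
    """
    return {
        "resourceType": "Provenance",
        # The resource(s) this Provenance describes (required, 1..*).
        "target": [{"reference": target_ref}],
        # When the provenance record was written (required), normalised to UTC.
        "recorded": recorded.astimezone(timezone.utc).isoformat(timespec="seconds"),
        # Who produced the target (required, 1..*); the display text is an assumption.
        "agent": [{"who": {"display": "Clinical Context Agent"}}],
        # What the target was derived from; "source" is one of the R4 role codes.
        "entity": [
            {"role": "source", "what": {"reference": f"DocumentReference/{document_id}"}}
        ],
    }


example = build_provenance(
    "Condition/cond-1", "doc-42", datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
)
```

The same document_id would also be written to the indexed column, so internal queries never need to parse this resource.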

Design principles

Every FHIR resource traces to its source: document_id for OCR, message_id for chat, "manual" for the REST API
No orphan resources: deleting a document cascades to its derived FHIR resources (mark them as entered_in_error; don't hard-delete)
Dedup by source: re-processing the same document supersedes the FHIR resources previously derived from that document instead of creating duplicates
Provider sees source: EHR export includes Provenance resources linking each finding to the source document name + date
Chat data attributed: medications/allergies extracted from conversation are linked to message_id + timestamp
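
The "no orphan resources" and "dedup by source" principles can be sketched with an in-memory list standing in for the fhir_resources table. This is only a sketch under assumptions: the `Resource` dataclass, its `status` field, the "superseded" value, and both function names are hypothetical. The string "entered_in_error" follows this document's spelling; the corresponding FHIR status code is written `entered-in-error`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    id: str
    patient_id: str
    resource_type: str
    document_id: Optional[str]  # None for chat-extracted or manual entries
    status: str = "active"      # hypothetical lifecycle field


def store_from_document(store: List[Resource], new: List[Resource], document_id: str) -> None:
    """Dedup by source: supersede active resources from the same document, then add the new ones."""
    for r in store:
        if r.document_id == document_id and r.status == "active":
            r.status = "superseded"
    store.extend(new)


def cascade_document_delete(store: List[Resource], document_id: str) -> int:
    """No orphans: flag every resource derived from the document instead of hard-deleting it."""
    flagged = 0
    for r in store:
        if r.document_id == document_id and r.status != "entered_in_error":
            r.status = "entered_in_error"
            flagged += 1
    return flagged
```

Processing the same document twice leaves the first batch superseded and the second active; deleting the document then flags both batches, so nothing derived from it remains presentable as current data. Whether already-superseded rows should also be flagged is a design choice this sketch makes for simplicity.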

3. Scope

In scope

document_id + case_id columns on FHIRResource
Pass document_id through the document processing pipeline to the Clinical Context Agent
FHIR Provenance resources created alongside clinical resources
Source attribution in EHR snapshot (for frontend display)
Document deletion cascade (mark derived FHIR as entered_in_error)
Chat extraction source attribution (message_id, timestamp)
Dedup: re-processing same document supersedes old FHIR resources
Data forwarding audit table

Out of scope (future)

FHIR R4 DocumentReference resource for the raw PDF (proper FHIR document indexing — separate from our DocumentReference model)
Extraction confidence scores (needs LLM chain-of-thought parsing)
Provider-side FHIR ingestion API
Cross-border jurisdiction-specific consent (per-provider consent granularity)

4. Affected Surfaces

Backend

app/models/fhir_resource.py — 2 new columns
app/agents/clinical_context.py — state schema + store_resources
app/services/document_processing.py — pass document_id to agent
app/services/fhir_service.py — create schema + dedup logic
app/services/ehr_rebuild_service.py — source map in snapshot
app/services/chat_extractor.py — source attribution
app/routers/fhir.py — optional document_id in create API
app/models/consent.py — data categories + provider_id
New: app/models/data_forwarding_audit.py
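
One possible shape for the source map that ehr_rebuild_service.py would attach to the snapshot is sketched below. Everything here is an assumption for illustration: the `StoredResource` fields, the function names, and the label wording (modelled on the "from: blood_work.pdf" example in the problem statement) are not taken from the companion spec.

```python
from typing import Dict, List, Optional, TypedDict


class StoredResource(TypedDict):
    id: str
    document_id: Optional[str]  # set for OCR-extracted findings
    message_id: Optional[str]   # set for chat-extracted findings


def source_label(res: StoredResource, document_names: Dict[str, str]) -> str:
    """Return a display label describing where a finding came from."""
    if res["document_id"] is not None:
        name = document_names.get(res["document_id"], "unknown document")
        return f"from: {name}"
    if res["message_id"] is not None:
        return f"from: chat message {res['message_id']}"
    # Neither id is set: the resource was created through the REST API.
    return "from: manual"


def build_source_map(
    resources: List[StoredResource], document_names: Dict[str, str]
) -> Dict[str, str]:
    """Map each resource id to its source label for frontend display."""
    return {r["id"]: source_label(r, document_names) for r in resources}
```

Keeping the map keyed by resource id lets the frontend attach a source tag to each finding without a second lookup.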

Frontend

EHRPanel.tsx — show source document per finding
FullEHRDrawer.tsx — source tags on conditions/observations
Document list — link from document to its extracted findings

5. Risks

Risk	Mitigation
Migration on fhir_resources table with existing data	`document_id` nullable, `server_default=None`. Backfill optional.
Provenance resources add storage overhead	Each Provenance resource is small (~200 bytes). At 100 resources per patient, that adds ~20 KB per patient.
Performance of dedup query on re-process	Index on `(patient_id, document_id, resource_type)`. Single query.
Chat extraction message_id not available	Pass message_id from case_orchestrator to chat_extractor. Small plumbing change.
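
The migration and dedup-index mitigations can be tried out with the standard-library sqlite3 module. This is a sketch, not the real migration: the production database and migration tooling are not stated here, and the table layout and the index name `ix_fhir_patient_doc_type` are assumptions. It adds the two nullable columns to a table that already holds data, creates the composite index, and asks SQLite for its query plan for the dedup lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fhir_resources ("
    "id TEXT PRIMARY KEY, patient_id TEXT NOT NULL, resource_type TEXT NOT NULL)"
)
conn.execute("INSERT INTO fhir_resources VALUES ('c1', 'p1', 'Condition')")

# Nullable columns with no default: existing rows simply read back as NULL.
conn.execute("ALTER TABLE fhir_resources ADD COLUMN document_id TEXT")
conn.execute("ALTER TABLE fhir_resources ADD COLUMN case_id TEXT")

# Composite index backing the dedup lookup on re-process.
conn.execute(
    "CREATE INDEX ix_fhir_patient_doc_type "
    "ON fhir_resources (patient_id, document_id, resource_type)"
)

existing = conn.execute("SELECT document_id, case_id FROM fhir_resources").fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM fhir_resources "
    "WHERE patient_id = ? AND document_id = ? AND resource_type = ?",
    ("p1", "doc-42", "Condition"),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

Pre-existing rows keep NULL in both new columns, which is why the backfill can stay optional, and `plan_text` should name the composite index rather than report a full table scan.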

6. References

FHIR R4 Provenance: https://hl7.org/fhir/R4/provenance.html
Session 35 audit: 6 gaps identified in document-to-FHIR traceability
Gap report finding #17: no FHIR confidence field
Porting spec (PR #85): meta.extension for case attribution