FHIR Provenance & Document Traceability: Steer Document
Date: 2026-04-10 (Session 35)
Status: Spec complete — not yet implemented
Companion spec: ../fhir-provenance-feature.md
1. Problem Statement
When a patient uploads a blood work PDF, the Clinical Context Agent
extracts conditions, lab values, and medications as FHIR resources.
But the link between the FHIR resource and the source document is
lost — document_id is available in document_processing.py but
never passed to the agent. The result:
- Patient sees "Prediabetes" in their EHR but not "from: blood_work.pdf"
- Providers receiving the EHR can't verify which document supports which finding
- If a patient deletes a document, its derived FHIR resources aren't affected (GDPR cascade gap)
- If the same PDF is re-processed, duplicate FHIR resources are created with no dedup by document source
- Chat-extracted data ("I take metformin") has no message_id or timestamp attribution
- When records are forwarded to providers, there's no audit trail of exactly what was sent
2. Decision: Dual Provenance (Column + FHIR Resource)
Decision: Add document_id and case_id columns to the
FHIRResource model for fast internal queries, AND create FHIR R4
Provenance resources for standards-compliant export.
Why both
| Need | Column solves | Provenance resource solves |
|---|---|---|
| "Which document produced this finding?" | WHERE document_id = X | Provenance.entity.what = DocumentReference/X |
| FHIR-standard export for providers | No — custom extension | Yes — any FHIR system reads it |
| Fast internal queries (EHR rebuild) | Yes — indexed column | No — requires join |
| GDPR cascade delete | WHERE document_id = X → delete | Provenance chain traversal |
| Audit trail | Event table logs | Provenance.recorded timestamp |
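To make the Provenance column of the table above more concrete, here is a minimal sketch that builds a FHIR R4 Provenance resource as a plain dict. The function name `build_provenance`, the agent reference `Device/clinical-context-agent`, and the choice of `role: "source"` are assumptions for illustration; the companion spec may define them differently.

```python
from datetime import datetime, timezone


def build_provenance(target_refs, document_id, recorded=None):
    """Build a minimal FHIR R4 Provenance dict linking resources to a source document."""
    if not target_refs:
        raise ValueError("Provenance.target requires at least one reference")
    recorded = recorded or datetime.now(timezone.utc)
    return {
        "resourceType": "Provenance",
        # The clinical resources that were extracted from the document
        "target": [{"reference": ref} for ref in target_refs],
        # When the extraction was recorded (FHIR instant, timezone included)
        "recorded": recorded.isoformat(timespec="seconds"),
        # Hypothetical agent reference; Provenance.agent.who is required in R4
        "agent": [{"who": {"reference": "Device/clinical-context-agent"}}],
        # The source document, i.e. Provenance.entity.what
        "entity": [
            {
                "role": "source",
                "what": {"reference": f"DocumentReference/{document_id}"},
            }
        ],
    }
```

One Provenance resource can list several targets, so a single document upload could plausibly produce one Provenance covering all of its extracted findings rather than one per finding.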
Why not just one
- Column-only: fast but non-standard. When we export to a hospital's FHIR server, they can't read our custom columns.
- Provenance-only: standards-compliant but expensive to query internally. Every EHR rebuild would require joining through Provenance resources.
Design principles
- Every FHIR resource traces to its source: document_id for OCR, message_id for chat, "manual" for REST API
- No orphan resources: deleting a document cascades to its derived FHIR resources (mark as entered_in_error, don't hard-delete)
- Dedup by source: re-processing the same document supersedes the previous FHIR resources from that document instead of creating duplicates
- Provider sees source: EHR export includes Provenance resources linking each finding to the source document name + date
- Chat data attributed: medications/allergies from conversation are linked to message_id + timestamp
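The first principle amounts to a small decision rule at write time. The sketch below is one possible shape for it; `ResourceSource` and `resolve_source` are hypothetical names, and the precedence (document before chat before manual) is an assumption rather than something the spec states.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ResourceSource:
    """Where a FHIR resource came from (hypothetical structure)."""
    kind: str  # "document", "chat", or "manual"
    document_id: Optional[str] = None
    message_id: Optional[str] = None
    timestamp: Optional[datetime] = None


def resolve_source(document_id=None, message_id=None, timestamp=None):
    """Choose the source: document_id for OCR, message_id for chat, otherwise manual."""
    if document_id is not None:
        return ResourceSource(kind="document", document_id=document_id)
    if message_id is not None:
        return ResourceSource(kind="chat", message_id=message_id, timestamp=timestamp)
    # No identifiers supplied: treat as a manual REST API entry
    return ResourceSource(kind="manual")
```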
3. Scope
In scope
- document_id + case_id columns on FHIRResource
- Pass document_id through the document processing pipeline to the Clinical Context Agent
- FHIR Provenance resources created alongside clinical resources
- Source attribution in EHR snapshot (for frontend display)
- Document deletion cascade (mark derived FHIR as entered_in_error)
- Chat extraction source attribution (message_id, timestamp)
- Dedup: re-processing same document supersedes old FHIR resources
- Data forwarding audit table
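The deletion cascade and the re-processing dedup in this list share one mechanism: find the active resources derived from a document and change their status. A minimal in-memory sketch, assuming resources are dicts with `document_id` and `status` keys and that superseded rows get a `"superseded"` status (both are illustrative assumptions, not the actual FHIRResource model):

```python
def mark_document_resources(resources, document_id, status="entered_in_error"):
    """Soft-delete: flag every active resource derived from document_id; return the count."""
    changed = 0
    for res in resources:
        if res.get("document_id") == document_id and res.get("status") == "active":
            res["status"] = status
            changed += 1
    return changed


def reprocess_document(resources, document_id, new_resources):
    """Supersede earlier extractions from the same document, then add the new ones."""
    mark_document_resources(resources, document_id, status="superseded")
    for res in new_resources:
        # New extractions carry the document_id so later runs can find them
        resources.append({**res, "document_id": document_id, "status": "active"})
    return resources
```

Nothing is hard-deleted in either path, which keeps the history available for the audit trail.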
Out of scope (future)
- FHIR R4 DocumentReference resource for the raw PDF (proper FHIR document indexing — separate from our DocumentReference model)
- Extraction confidence scores (needs LLM chain-of-thought parsing)
- Provider-side FHIR ingestion API
- Cross-border jurisdiction-specific consent (per-provider consent granularity)
4. Affected Surfaces
Backend
- app/models/fhir_resource.py: 2 new columns
- app/agents/clinical_context.py: state schema + store_resources
- app/services/document_processing.py: pass document_id to agent
- app/services/fhir_service.py: create schema + dedup logic
- app/services/ehr_rebuild_service.py: source map in snapshot
- app/services/chat_extractor.py: source attribution
- app/routers/fhir.py: optional document_id in create API
- app/models/consent.py: data categories + provider_id
- New: app/models/data_forwarding_audit.py
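For the "source map in snapshot" item, one plausible shape is a mapping from resource id to a display label that the frontend can render next to each finding. The function name `build_source_map`, the dict keys, and the label wording are assumptions for illustration only.

```python
def build_source_map(resources, document_names):
    """Map each resource id to a human-readable source label for the EHR snapshot."""
    source_map = {}
    for res in resources:
        doc_id = res.get("document_id")
        msg_id = res.get("message_id")
        if doc_id is not None:
            # Fall back to the raw id when the document name is unknown
            label = f"from: {document_names.get(doc_id, doc_id)}"
        elif msg_id is not None:
            label = f"from: chat message {msg_id}"
        else:
            label = "from: manual entry"
        source_map[res["id"]] = label
    return source_map
```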
Frontend
- EHRPanel.tsx: show source document per finding
- FullEHRDrawer.tsx: source tags on conditions/observations
- Document list: link from document to its extracted findings
5. Risks
| Risk | Mitigation |
|---|---|
| Migration on fhir_resources table with existing data | document_id nullable, server_default=None. Backfill optional. |
| Provenance resources double storage | Provenance is small (~200 bytes each). At 100 resources per patient, adds ~20KB. |
| Performance of dedup query on re-process | Index on (patient_id, document_id, resource_type). Single query. |
| Chat extraction message_id not available | Pass message_id from case_orchestrator to chat_extractor. Small plumbing change. |
6. References
- FHIR R4 Provenance: https://hl7.org/fhir/R4/provenance.html
- Session 35 audit: 6 gaps identified in document-to-FHIR traceability
- Gap report finding #17: no FHIR confidence field
- Porting spec (PR #85): meta.extension for case attribution