FHIR Provenance & Document Traceability — Steer Document

Date: 2026-04-10 (Session 35)
Status: Spec complete, not yet implemented
Companion spec: ../fhir-provenance-feature.md


1. Problem Statement

When a patient uploads a blood work PDF, the Clinical Context Agent extracts conditions, lab values, and medications as FHIR resources. However, the link between each FHIR resource and its source document is lost: document_id is available in document_processing.py but is never passed to the agent. As a result:

  • Patient sees "Prediabetes" in their EHR but not "from: blood_work.pdf"
  • Providers receiving the EHR can't verify which document supports which finding
  • If a patient deletes a document, its derived FHIR resources aren't affected (GDPR cascade gap)
  • If the same PDF is re-processed, duplicate FHIR resources are created with no dedup by document source
  • Chat-extracted data ("I take metformin") has no message_id or timestamp attribution
  • When records are forwarded to providers, there's no audit trail of exactly what was sent

2. Decision: Dual Provenance — Column + FHIR Resource

Decision: Add document_id and case_id columns to the FHIRResource model for fast internal queries, AND create FHIR R4 Provenance resources for standards-compliant export.

Why both

| Need | Column solves | Provenance resource solves |
|---|---|---|
| "Which document produced this finding?" | WHERE document_id = X | Provenance.entity.what = DocumentReference/X |
| FHIR-standard export for providers | No (custom extension) | Yes (any FHIR system reads it) |
| Fast internal queries (EHR rebuild) | Yes (indexed column) | No (requires join) |
| GDPR cascade delete | WHERE document_id = X → mark entered_in_error | Provenance chain traversal |
| Audit trail | Event table logs | Provenance.recorded timestamp |

Why not just one

  • Column-only: fast but non-standard. When we export to a hospital's FHIR server, they can't read our custom columns.
  • Provenance-only: standards-compliant but expensive to query internally. Every EHR rebuild would require joining through Provenance resources.
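
As an illustration of the Provenance half of the decision, the sketch below builds a minimal FHIR R4 Provenance resource as a plain Python dict. The function name build_provenance and the agent display string are assumptions made for this example; the element names (target, recorded, agent.who, entity.role, entity.what) come from the R4 Provenance definition.

```python
from datetime import datetime, timezone


def build_provenance(target_ref: str, document_id: str, recorded: datetime) -> dict:
    """Build a minimal FHIR R4 Provenance dict linking one resource to its source document.

    Hypothetical helper for illustration; not the project's actual API.
    """
    if recorded.tzinfo is None:
        # FHIR `instant` values must carry a timezone.
        raise ValueError("recorded must be timezone-aware")
    return {
        "resourceType": "Provenance",
        # The clinical resource this provenance describes, e.g. "Condition/123".
        "target": [{"reference": target_ref}],
        # Normalised to UTC and serialised as ISO 8601.
        "recorded": recorded.astimezone(timezone.utc).isoformat(),
        # R4 requires at least one agent; the display name is an assumption.
        "agent": [{"who": {"display": "Clinical Context Agent"}}],
        # Mirrors the table row: Provenance.entity.what = DocumentReference/X.
        "entity": [
            {
                "role": "source",
                "what": {"reference": f"DocumentReference/{document_id}"},
            }
        ],
    }


provenance = build_provenance(
    "Condition/123",
    "doc-1",
    datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc),
)
```

Keeping the builder a pure function means the same dict can be stored next to the clinical resource and reused unchanged in an export bundle.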

Design principles

  1. Every FHIR resource traces to its source: document_id for OCR, message_id for chat, "manual" for REST API
  2. No orphan resources: deleting a document cascades to its derived FHIR resources (mark as entered_in_error, don't hard-delete)
  3. Dedup by source: re-processing the same document supersedes the previous FHIR resources from that document instead of creating duplicates
  4. Provider sees source: EHR export includes Provenance resources linking each finding to the source document name + date
  5. Chat data attributed: medications/allergies from conversation are linked to message_id + timestamp
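
Principles 1 and 5 can be read as a small tagging rule over the three input paths. The sketch below is one possible shape, not the spec's schema: SourceAttribution, attribute_source, and the rule that a resource has exactly one source are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceAttribution:
    """Hypothetical record of where a FHIR resource came from."""

    source_type: str  # "document", "chat", or "manual"
    document_id: Optional[str] = None
    message_id: Optional[str] = None
    timestamp: Optional[str] = None  # ISO 8601, set for chat sources


def attribute_source(
    document_id: Optional[str] = None,
    message_id: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> SourceAttribution:
    """Pick the source type from whichever identifier the caller supplies."""
    if document_id is not None and message_id is not None:
        # Assumption: a resource is derived from exactly one source.
        raise ValueError("supply document_id or message_id, not both")
    if document_id is not None:
        return SourceAttribution("document", document_id=document_id)
    if message_id is not None:
        if timestamp is None:
            raise ValueError("chat sources need a timestamp")
        return SourceAttribution("chat", message_id=message_id, timestamp=timestamp)
    # Neither identifier present: the resource was created via the REST API.
    return SourceAttribution("manual")
```

For example, attribute_source(message_id="m1", timestamp="2026-04-10T12:00:00Z") yields a chat attribution, while a call with no arguments falls back to "manual".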

3. Scope

In scope

  • document_id + case_id columns on FHIRResource
  • Pass document_id through the document processing pipeline to the Clinical Context Agent
  • FHIR Provenance resources created alongside clinical resources
  • Source attribution in EHR snapshot (for frontend display)
  • Document deletion cascade (mark derived FHIR as entered_in_error)
  • Chat extraction source attribution (message_id, timestamp)
  • Dedup: re-processing same document supersedes old FHIR resources
  • Data forwarding audit table
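
The dedup and deletion-cascade items above could look roughly like the following. This is a sketch under stated assumptions: SQLite stands in for the real database, the table is reduced to the columns needed here, the 'superseded' status value and both function names are hypothetical, and the cascade marks every row from the document (including superseded ones) as entered_in_error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE fhir_resources (
           id INTEGER PRIMARY KEY,
           patient_id TEXT NOT NULL,
           document_id TEXT,              -- NULL for chat/manual resources
           resource_type TEXT NOT NULL,
           status TEXT NOT NULL DEFAULT 'active'
       )"""
)


def store_from_document(conn, patient_id, document_id, resource_types):
    """Supersede earlier active rows from this document, then insert the new ones."""
    conn.execute(
        "UPDATE fhir_resources SET status = 'superseded' "
        "WHERE patient_id = ? AND document_id = ? AND status = 'active'",
        (patient_id, document_id),
    )
    conn.executemany(
        "INSERT INTO fhir_resources (patient_id, document_id, resource_type) "
        "VALUES (?, ?, ?)",
        [(patient_id, document_id, rt) for rt in resource_types],
    )


def delete_document(conn, document_id):
    """Soft-delete: mark derived rows entered_in_error instead of removing them."""
    cur = conn.execute(
        "UPDATE fhir_resources SET status = 'entered_in_error' "
        "WHERE document_id = ? AND status != 'entered_in_error'",
        (document_id,),
    )
    return cur.rowcount  # number of rows affected by the cascade


# Processing the same document twice leaves only the latest rows active.
store_from_document(conn, "p1", "doc-1", ["Condition", "Observation"])
store_from_document(conn, "p1", "doc-1", ["Condition", "Observation"])
```

Because nothing is hard-deleted, the superseded and entered_in_error rows remain available for audit while EHR rebuilds can filter on status = 'active'.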

Out of scope (future)

  • FHIR R4 DocumentReference resource for the raw PDF (proper FHIR document indexing — separate from our DocumentReference model)
  • Extraction confidence scores (needs LLM chain-of-thought parsing)
  • Provider-side FHIR ingestion API
  • Cross-border jurisdiction-specific consent (per-provider consent granularity)

4. Affected Surfaces

Backend

  • app/models/fhir_resource.py — 2 new columns
  • app/agents/clinical_context.py — state schema + store_resources
  • app/services/document_processing.py — pass document_id to agent
  • app/services/fhir_service.py — create schema + dedup logic
  • app/services/ehr_rebuild_service.py — source map in snapshot
  • app/services/chat_extractor.py — source attribution
  • app/routers/fhir.py — optional document_id in create API
  • app/models/consent.py — data categories + provider_id
  • New: app/models/data_forwarding_audit.py

Frontend

  • EHRPanel.tsx — show source document per finding
  • FullEHRDrawer.tsx — source tags on conditions/observations
  • Document list — link from document to its extracted findings

5. Risks

| Risk | Mitigation |
|---|---|
| Migration on fhir_resources table with existing data | document_id nullable, server_default=None. Backfill optional. |
| Provenance resources double storage | Provenance is small (~200 bytes each). At 100 resources per patient, adds ~20 KB. |
| Performance of dedup query on re-process | Index on (patient_id, document_id, resource_type). Single query. |
| Chat extraction message_id not available | Pass message_id from case_orchestrator to chat_extractor. Small plumbing change. |
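
The migration and dedup-index mitigations can be tried out in isolation. The sketch below uses an in-memory SQLite database as a stand-in (the production database and migration tool are not named in this document), and the index name ix_fhir_patient_doc_type is hypothetical. It adds the two nullable columns to a table that already holds data, creates the composite index, and asks the query planner how it would run the dedup lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fhir_resources ("
    "id INTEGER PRIMARY KEY, patient_id TEXT NOT NULL, resource_type TEXT NOT NULL)"
)
# A pre-existing row, created before the migration.
conn.execute(
    "INSERT INTO fhir_resources (patient_id, resource_type) VALUES ('p1', 'Condition')"
)

# Nullable columns with no default: existing rows read back as NULL,
# so the migration does not require a backfill.
conn.execute("ALTER TABLE fhir_resources ADD COLUMN document_id TEXT")
conn.execute("ALTER TABLE fhir_resources ADD COLUMN case_id TEXT")

# Composite index matching the dedup lookup on re-process.
conn.execute(
    "CREATE INDEX ix_fhir_patient_doc_type "
    "ON fhir_resources (patient_id, document_id, resource_type)"
)

existing_document_id = conn.execute(
    "SELECT document_id FROM fhir_resources WHERE patient_id = 'p1'"
).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM fhir_resources "
    "WHERE patient_id = ? AND document_id = ? AND resource_type = ?",
    ("p1", "doc-1", "Condition"),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
```

Here existing_document_id is None for the pre-migration row, and plan_text names the composite index, which suggests the dedup check stays a single indexed lookup. Planner behaviour on the real database should be verified separately.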

6. References

  • FHIR R4 Provenance: https://hl7.org/fhir/R4/provenance.html
  • Session 35 audit: 6 gaps identified in document-to-FHIR traceability
  • Gap report finding #17: no FHIR confidence field
  • Porting spec (PR #85): meta.extension for case attribution