
Document Scoping — Current State and Future Plan

Current State (MVP)

Documents are case-scoped by a timestamp filter — uploaded_at >= case.created_at.

This is enforced in three places that ALL query document_references:

| Query | File | Filter |
|---|---|---|
| Document checklist | app/routers/cases.py:945 | patient_id + tenant_id + uploaded_at >= case.created_at |
| EHR rebuild | app/services/ehr_rebuild_service.py:85 | patient_id + tenant_id + analysis_status='completed' + uploaded_at >= case.created_at |
| Patient state coverage | app/services/patient_state.py:166 | patient_id + uploaded_at >= case.created_at |
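As a rough illustration of the timestamp proxy, the sketch below runs the checklist-style filter against an in-memory SQLite database. The schema is a hypothetical minimum that models only the columns named in this document, the sample rows are invented, and documents_for_case is an illustrative helper, not a function from the codebase.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in this document.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (
    id TEXT PRIMARY KEY, patient_id TEXT, tenant_id TEXT, created_at TEXT
);
CREATE TABLE document_references (
    id TEXT PRIMARY KEY, patient_id TEXT, tenant_id TEXT, uploaded_at TEXT
);
INSERT INTO cases VALUES
    ('case-1', 'pat-1', 'ten-1', '2025-01-10T09:00:00'),
    ('case-2', 'pat-1', 'ten-1', '2025-03-01T09:00:00');
INSERT INTO document_references VALUES
    ('doc-a', 'pat-1', 'ten-1', '2025-01-11T10:00:00'),
    ('doc-b', 'pat-1', 'ten-1', '2025-03-02T10:00:00');
""")


def documents_for_case(conn, case_id):
    """Return ids of documents uploaded at or after the case was created."""
    rows = conn.execute(
        """
        SELECT d.id
        FROM document_references AS d
        JOIN cases AS c
          ON c.patient_id = d.patient_id AND c.tenant_id = d.tenant_id
        WHERE c.id = ? AND d.uploaded_at >= c.created_at
        ORDER BY d.uploaded_at
        """,
        (case_id,),
    ).fetchall()
    return [row[0] for row in rows]


newest_case_docs = documents_for_case(conn, "case-2")
print(newest_case_docs)  # ['doc-b']
```

The comparison works on text here only because the sample timestamps are ISO-8601 strings, which sort chronologically; a real timestamp column would not need that assumption.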

Why timestamp filter (proxy)

When the MVP was built, the document_references table had no case_id column. Adding one and switching over to it requires:

  1. Schema migration to add case_id (nullable to start)
  2. Backfill existing rows by inferring from uploaded_at → matching case
  3. Update presign + confirm endpoints to accept and store case_id
  4. Update frontend upload flow to pass case_id
  5. Migrate all 3 query sites from timestamp filter to case_id filter
  6. Drop the timestamp filter and make case_id NOT NULL

For the MVP demo, the timestamp proxy is reliable enough: the only edge case is two cases created within the same second, which does not happen in a real demo flow.

Why case-scoped (not patient-scoped)

We want EHR + checklist + patient state to all show the SAME picture for a given case. Patient-scoped queries caused this bug:

User created a new case → progress rail showed "4 of 15 mandatory" docs already uploaded (because previous cases for the same patient had docs). But the EHR was empty (because it's snapshot-stored on the case row). Mismatch confused users and looked broken.

Case-scoping makes the inconsistency impossible: a fresh case has 0 documents, 0 conditions extracted from docs, and 0 lab observations. Clean slate.
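The mismatch can be reproduced in a few lines. This is a minimal sketch using invented in-memory records, not the real models: Doc, patient_scoped_count, and case_scoped_count are hypothetical names, and the four documents stand in for uploads made under earlier cases.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Doc:
    patient_id: str
    uploaded_at: datetime


def patient_scoped_count(docs, patient_id):
    """Count every document the patient has ever uploaded."""
    return sum(1 for d in docs if d.patient_id == patient_id)


def case_scoped_count(docs, patient_id, case_created_at):
    """Count only documents uploaded at or after the case was created."""
    return sum(
        1
        for d in docs
        if d.patient_id == patient_id and d.uploaded_at >= case_created_at
    )


# Four documents uploaded under earlier cases, then a brand-new case.
docs = [Doc("pat-1", datetime(2025, 1, day)) for day in (5, 6, 7, 8)]
new_case_created_at = datetime(2025, 3, 1)

print(patient_scoped_count(docs, "pat-1"))                    # 4
print(case_scoped_count(docs, "pat-1", new_case_created_at))  # 0
```

The patient-scoped count is what made the progress rail show documents for a case that had none; the case-scoped count matches the empty EHR.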

Cross-Case Document Reuse — Existing Mechanism

Patients shouldn't have to re-upload their X-ray for every new case. The _check_existing_records() function in case_orchestrator.py already handles this:

reusable = await _check_existing_records(db, patient_id, tenant_id, case)
if reusable:
    # Show patient: "I found previous records that are still valid:
    # SD_20250209.pdf, knee_xray.pdf. Use these for this case?"
    pass  # placeholder: the prompt and opt-in handling are not shown here

This fires on the records-first turn. The user sees previous records and gets to opt in. Explicit consent, not silent inheritance.

Shipped: case_id Column Migration (Session 39, PR #152)

Phases 1–2 shipped in Session 39 via Alembic migration a2b3c4d5e6f7. 148 existing document_references rows were backfilled using the timestamp-inference logic described below. The column stays nullable until Phase 3 (upload flow refactor) is complete. Phases 3–5 remain as planned below.


Future Plan — Proper case_id Column (Phases 3–5 remaining)

Phase 1: Schema migration (shipped in Session 39, PR #152)

Migration a2b3c4d5e6f7 ran in production. 148 documents backfilled.

ALTER TABLE document_references
ADD COLUMN case_id VARCHAR(36) REFERENCES cases(id);

CREATE INDEX idx_document_references_case_id
ON document_references(case_id);

Make case_id nullable initially so existing rows don't break.

Phase 2: Backfill

For each existing document_references row:

  1. Find the case where case.patient_id = doc.patient_id AND case.created_at <= doc.uploaded_at AND (next case for that patient does not exist OR doc.uploaded_at < next_case.created_at)
  2. Set doc.case_id = matched_case.id

Rows that can't be matched (e.g., orphaned uploads) get case_id = NULL and are excluded from queries.
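The matching rule above can be sketched in plain Python. This is an illustrative version only, not the code that ran in migration a2b3c4d5e6f7: infer_case_ids and the tuple shapes are assumptions made for the example. For each document it picks the patient's latest case created at or before the upload time, which is equivalent to requiring that the upload precedes the next case's creation.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def infer_case_ids(cases, docs):
    """Map each doc id to the inferred case id, or None if nothing matches.

    cases: iterable of (case_id, patient_id, created_at)
    docs:  iterable of (doc_id, patient_id, uploaded_at)
    """
    # Group each patient's cases and sort them by creation time.
    by_patient = defaultdict(list)
    for case_id, patient_id, created_at in cases:
        by_patient[patient_id].append((created_at, case_id))
    created_times = {}
    for patient_id, entries in by_patient.items():
        entries.sort()
        created_times[patient_id] = [created_at for created_at, _ in entries]

    result = {}
    for doc_id, patient_id, uploaded_at in docs:
        times = created_times.get(patient_id, [])
        # Index of the last case with created_at <= uploaded_at (-1 if none).
        idx = bisect_right(times, uploaded_at) - 1
        result[doc_id] = by_patient[patient_id][idx][1] if idx >= 0 else None
    return result


cases = [
    ("case-1", "pat-1", datetime(2025, 1, 1)),
    ("case-2", "pat-1", datetime(2025, 3, 1)),
]
docs = [
    ("doc-early", "pat-1", datetime(2024, 12, 1)),  # before any case
    ("doc-a", "pat-1", datetime(2025, 1, 15)),
    ("doc-b", "pat-1", datetime(2025, 3, 1)),       # exactly at case-2 creation
    ("doc-other", "pat-2", datetime(2025, 1, 1)),   # patient with no cases
]
assignments = infer_case_ids(cases, docs)
print(assignments)
```

In this sketch, documents uploaded before a patient's first case, or belonging to a patient with no cases, come back as None, mirroring the orphaned rows that keep case_id = NULL.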

Phase 3: Update upload flow

  • Frontend uploadFileToR2() accepts caseId parameter
  • Backend presign endpoint accepts case_id in request body
  • Backend confirm endpoint stores it on the document record
  • All 3 query sites switch from uploaded_at >= case.created_at to case_id = :case_id

Phase 4: Make case_id NOT NULL

Once backfill is complete and all upload paths set case_id, change the column to NOT NULL and drop the timestamp filter from queries.
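Before tightening the constraint, it may help to confirm that no rows still have a NULL case_id, since unmatched rows from the backfill would otherwise block the change in most databases. The check below is a minimal sketch against an in-memory SQLite table with invented rows; rows_missing_case_id is a hypothetical helper, not part of the codebase.

```python
import sqlite3


def rows_missing_case_id(conn):
    """Return ids of document_references rows that still have no case_id."""
    rows = conn.execute(
        "SELECT id FROM document_references WHERE case_id IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


# Invented sample data: two backfilled rows and one orphaned upload.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_references (id TEXT PRIMARY KEY, case_id TEXT);
INSERT INTO document_references VALUES
    ('doc-a', 'case-1'),
    ('doc-b', 'case-2'),
    ('doc-orphan', NULL);
""")

missing = rows_missing_case_id(conn)
print(missing)  # ['doc-orphan']
```

An empty list would suggest the column is ready for NOT NULL; any ids returned need to be assigned a case or removed first.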

Phase 5: Cross-case reuse via explicit linking

Add a join table for explicit cross-case document reuse:

CREATE TABLE case_document_links (
    id VARCHAR(36) PRIMARY KEY,
    case_id VARCHAR(36) REFERENCES cases(id),
    document_id VARCHAR(36) REFERENCES document_references(id),
    linked_at TIMESTAMP NOT NULL,
    linked_by VARCHAR(36),  -- patient or admin
    UNIQUE(case_id, document_id)
);

When the patient clicks "use this previous record for the new case", we INSERT a row in case_document_links instead of duplicating the document. Queries union the documents owned by the case (via case_id) and the documents linked to the case (via case_document_links).
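The union described above might look like the following. This is a sketch under assumptions: it uses SQLite with TEXT columns in place of VARCHAR(36), keeps only the columns needed for the query, and documents_visible_to_case plus all sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (id TEXT PRIMARY KEY);
CREATE TABLE document_references (
    id TEXT PRIMARY KEY,
    case_id TEXT REFERENCES cases(id)
);
CREATE TABLE case_document_links (
    id TEXT PRIMARY KEY,
    case_id TEXT REFERENCES cases(id),
    document_id TEXT REFERENCES document_references(id),
    linked_at TEXT NOT NULL,
    linked_by TEXT,
    UNIQUE(case_id, document_id)
);
INSERT INTO cases VALUES ('case-1'), ('case-2');
INSERT INTO document_references VALUES
    ('knee_xray', 'case-1'),
    ('new_labs', 'case-2');
-- The patient opted to reuse the case-1 X-ray for case-2.
INSERT INTO case_document_links VALUES
    ('link-1', 'case-2', 'knee_xray', '2025-03-01T09:05:00', 'pat-1');
""")


def documents_visible_to_case(conn, case_id):
    """Return ids of documents owned by the case plus documents linked to it."""
    rows = conn.execute(
        """
        SELECT id FROM document_references WHERE case_id = :case_id
        UNION
        SELECT document_id FROM case_document_links WHERE case_id = :case_id
        ORDER BY 1
        """,
        {"case_id": case_id},
    ).fetchall()
    return [row[0] for row in rows]


print(documents_visible_to_case(conn, "case-2"))  # ['knee_xray', 'new_labs']
print(documents_visible_to_case(conn, "case-1"))  # ['knee_xray']
```

UNION rather than UNION ALL is used so that a document both owned by and linked to the same case appears once.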

Migration Risks

  • Backfill correctness: the timestamp inference may be ambiguous if a patient has overlapping cases (rare but possible). Manual review needed.
  • Frontend coordination: every upload path needs to pass case_id. Missing one means orphaned documents.
  • Query consistency: all 3 sites must migrate at the same time. Mixing case_id and timestamp filters leads to the same bug we just fixed.

Acceptance Criteria

Phases 1–4 are complete when:

  • All tests in tests/test_document_scoping.py (new test file) pass under pytest
  • A new case shows 0 documents in checklist + 0 in EHR + 0 in patient state
  • A patient with 5 documents from 3 cases shows the right docs per case
  • The cross-case reuse flow (_check_existing_records) still works
  • E2E test e2e/conversation-regression.spec.ts::EHR data flow passes

Status

| Phase | Status |
|---|---|
| MVP timestamp filter (3 query sites) | Complete |
| Phase 1: case_id column migration | Complete (migration a2b3c4d5e6f7, Session 39, PR #152) |
| Phase 2: Backfill | Complete (148 documents backfilled) |
| Phase 3: Upload flow refactor | Not started |
| Phase 4: NOT NULL + drop timestamp filter | Not started |
| Phase 5: Cross-case linking table | Not started |