catalog

Date: 2026-05-22
Status: Accepted
Authors: SD, Claude
Related: ADR-0026 (matching framework — Postgres source, Neo4j projection), ADR-0027 (procedure-seeder YAML single source)
Implementation pointer: app/agents/ehr_builder_agent.py:138-220 (rungs 1-3 of the chain)

Context

The EHR builder synthesizes a structured primary_condition block (ICD-10 codes, name, snippet) from three independent sources before handing the case off to the matching engine. The matcher routes cases to provider specialties off primary_condition.icd10_codes — wrong ICDs send the patient to the wrong cohort of providers. Three PRs over the last three weeks edited this fallback chain without an ADR; this document captures the resulting precedence and the rationale for each rung so the next change has a written baseline to push against.

The Maria case 562b2c05-... was the live regression that surfaced the gap. Her ONCO-CHEMO procedure catalog row carried a placeholder ["C34.90", "C50.911"] (lung + breast cancer ICDs) that had no relationship to her actual documented diagnosis. The matcher routed her to oncology providers for the wrong cancer cohort. The actual diagnosis lived in fhir_resources as a Condition extracted from her uploaded records, but the EHR builder was reading the catalog default before consulting FHIR.

The three PRs that landed during the regression window:

- #1026: added the etiology override (rung 1). Patient-stated etiology from the medical_status layer extractor can select a more specific ICD than the procedure catalog default. Mary case fe80e675-... was the driver: AVN (avascular necrosis) etiology routes to AVN-specific ICDs instead of the catalog's generic OA default for hip-replacement procedures.
- #1073: added the FHIR Conditions fallback (rung 2). Before this, the builder fell straight through to the catalog when etiology was missing.
- #1081: cleared the bogus oncology catalog ICDs (Part A, a migration) and reordered the chain so that FHIR Conditions always win over the catalog (Part B). This was the fix for the Maria regression.

The current order is therefore etiology > FHIR Conditions > catalog: each rung is consulted only if every rung above it yields no codes.

Decision

EHR ICD resolution uses a three-rung fallback chain, evaluated top-down. The first rung that yields a non-empty list of codes wins; subsequent rungs do not contribute. Filtering, deduplication, and the 2-code cap are applied at the FHIR rung (rung 2) — see "Filtering" below.
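The first-non-empty-wins rule can be sketched as follows. This is a minimal illustration, not the code in app/agents/ehr_builder_agent.py: the function name `resolve_icd10_codes`, its parameters, and the rung labels are assumptions, and each rung's output is passed in as an already-computed list.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_icd10_codes(
    etiology_codes: Sequence[str],
    fhir_codes: Sequence[str],
    catalog_codes: Sequence[str],
) -> Tuple[List[str], Optional[str]]:
    """Return (codes, winning_rung). Hypothetical helper for illustration."""
    # Rungs are listed in priority order: most case-specific signal first.
    rungs = (
        ("etiology", etiology_codes),
        ("fhir_conditions", fhir_codes),
        ("catalog", catalog_codes),
    )
    for name, codes in rungs:
        if codes:
            # The first non-empty rung wins outright; lower rungs never
            # contribute codes to the result.
            return list(codes), name
    return [], None


# A Maria-shaped input: no etiology, one FHIR code, a catalog placeholder.
# "DX-FROM-CHART" is a stand-in string, not a real ICD code.
codes, rung = resolve_icd10_codes([], ["DX-FROM-CHART"], ["C34.90", "C50.911"])
```

With the FHIR rung above the catalog rung, the catalog placeholder is never read for this input; swapping the two rungs reproduces the pre-#1081 behavior.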

Rung 1: Etiology override (#1026)

Source: case.metadata.layer_procedure.etiology — patient-stated cause from the medical_status layer extractor.

Why it wins: Patient-stated etiology is the most specific signal available. It is sourced directly from the patient's words for this case, not from a generic catalog default and not from a document that may describe a comorbidity rather than the case-relevant diagnosis. When present, etiology paired with the procedure name routes to the most clinically appropriate ICDs via lookup_etiology_icd(etiology, procedure_name, laterality_signal) in app/services/clinical_etiology_icd.py.

When it fires: Only when the medical_status layer extractor has populated layer_procedure.etiology. For procedures where etiology is structurally implied by the procedure itself (e.g., BMT for leukemia, chemo for the malignancy stated by the patient earlier in intake), the layer extractor may legitimately produce no etiology — that is the signal to fall through.
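A table-driven lookup of this shape is one way rung 1 could behave. The sketch below is an assumption throughout: `lookup_etiology_icd_sketch` is a stand-in and not the real lookup_etiology_icd, the table contents and the procedure and etiology keys are invented for illustration, and the ICD-10-CM codes shown are examples that should be checked against the actual table in app/services/clinical_etiology_icd.py.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: (procedure, etiology) -> laterality -> ICD-10 codes.
# Codes are illustrative examples, not values taken from the real service.
_ETIOLOGY_ICD: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("hip replacement", "avn"): {
        "right": ["M87.051"],
        "left": ["M87.052"],
        "unspecified": ["M87.059"],
    },
}


def lookup_etiology_icd_sketch(
    etiology: Optional[str],
    procedure_name: str,
    laterality_signal: Optional[str] = None,
) -> List[str]:
    """Return etiology-specific codes, or [] to signal fall-through."""
    if not etiology:
        # No patient-stated etiology: rung 1 yields nothing.
        return []
    key = (procedure_name.strip().lower(), etiology.strip().lower())
    by_side = _ETIOLOGY_ICD.get(key)
    if by_side is None:
        # Unknown (procedure, etiology) pair: also fall through.
        return []
    side = (laterality_signal or "unspecified").strip().lower()
    # Unrecognized laterality values use the unspecified-side codes.
    return list(by_side.get(side, by_side["unspecified"]))
```

Returning an empty list, rather than raising or returning a default code, is what lets the next rung take over when etiology is absent or unmapped.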

Rung 2: FHIR Conditions on the patient (#1073, reordered to position 2 by #1081)

Source: fhir_resources filtered to resource_type == "Condition", ICD codes drawn from r.icd_codes.

Why it wins over catalog: FHIR Conditions are the patient's documented diagnoses extracted from uploaded medical records. They are case-specific, even when etiology is missing. For Maria, FHIR carried her actual diagnosis from the uploaded chart while the catalog carried a wrong placeholder — picking the case-specific signal is unambiguously correct.

When it fires: When the patient has uploaded records that produced one or more Conditions and rung 1 yielded nothing.

Filtering: FhirRepository.list_by_patient filters fhir_resources to verification_state='confirmed' by default, so low-confidence rows do not reach the builder. The fallback then skips non-Condition resources, deduplicates the codes, and caps the result at 2 codes to keep the matcher's input small and high-signal.
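The skip, dedupe, and cap steps can be sketched as below. The `FhirResource` dataclass and the function name are hypothetical stand-ins for the real row type, and keeping codes in first-seen order is an assumption; the ADR does not state which two codes survive the cap. The input is assumed to be already restricted to confirmed rows.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FhirResource:
    """Hypothetical stand-in for a row returned by the FHIR repository."""
    resource_type: str
    icd_codes: List[str] = field(default_factory=list)


def codes_from_fhir_conditions(
    resources: Iterable[FhirResource], cap: int = 2
) -> List[str]:
    """Collect up to `cap` unique ICD codes from Condition resources."""
    if cap <= 0:
        return []
    seen = set()
    out: List[str] = []
    for r in resources:
        if r.resource_type != "Condition":
            continue  # skip Observations, Procedures, and other types
        for code in r.icd_codes:
            if code in seen:
                continue  # dedupe across and within resources
            seen.add(code)
            out.append(code)
            if len(out) >= cap:
                return out  # stop as soon as the cap is reached
    return out
```

An empty result here means rung 2 yielded nothing, and resolution continues to the catalog rung.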

Rung 3: Catalog `icd10_primary` (default)

Source: procedure_requirements.icd10_primary from the procedure catalog (config/procedures.yaml and downstream Procedure.icd10_primary).

Why it is last: The catalog default is generic. It lists the ICDs typical of the procedure as a class, not those of this patient. It is correct in the common case (orthopedic OA, leukemia BMT) and a regression risk in the uncommon case where the procedure catalog has imprecise placeholders (as oncology did before #1081 Part A).

When it fires: When neither etiology nor FHIR Conditions yield codes — e.g., a patient who has not uploaded records yet, has no etiology in the medical_status layer, and is at an early intake stage. Catalog still serves them correctly when the procedure-to-ICD mapping is tight (which is true for most non-oncology procedures today; oncology was the outlier and was cleaned up in #1081 Part A).

Consequences

What happens to cases without FHIR Conditions

Cases without FHIR Conditions are not regressed by this chain. Rung 3 (catalog) still serves them, exactly as it did before #1073. The two changes (#1073 and #1081 Part B) alter the outcome only when FHIR Conditions exist; cases that fall through to the catalog are unaffected.

This matters because most patients at early intake have no FHIR Conditions yet — they have not uploaded records or the records are still being processed. The catalog rung is load-bearing for that cohort and should remain so until a stronger signal exists.

Maria-class regressions are now prevented

The combination of #1081 Part A (clear bogus catalog placeholders) and Part B (reorder so FHIR wins over catalog) means that a patient with documented diagnoses in FHIR will not be routed by a wrong catalog default. This is the durable fix; the catalog cleanup alone would have prevented Maria's case, but the reorder prevents the next class of regression where a catalog row drifts from the patient's actual diagnosis.

Future-proofing for additional rungs

If a future case requires a rung above etiology (e.g., a coordinator override stored as case.metadata.coordinator_icd_override) or below FHIR (e.g., a triage-classifier guess), the chain extends with the same pattern: each rung is a strict fallback, evaluated in priority order, with the most case-specific signal at the top. New rungs should not change the existing order without an ADR amendment.

A non-obvious migration risk: if a new rung is inserted between etiology and FHIR, every regression test that asserts "FHIR Conditions win" must be re-verified, since the assertion is now conditional on the new rung yielding nothing. The Maria replay test in tests/e2e/test_maria_562b2c05.py is the canonical guard — extend it when the chain changes.
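The conditional nature of the "FHIR Conditions win" assertion can be illustrated with a small test sketch. Everything here is hypothetical: `first_non_empty`, `build_chain`, the rung name `inserted_rung`, and the placeholder code strings are illustrations and do not come from tests/e2e/test_maria_562b2c05.py.

```python
from typing import Callable, List, Sequence, Tuple

# A rung is a (name, producer) pair; producers run lazily, in order.
Rung = Tuple[str, Callable[[], Sequence[str]]]


def first_non_empty(rungs: Sequence[Rung]) -> Tuple[str, List[str]]:
    """Evaluate rungs in priority order and return the first with codes."""
    for name, produce in rungs:
        codes = list(produce())
        if codes:
            return name, codes
    return "none", []


def build_chain(inserted_codes: Sequence[str]) -> List[Rung]:
    """Chain with a hypothetical rung inserted between etiology and FHIR."""
    return [
        ("etiology", lambda: []),
        ("inserted_rung", lambda: inserted_codes),  # hypothetical new rung
        ("fhir_conditions", lambda: ["FHIR-DX"]),   # placeholder code
        ("catalog", lambda: ["CATALOG-DEFAULT"]),   # placeholder code
    ]


def test_fhir_wins_only_when_inserted_rung_is_empty() -> None:
    # The original guarantee still holds when the new rung yields nothing.
    assert first_non_empty(build_chain([]))[0] == "fhir_conditions"
    # Once the new rung yields codes, FHIR no longer wins.
    assert first_non_empty(build_chain(["NEW-DX"]))[0] == "inserted_rung"
```

Writing both cases into the same test makes the new precondition explicit, so a later reorder fails loudly instead of silently changing which rung wins.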

Test coverage

The chain is exercised by:

- tests/agents/test_ehr_builder_icd_chain.py: rung-by-rung unit tests (etiology-wins, FHIR-wins-over-catalog, catalog-fallback)
- tests/e2e/test_maria_562b2c05.py: end-to-end Maria replay (PR #1083 / #1087 / #1088)
- tests/agents/test_clinical_etiology_icd.py: rung 1 lookup table behavior

Why an ADR was overdue

Three PRs in three weeks each made a defensible local change, but their cumulative effect, a three-rung chain with non-obvious precedence, was not documented anywhere as a single decision. The Maria audit on 2026-05-21 surfaced the gap. This ADR records the chain as an architectural decision so the next change pushes against a written baseline.