
ADR-0027 — Procedure Seeder: YAML as Single Source of Truth

Date: 2026-05-18
Status: Accepted
Deciders: Engineering (AI-assisted), Dr. Shrikanth Naidu (clinical content review)
Related issues: #960, #279, #985 (Neo4j projection), PR #990, PR #994


Context

config/procedures.yaml was declared the single source of truth in its own header comment, but app/seeds/seed_procedures.py::PROCEDURES was separately hand-curated with rich clinical payloads (required documents, comorbidity screening, contraindications, cost ranges, recovery timelines, travel considerations). The two drifted apart: 32 yaml entries vs 20 Python dict rows, and 6 yaml-only procedures with no procedure_requirements row in the database.

The backfill script (scripts/backfill_procedure_documents.py) imported PROCEDURES directly, so it also missed the 6 yaml-only codes.


Decision

Option A — yaml is the single authoring surface; seed_procedures.py::PROCEDURES is a derived loader.

The yaml schema is extended with ten new optional fields per entry: parent_procedure_code, category, snomed_primary, required_documents, required_tests, comorbidity_screening, contraindications, cost_range, recovery_timeline, travel_considerations.

seed_procedures.py adds _load_procedures_yaml() and rebuilds PROCEDURES from it:

PROCEDURES = _BASE_TEMPLATES + [_yaml_to_procedure_dict(p) for p in _load_procedures_yaml()]

The base templates (ORTHO_BASE, CARDIAC_BASE, ONCO_BASE) stay hand-coded because they are not catalog procedures — they are inheritance parents used only by the seeder.


Inheritance algorithm

Shallow merge with list-concatenation for INHERITABLE_LIST_FIELDS:

INHERITABLE_LIST_FIELDS = {"required_documents", "required_tests", "comorbidity_screening", "contraindications"}
  • For scalar fields (category, cost_range, etc.), the child value wins; parent fills if absent.
  • For list fields in INHERITABLE_LIST_FIELDS, the two lists are concatenated with the parent's items first and the child's items after them (the parent provides defaults; the child adds procedure-specific items).
  • A child entry with an empty list (required_documents: []) explicitly overrides the parent to empty — empty list is NOT treated as "absent". Use null / omit the key to inherit.
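The three rules above can be sketched as a small resolver. This is an illustration, not the code in `app/seeds/_inheritance.py`: `merge_with_parent`, `resolve`, the `procedure_code` key, the cycle check, and the `KNEE-TKR` entry are hypothetical, and treating `null` on a scalar field as "absent" is an assumption extended from the list-field rule.

```python
from typing import Any, Optional

# Fields whose lists are concatenated (parent first, child second), per the ADR.
INHERITABLE_LIST_FIELDS = {
    "required_documents",
    "required_tests",
    "comorbidity_screening",
    "contraindications",
}


def merge_with_parent(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Shallow-merge child over parent.

    - Key omitted or None in the child: the parent's value is inherited.
    - Inheritable list field with a non-empty child list: parent + child.
    - Inheritable list field with an empty child list: explicit override to [].
    - Any other field present in the child: the child's value wins.
    """
    merged = dict(parent)  # shallow copy; nested values are shared, not cloned
    for key, value in child.items():
        if value is None:
            continue  # null means "inherit", same as omitting the key
        if key in INHERITABLE_LIST_FIELDS and value:
            merged[key] = list(parent.get(key) or []) + list(value)
        else:
            merged[key] = value  # scalars and empty-list overrides
    return merged


def resolve(code: str, by_code: dict[str, dict[str, Any]],
            _seen: Optional[set[str]] = None) -> dict[str, Any]:
    """Resolve an entry against its parent chain (hypothetical multi-level support)."""
    seen = set() if _seen is None else _seen
    if code in seen:
        raise ValueError(f"inheritance cycle at {code!r}")
    seen.add(code)
    entry = by_code[code]
    parent_code = entry.get("parent_procedure_code")
    if parent_code is None:
        return dict(entry)
    return merge_with_parent(resolve(parent_code, by_code, seen), entry)


by_code = {
    "ORTHO_BASE": {"procedure_code": "ORTHO_BASE", "category": "orthopaedic",
                   "required_documents": ["base-doc"],
                   "contraindications": ["base-contraindication"]},
    "KNEE-TKR": {"procedure_code": "KNEE-TKR", "parent_procedure_code": "ORTHO_BASE",
                 "required_documents": ["child-doc"], "contraindications": [],
                 "cost_range": None},
}
resolved = resolve("KNEE-TKR", by_code)
# resolved["required_documents"] == ["base-doc", "child-doc"]
# resolved["contraindications"] == []  (explicit empty override)
```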

Marker grammar

^(fabricated|naidu_approved)_<author_tag>_<YYYY>_<MM>_<DD>$

Examples:
  • fabricated_pending_ops_2026_05_17 (existing Phase-2 entries)
  • fabricated_pending_ops_2026_05_19 (new strawman entries from this PR)
  • naidu_approved_clinical_sweep_2026_06_01 (post-#169 sign-off)

The assert_marker_valid() helper in app/seeds/_base.py enforces this pattern. The CI test tests/seeds/test_marker_grammar.py checks every marker in every yaml file and every metadata->>data_source value in a freshly seeded test DB.
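One possible reading of the marker grammar as a Python regex, plus a CI-style helper that collects every non-conforming marker rather than stopping at the first. This is not the actual `assert_marker_valid()` from `app/seeds/_base.py`; the `<author_tag>` character set (lowercase words joined by `_`), the calendar-date check, and `invalid_markers` are assumptions.

```python
import re
from datetime import date

# Hypothetical translation of ^(fabricated|naidu_approved)_<author_tag>_<YYYY>_<MM>_<DD>$.
# Assumption: <author_tag> is one or more lowercase alphanumeric words joined by "_".
# Anchors are omitted because fullmatch() already requires the whole string to match.
MARKER_RE = re.compile(
    r"(?P<status>fabricated|naidu_approved)"
    r"_(?P<author_tag>[a-z0-9]+(?:_[a-z0-9]+)*)"
    r"_(?P<yyyy>\d{4})_(?P<mm>\d{2})_(?P<dd>\d{2})"
)


def assert_marker_valid(marker: str) -> None:
    """Raise ValueError if marker does not follow the grammar."""
    match = MARKER_RE.fullmatch(marker)
    if match is None:
        raise ValueError(f"marker {marker!r} does not match the marker grammar")
    # Extra check (assumption): the date part must be a real calendar date.
    # date() itself raises ValueError for e.g. month 13.
    date(int(match["yyyy"]), int(match["mm"]), int(match["dd"]))


def invalid_markers(markers: list[str]) -> list[str]:
    """Return every marker that fails validation, e.g. for one CI failure message."""
    bad = []
    for marker in markers:
        try:
            assert_marker_valid(marker)
        except ValueError:
            bad.append(marker)
    return bad
```

Because the date groups are fixed-width and anchored at the end, the regex backtracks until `<author_tag>` takes everything between the status prefix and the date, so multi-word tags such as `pending_ops` and `clinical_sweep` parse correctly.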


Generalised seeder architecture (Section 9)

This ADR commits all future seeded entities to the pattern established here:

config/<entity_plural>/seed.yaml        ← single authoring surface
app/seeds/_validators.py                ← Pydantic validation (fail-loud at import)
app/seeds/_inheritance.py               ← shallow-merge + list-concat resolver
app/seeds/_base.py                      ← SeederBase: upsert, dry-run, diff, marker validation
app/seeds/<entity>_seeder.py            ← idempotent Postgres upsert
app/seeds/_runner.py                    ← DAG-walking master runner
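To make the SeederBase responsibilities (idempotent upsert, dry-run, diff) concrete, here is a hypothetical in-memory stand-in where a dict plays the role of the Postgres table. `InMemorySeeder`, `SeedDiff`, and the `procedure_code` key are illustrative names, not the real SeederBase API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SeedDiff:
    """Keys that a seeding run would insert, update, or leave untouched."""
    inserted: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


class InMemorySeeder:
    """Hypothetical stand-in for SeederBase; `table` plays the role of the Postgres table."""

    def __init__(self, table: dict[str, dict[str, Any]], key: str = "procedure_code") -> None:
        self.table = table
        self.key = key

    def diff(self, rows: list[dict[str, Any]]) -> SeedDiff:
        """Compare incoming rows with stored rows without writing anything."""
        result = SeedDiff()
        for row in rows:
            key = row[self.key]
            current = self.table.get(key)
            if current is None:
                result.inserted.append(key)
            elif current != row:
                result.updated.append(key)
            else:
                result.unchanged.append(key)
        return result

    def upsert(self, rows: list[dict[str, Any]], dry_run: bool = False) -> SeedDiff:
        """Insert new rows and overwrite changed ones; with dry_run=True, only report."""
        result = self.diff(rows)
        if not dry_run:
            for row in rows:
                self.table[row[self.key]] = dict(row)  # copy so callers can't mutate stored rows
        return result
```

Against Postgres the write step would more likely be a single `INSERT ... ON CONFLICT (...) DO UPDATE` statement. Computing the diff first is what lets a dry run report changes without touching the database, and a second identical run that reports every row as unchanged is a cheap idempotence check.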

The migration roadmap for all other entities is in Section 10 of docs/superpowers/plans/2026-05-18-required-documents-gap-fill.md.
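The DAG-walking runner listed as app/seeds/_runner.py can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `run_seeders` signature and the entity names in the example are hypothetical; this ADR does not describe how the real runner declares dependencies.

```python
from collections.abc import Callable
from graphlib import TopologicalSorter


def run_seeders(
    seeders: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
) -> list[str]:
    """Run every seeder after all of the seeders it depends on; return the run order."""
    # Every seeder becomes a node, even if it declares no dependencies.
    graph = {name: set(depends_on.get(name, ())) for name in seeders}
    unknown = set().union(*graph.values()) - set(seeders)
    if unknown:
        raise ValueError(f"dependencies on unregistered seeders: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError (a ValueError subclass) on a cycle;
    # list() consumes it fully, so a cycle is reported before any seeder runs.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        seeders[name]()
    return order


# Hypothetical entities: procedure_requirements rows need their procedures first.
calls: list[str] = []
order = run_seeders(
    {
        "procedures": lambda: calls.append("procedures"),
        "procedure_requirements": lambda: calls.append("procedure_requirements"),
        "hospitals": lambda: calls.append("hospitals"),
    },
    {"procedure_requirements": {"procedures"}},
)
```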


Trade-offs

| Pro | Con |
|---|---|
| yaml is already the ops/clinical review surface | yaml carries ~3000 lines of clinical detail (previously held in readable Python dicts) |
| Naidu reviews one file, not two | YAML strings lack IDE type-checking on field names |
| Loader validates via Pydantic at import, so bad authoring fails loudly | Import-time validation adds ~50ms to seeder startup |
| New fields can be added to yaml without a code change | Base templates remain in Python: one more place to update |
| PROCEDURES symbol unchanged → existing tests pass | |

Consequences

  1. config/procedures.yaml grows to ~600 lines with clinical payload. This is expected and acceptable — yaml is designed for this.
  2. app/seeds/seed_procedures.py shrinks by ~2000 lines as Python dicts are removed.
  3. All 6 yaml-only procedures (67036, 67228, 92920, ONCO-CHEMO, ONCO-RAD, ONCO-SURG) now have required_documents authored and seeded.
  4. OPHTHALMOLOGY_BASE is added as a new base template in the Python list.
  5. scripts/backfill_procedure_documents.py is superseded by scripts/backfill_procedure_clinical_payload.py which sources from the yaml loader.
  6. The old app/seed_*.py modules at the app root become deprecation shims.