Skip to content

Issue #960 — Procedure Data Quality + Admin UI Audit

Date: 2026-05-19 Branch audited from: fix/proxy-review-remaining-procedures (HEAD 494026a) Auditor: automated read-only sweep Scope: state of #960 (MVP procedure coverage, 10 categories end-to-end) vs. prod data, in service of standing up E2E patient-workflow seed data.

Issue #960 was closed on GitHub but the underlying data is only partially complete (matcher + clinical depth wired for the 17 fabricated/Naidu-approved orthopedic/spine/rehab procedures; ICD/SNOMED + LOINC + per-procedure matching overrides + patient fixtures NOT seeded). This audit catalogues what is real, what is fabricated, and what is missing.


1. Issue #960 scope — checklist with status

Per the issue body the MVP comprises 10 categories (TKR, THR, bilateral knee, robotic knee, MAKO umbrella, spine bundle, arthroscopy bundle, trauma, rehab, shoulder bundle). Plan was structured into 9 phases.

# Phase / acceptance item Status Evidence
1 Phase 1 — add 7 missing procedure rows to config/procedures.yaml + Postgres + Neo4j DONE YAML has 32 procedures; PG procedure_requirements has 29 active rows; Neo4j has 45 :Procedure nodes (incl. all 10 MVP categories)
2 Phase 1 — provider tagging (PROVIDER_PROCEDURES) PARTIAL 184 provider_procedures rows; 132 are fabricated_pending_ops_2026_05_17. Concentrated on 10 fabricated hospitals (rows-per-procedure 4–14). 50 providers in PG total; Neo4j shows 1660 provider nodes (parallel universe, see §4).
3 Phase 2 — required documents, comorbidity screening, contraindications, cost, recovery, travel per procedure PARTIAL (Naidu-via-proxy, not Naidu-signed) 15 of 29 PG rows tagged fabricated_pending_ops_2026_05_17; YAML now has 18 entries tagged naidu_approved_claude_proxy_2026_05_1{8,9}. Genuine Naidu sign-off (issue #169) still pending.
4 Phase 2 — LOINC per-procedure test gates NOT STARTED Neo4j has 205 REQUIRES_TEST edges but they remain wired only to global tests (TKR shares the same battery as bypass). No procedure-scoped LOINC mapping in PG (procedure_capability_requirements ≠ test gating).
5 Phase 3 — doctor catalog linkage PARTIAL 143 doctors, 168 doctor_procedures rows. Covered codes match MVP set BUT volumes are fabricated. BMT-001 has 1 doctor; ONCO-SURG 2; 27487, 96413 are present but not in the YAML procedure catalog (orphans).
6 Phase 4 — capabilities vocabulary DONE-MINIMAL procedure_capability_requirements has 14 rows covering robotic_capable, trauma_level_1, spine_board_certified, arthroscopy_capable, shoulder_subspecialty, rehab_inpatient_capable. mako_certified capability from issue is missing (only robotic_capable exists).
7 Phase 5 — per-procedure matching params (procedure_overrides) NOT STARTED (no evidence found in config/matching/) Capabilities table covers gating but matcher param overrides not seeded.
8 Phase 6 — recovery + transport per procedure NOT STARTED Neo4j has only 7 HAS_RECOVERY edges (the original 3 procedures × 3 phases — basically TKR-only). No new procedure → recovery_phase mappings. Linked issues #958/#959 still open.
9 Phase 7 — synonyms + prompts PARTIAL YAML has keyword arrays (avg 7–20 entries) for the 18 Naidu-proxy rows. Prompt few-shot fixtures (tests/fixtures/v6_compliance/persona_{a,b,c}/) still only cover golden / treatment-recommendation / paraphrase paths — no procedure-specific shots for trauma/robotic/arthroscopy.
10 Phase 8 — frontend migration (Treatments.tsx → API-driven) + admin CRUD + storefront /procedures/:slug OUT OF SCOPE OF THIS AUDIT — not verified here
11 Phase 9 — E2E fixture corpus (1 per procedure, 10 transcripts) PARTIAL tests/fixtures/procedure_coverage/ has 15 JSON fixtures spanning all 10 MVP categories. But: no patient/case/conversation seed data in PG to execute them against (see §5). expected_scores.yaml exists alongside.
12 Acceptance — patient typing free-text reaches matching card with ≥1 provider, plus recovery + transport cards BLOCKED — no patient cases in PG (cases=0, conversations=0, messages=0, fhir_resources=0)

Overall verdict: the catalog + matching gating side of #960 is largely DONE for the MVP 10. The patient-workflow side (cases, conversations, FHIR, recovery edges, transport flags, per-procedure matcher params) is NOT seeded and is what currently blocks E2E.


2. Postgres inventory

All counts pulled 2026-05-19 from Railway service curaway against prod DATABASE_URL.

2.1 Top-level counts

Table Rows
procedure_requirements 29
procedure_capability_requirements 14
providers 50
provider_procedures 184
doctors 143
doctor_procedures 168
cases 0
conversations 0
messages 0
fhir_resources 0

2.2 procedure_requirements by metadata->>'data_source'

SELECT COALESCE(metadata->>'data_source', '<none>') AS ds, COUNT(*)
FROM procedure_requirements GROUP BY 1;
data_source rows
fabricated_pending_ops_2026_05_17 15
<none> (legacy seed) 14

Note: no rows in PG carry the naidu_approved_claude_proxy_2026_05_1{8,9} marker even though YAML does — the proxy work in YAML has not been re-seeded into PG. See §3 for drift detail.

2.3 procedure_requirements per-row depth (PG today)

SELECT procedure_code,
       jsonb_array_length(required_documents) AS rd,
       jsonb_array_length(comorbidity_screening) AS cs,
       jsonb_array_length(contraindications) AS ci,
       jsonb_array_length(keywords) AS kw
FROM procedure_requirements ORDER BY 1;
procedure_code rd cs ci kw cost rec travel data_source
22612 3 1 1 0 y y y none
22800 4 4 2 7 y y y fab
23420 2 4 2 7 y y y fab
23472 4 4 2 7 y y y fab
27130 3 0 1 0 y y y none
27130-ROBO 4 3 2 7 y y y fab
27245 2 4 2 8 y y y fab
27447 3 2 2 0 y y y none
27447-BIL 3 3 2 7 y y y fab
27447-ROBO 4 3 2 7 y y y fab
29805 2 3 1 6 y y y fab
29870 2 3 1 6 y y y fab
29882 2 3 2 7 y y y fab
29888 2 4 2 8 y y y fab
30400 3 0 2 0 y y y none
33405 2 0 1 0 y y y none
33533 2 1 1 0 y y y none
43775 5 2 2 0 y y y none
58970 4 2 1 0 y y y none
63005 2 3 2 7 y y y fab
63045 3 3 2 8 y y y fab
63047 3 3 2 8 y y y fab
66984 4 2 1 0 y y y none
BMT-001 9 4 5 0 y y y none
CARDIAC_BASE 8 4 3 0 n n n none
D6010 4 2 1 0 y y y none
ONCO_BASE 5 3 2 0 n n n none
ORTHO_BASE 7 4 3 0 n n n none
REHAB-POSTOP-001 5 4 2 8 y y y fab

Observations: - 0 rows have neo4j_slug populated in PG (column is null for all 29 rows) — PG ↔ Neo4j join is name-based, fragile. - Three *_BASE template rows (CARDIAC_BASE, ONCO_BASE, ORTHO_BASE) have no cost / recovery / travel — these appear to be inheritance roots and may pollute the procedure list. - No keywords data for 14 legacy rows in PG — synonym fuzzy match will fail for everything outside the fabricated_pending_ops_2026_05_17 set. - 92920, ONCO-SURG, ONCO-CHEMO, ONCO-RAD, 67228, 67036 exist in YAML but not in PG — see §3.

2.4 providers breakdown

SELECT provider_type, country_code, is_active, COUNT(*)
FROM providers GROUP BY 1,2,3 ORDER BY 1,2;
provider_type country active n
hospital ARE y 5
hospital CRI y 4
hospital ESP y 5
hospital IND y 7
hospital KOR y 5
hospital MEX y 5
hospital THA y 5
hospital TUR y 6
transport_ground ESP/IND/THA/TUR y 5 (total)
transport_medical IND/THA/TUR y 3

Total: 50 providers (42 hospitals + 8 transport). No metadata column on providers, so data_source tagging is not available at the provider row level — it lives on provider_procedures.metadata.

2.5 provider_procedures by procedure (top 20)

procedure_code n providers offering
27447 14
REHAB-POSTOP-001 10
29805 10
29882 10
29870 10
27245 10
29888 10
27447-BIL 10
23420 10
23472 10
63047 / 63005 / 27130-ROBO / 27447-ROBO / 63045 / 22800 7 each
33533 6
22612 / 27130 5
58970 4
47135 (Liver Transplant — orphan; in PG only, not in YAML, not in procedure_requirements) 3
43775 / 33405 / 66984 / 30400 3 each
D6010 2
15830 (orphan, not in YAML, not in procedure_requirements) 1

data_source split: 132 fabricated, 52 legacy/none.

2.6 doctor_procedures top 20

procedure_code doctors
27447 16
96413 (chemo admin CPT — orphan) 15
33533 13
58970 13
30400 11
D6010 10
43775 8
23420 / 27130 6
22612 / 29888 / 63047 5
22800 / 23472 / 27130-ROBO / 27245 / 27447-BIL / 27447-ROBO / 29805 / 29882 / REHAB-POSTOP-001 4
29870 / 33405 / 63005 / 63045 / 66984 3
ONCO-SURG 2
BMT-001 / 27487 (orphan) 1

2.7 procedure_capability_requirements — full dump

SELECT procedure_code, capability_code, weight, is_blocking FROM procedure_capability_requirements;
procedure_code capability_code weight blocking
27447-ROBO robotic_capable 1.000 yes
27130-ROBO robotic_capable 1.000 yes
27245 trauma_level_1 1.000 yes
22800 spine_board_certified 1.000 yes
63047 spine_board_certified 1.000 yes
63045 spine_board_certified 1.000 yes
63005 spine_board_certified 1.000 yes
29888 arthroscopy_capable 0.600 no
29882 arthroscopy_capable 0.600 no
29870 arthroscopy_capable 0.600 no
29805 arthroscopy_capable 0.600 no
23472 shoulder_subspecialty 0.750 no
23420 shoulder_subspecialty 0.750 no
REHAB-POSTOP-001 rehab_inpatient_capable 1.000 yes

Missing capabilities vs. #960 plan: - mako_certified — not present (issue called this out as the canonical tag for MAKO platform). - mri_intraoperative, level_1_trauma_24_7_or, inpatient_rehab_beds_count — not present. - 27447-BIL has no capability requirement despite being a distinct, more complex procedure.


3. YAML vs. DB drift

Source: config/procedures.yaml (32 procedures) vs. procedure_requirements (29 rows).

3.1 In YAML but missing from PG procedure_requirements

YAML procedure_code YAML data_source Impact
92920 (angioplasty) naidu_approved_claude_proxy_2026_05_18 API/storefront will not surface it via PG
67228 (laser photocoagulation) naidu_approved_claude_proxy_2026_05_18 same
67036 (vitrectomy) naidu_approved_claude_proxy_2026_05_18 same
ONCO-SURG naidu_approved_claude_proxy_2026_05_18 doctor_procedures references it (2 doctors) so partial integrity
ONCO-CHEMO naidu_approved_claude_proxy_2026_05_18 not in PG anywhere
ONCO-RAD naidu_approved_claude_proxy_2026_05_18 not in PG anywhere

3.2 In PG but missing from YAML

PG procedure_code Likely root cause
CARDIAC_BASE Template/inheritance root — should be is_active=false or removed
ONCO_BASE Same
ORTHO_BASE Same
47135 (liver transplant) Referenced by 3 provider_procedures rows but absent from YAML — orphan
15830 Single provider_procedures row — orphan
27487 (revision knee) 1 doctor_procedure row — orphan
96413 (chemo admin CPT) 15 doctor_procedures rows but no procedure row — orphan code

3.3 Naidu-proxy depth drift (YAML far richer than PG)

For 14 procedures the YAML carries the naidu_approved_claude_proxy_2026_05_19 payload but PG was last seeded under the older fabricated_pending_ops_2026_05_17 profile. Per-field counts (YAML → PG):

code required_documents comorbidity_screening contraindications keywords
27447-BIL 14 → 3 3 → 3 2 → 2 12 → 7
27447-ROBO 12 → 4 3 → 3 2 → 2 12 → 7
27130-ROBO 13 → 4 3 → 3 2 → 2 12 → 7
22800 15 → 4 4 → 4 2 → 2 12 → 7
63047 12 → 3 3 → 3 2 → 2 12 → 8
63045 12 → 3 3 → 3 2 → 2 12 → 8
63005 10 → 2 3 → 3 2 → 2 10 → 7
29888 9 → 2 4 → 4 2 → 2 12 → 8
29882 9 → 2 3 → 3 2 → 2 12 → 7
29870 8 → 2 3 → 3 1 → 1 10 → 6
29805 8 → 2 3 → 3 1 → 1 10 → 6
27245 11 → 2 4 → 4 2 → 2 20 → 8
23472 13 → 4 4 → 4 2 → 2 11 → 7
23420 9 → 2 4 → 4 2 → 2 11 → 7
REHAB-POSTOP-001 7 → 5 4 → 4 2 → 2 16 → 8

The required_documents drift is the most patient-visible regression — Naidu-proxy work in YAML doubles or triples the document checklist, but the current API responses are still serving 2–5 documents per procedure. Re-running seed_procedures.py would close this gap.

3.4 neo4j_slug drift

  • YAML: every Naidu-proxy procedure carries a neo4j_slug (e.g. acl_repair, scoliosis_correction).
  • PG: procedure_requirements.neo4j_slug is NULL for all 29 rows.
  • Neo4j: nodes have code (e.g. knee_replacement, acl_repair) and name, no slug.

Cross-store joins are happening on name (or a hardcoded slug lookup somewhere in code). Reseeding from YAML would populate PG neo4j_slug correctly.


4. Neo4j inventory

Connected via Bolt URI from railway run. Database GUID 21eea64f.

Entity / edge Count
:Procedure nodes 45
:Provider nodes 1660 (vs 50 in Postgres — different population)
:DiagnosticTest (via REQUIRES_TEST) 205 edges
:RecoveryPhase (via HAS_RECOVERY) 7 edges total (TKR/THR/Spinal only)
:Accreditation (HAS_ACCREDITATION) 87
:Condition (HAS_CONDITION) 1
OFFERS edges 10,077
REQUIRES edges 23
REQUIRES_FOR_PROCEDURE 12
PERFORMS 174
AFFILIATED_WITH 91
SPECIALIZES_IN 361
SPEAKS 323
BELONGS_TO 10
TREATED_BY 13
LOCATED_IN 1660

4.1 Procedure node property keys

['anesthesia_type', 'category', 'code', 'complexity', 'max_stay_days',
 'min_stay_days', 'name', 'specialty', 'travel_clearance_days',
 'typical_duration_hours']

No procedure_code field (the issue's CPT code) — Neo4j uses its own code like knee_replacement, acl_repair, scoliosis_correction. The YAML neo4j_slug field aligns with these. There is no foreign key on either side enforcing the join — drift is invisible until lookup time.

4.2 OFFERS edges per procedure (top 25)

Procedure Providers offering
Total Hip Replacement 1651
Spinal Fusion 1651
Total Knee Replacement 1651
Coronary Angioplasty 1616
Coronary Artery Bypass Graft 1616
Heart Valve Replacement 1616
Surgical Oncology 26
Radiation Therapy 25
Chemotherapy 25
Rhinoplasty 20
Dental Implants Full Mouth 18
IVF Cycle 16
Shoulder Arthroscopy / ACL Repair / Knee Arthroscopy / Fracture Fixation / Meniscus Repair / Rotator Cuff Repair / Bilateral Knee Replacement / Shoulder Replacement / Post-Surgical Rehabilitation 10 each
Laminectomy / Cervical Decompression / Lumbar Decompression / Scoliosis Correction Surgery 7 each

This is the source of the "1660 providers" puzzle: Neo4j carries a far-larger fabricated provider universe than PG. PG is the source of truth for the matcher / API; Neo4j is being used inconsistently. The matcher's behaviour for the MVP-10 needs to be tested against the PG numbers (7–14 providers per procedure), not the Neo4j 1660s.

4.3 HAS_RECOVERY coverage

MATCH ()-[r:HAS_RECOVERY]->() RETURN count(r);  // = 7

Sample (TKR procedure node): Inpatient Recovery, Local Outpatient Rehabilitation, Home Country Follow-up — that's the original 3 phases × 3 procedures totalling ~7 edges. None of the 7 new MVP procedures (bilateral knee, robotic, spine, arthroscopy, trauma, shoulder, rehab) have recovery-phase mappings. This is the Phase 6 blocker that #960 + #958 share.


5. Patient-workflow seed gaps

End-to-end test of intake → matching → care plan → recovery requires the following entities to exist as realistic seed data. Current state:

Entity Needed for Prod PG today Fixture corpus Verdict
users (patient personas) Intake start not queried in this audit but cases=0 implies no usable personas n/a MISSING
cases (patient journey root) All downstream steps 0 rows tests/fixtures/v6_compliance/persona_{a,b,c}/*.json are conversation transcripts only MISSING
conversations Intake transcript 0 rows persona JSONs replay-able into the API but no DB rows seeded MISSING
messages Per-turn messages 0 rows same MISSING
fhir_resources (Patient, DocumentReference, Observation, Condition) Doc upload + extraction 0 rows none MISSING
Procedure interest (case → procedure_code link) Matching trigger n/a — no cases implicit in fixtures MISSING
Uploaded medical documents (object storage refs + extracted FHIR) Doc upload step n/a none MISSING
Match result rows Matching output n/a tests/fixtures/procedure_coverage/*.json carry expected match assertions but no seeded match-engine output MISSING
Consent record Care plan gate n/a none MISSING
Provider universe to match against Matching 50 providers in PG, OK n/a OK
Procedure catalog Matching 29 rows in PG, OK for MVP-10 n/a OK (with caveats from §3.3)
Recovery phase mappings Recovery card only 3 procedures have edges in Neo4j; 0 in PG recovery_needs table for the 7 new MVP procedures n/a PARTIAL — blocks recovery cards for 7 of 10 MVP procedures
Transport vendor mappings per procedure Transport card not audited in detail; config/seed_transport_vendors.yaml exists n/a UNKNOWN — likely partial; #959 still open

5.1 Existing fixture corpus (handy for replay)

tests/fixtures/v6_compliance/persona_a/  (golden + demographic_fabrication + doc_trust_fail + emotional_paraphrase + treatment_recommendation)
tests/fixtures/v6_compliance/persona_b/  (golden + late_records_offer + scope_rejection + two_questions_same_axis + voice_violation)
tests/fixtures/v6_compliance/persona_c/  (abrupt_same_cluster + golden_discovery + json_parse_failure + premature_pivot + soft_axis9_two)
tests/fixtures/procedure_coverage/       (acl_repair, bilateral_knee_replacement, cervical_decompression, fracture_fixation, knee_arthroscopy, laminectomy, lumbar_decompression, meniscus_repair, post_surgical_rehab, robotic_hip_replacement, robotic_knee_replacement, rotator_cuff_repair, scoliosis_correction, shoulder_arthroscopy, shoulder_replacement)

These are JSON conversation transcripts with expected_scores.yaml — designed for the v6 LLM grader, not for E2E API replay. They cover 15 of the 32 procedures but bypass the DB writes entirely.


P0 — blocking E2E patient workflow tests

  1. Seed a patient-case fixture set into PG. Need at least 5 personas × {case, ≥1 conversation, ≥3 messages, 1-2 fhir_resources (DocumentReference + Observation), 1 procedure interest, expected match outcome}. Add a seeder script app/seeds/seed_patient_journeys.py or extend seed_procedures.py. Tag with data_source: 'fabricated_e2e_test_2026_05_19' so it can be filtered out of prod views and dropped cleanly.
  2. Re-run seed_procedures.py against prod to close the YAML → PG drift on required_documents, keywords, and neo4j_slug for the 14 Naidu-proxy procedures (§3.3). Doubles the document checklist patients see.
  3. Insert the 6 missing PG rows for 92920, 67228, 67036, ONCO-SURG, ONCO-CHEMO, ONCO-RAD (§3.1). YAML payloads exist; one seed run should fix.
  4. Seed HAS_RECOVERY edges in Neo4j for the 7 new MVP procedures (bilateral knee, both robotic, the spine bundle, fracture fixation, shoulder bundle, REHAB-POSTOP-001). Without this the recovery card returns empty for 7 of 10 MVP procedures.
  5. Inactivate or delete *_BASE rows in procedure_requirements (CARDIAC_BASE, ONCO_BASE, ORTHO_BASE) — they leak into /api/v1/procedures and confuse the storefront list.

P1 — should-have before declaring #960 truly closed

  1. Add mako_certified capability + tag MAKO-hosting providers (issue called this out explicitly).
  2. Decide on the orphan codes 47135 (liver transplant), 15830, 27487, 96413 — either add YAML rows or remove from provider_procedures / doctor_procedures.
  3. Populate procedure_requirements.neo4j_slug from YAML so cross-store joins stop being name-based.
  4. Wire per-procedure matching overrides (config/matching/parameters/*.yaml procedure_overrides block from Phase 5 of #960). Without it the matcher still uses 147 global params for trauma vs. cataract.
  5. Add a per-procedure LOINC test-gating layer in procedure_capability_requirements (or a new procedure_test_requirements table) — TKR currently shares its lab panel with cardiac.
  6. Reconcile Neo4j Provider universe (1660 nodes) with PG (50). Either trim Neo4j to the real 50 or document why the larger fabricated graph exists.

P2 — polish

  1. Add a data_source column directly to procedure_requirements and providers (currently has to be probed via metadata->>'data_source') — gives ops a queryable filter.
  2. Backfill keywords for the 14 legacy procedures (rd column shows kw=0 for 22612, 27130, 27447, 30400, 33405, 33533, 43775, 58970, 66984, BMT-001, CARDIAC_BASE, D6010, ONCO_BASE, ORTHO_BASE).
  3. Naidu real sign-off (issue #169) — replaces all naidu_approved_claude_proxy_* markers with naidu_approved_2026_*.
  4. Decommission orphan CPT codes 96413, 27487, 15830 from doctor_procedures / provider_procedures after deciding the canonical catalog.

Appendix A — queries used

# Run via: railway run --service curaway -- python3 -c "<below>"
import asyncpg, asyncio, os, json
async def main():
    url = os.environ['DATABASE_URL'].replace('postgresql+asyncpg://', 'postgresql://')
    conn = await asyncpg.connect(url)
    # See queries inline in section bodies above.
asyncio.run(main())

Neo4j queried via neo4j Python driver using NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD from the Railway env.

YAML compared via:

import yaml
with open('config/procedures.yaml') as f:
    procs = yaml.safe_load(f)
# enumerate procedure_code, required_documents, ... and diff against PG depth output