ADR-0013: Rule-Based Pre-Operative Risk Assessor over LLM Agent

Status: Accepted
Date: 2026-04-07
Session: 33

Context

Cross-border surgical patients arrive with varied clinical risk profiles — elderly patients on anticoagulation, diabetics with elevated HbA1c, patients with severe anemia or impaired renal function. The matching engine and the conversational agent both need a structured view of these risks so the system can:

Surface them in the EHR drawer with severity + source provenance
Pause forwarding to providers when a risk is blocking (e.g., warfarin without a bridging plan, HbA1c ≥9%, INR >1.5)
Show patients exactly why their case requires resolution before the case advances

Section 11.2.5 of CLAUDE.md proposes a fifth agent — the Risk Agent — modelled after the existing Clinical Context, Intake, Match, and Explanation agents. This would be a LangGraph node calling Claude Sonnet or Haiku with a few-shot prompt that takes the EHR snapshot as input and returns a list of clinical risks.

That approach was deferred at the time on cost grounds, with the note "Clinical Context Agent captures essential risk factors as part of FHIR extraction". In practice the Clinical Context Agent does not surface structured risk factors — it extracts conditions, observations, and medications, but never assigns severity or is_blocking semantics to them. Today's EHR drawer has a risk_factors array in its data contract with no producer.

Decision

Build a rule-based app/services/risk_assessor.py as a pure function that takes a built EHR snapshot and returns a structured list of risks. Each risk has the shape:

{
    "factor": str,
    "severity": "low" | "moderate" | "high",
    "relevance": str,
    "mitigation": str | None,
    "source": {"type": "comorbidity"|"medication"|"lab"|"demographic", "ref": str},
    "is_blocking": bool,
}
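As a rough illustration, the shape above could be expressed with typed dictionaries so that a type checker catches misspelled keys or invalid severities. The class names (Risk, RiskSource), the type aliases, and the example values (including the format of source.ref) are assumptions made for this sketch, not the contents of risk_assessor.py.

```python
from typing import Literal, Optional, TypedDict

# Allowed values copied from the shape above; the alias names are hypothetical.
Severity = Literal["low", "moderate", "high"]
SourceType = Literal["comorbidity", "medication", "lab", "demographic"]


class RiskSource(TypedDict):
    type: SourceType
    ref: str  # pointer to the triggering row; the exact format is an assumption


class Risk(TypedDict):
    factor: str
    severity: Severity
    relevance: str
    mitigation: Optional[str]
    source: RiskSource
    is_blocking: bool


# Illustrative instance only; the wording and ref value are invented for the sketch.
example: Risk = {
    "factor": "Warfarin",
    "severity": "high",
    "relevance": "High bleeding risk",
    "mitigation": "Bridging plan required",
    "source": {"type": "medication", "ref": "medication:warfarin"},
    "is_blocking": True,
}
```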

The risk assessor runs at the end of ehr_builder_agent.run_ehr_builder on every EHR rebuild. It is deterministic and makes no LLM call, and its output is wired into ehr_snapshot.risk_factors. The Risk Agent originally proposed in Section 11.2.5 is superseded by this service for the POC and near-term production use.

The rule set covers four buckets:

Age — ≥70 moderate, ≥80 high
Comorbidities — diabetes (type 1/2/pre), hypertension, atrial fibrillation, heart failure, CAD, CKD, COPD, asthma, OSA, anemia, obesity, thyroid, liver disease
Medications — anticoagulants (blocking), antiplatelets, immunosuppressants, NSAIDs, diabetes meds (infers diabetes when not already flagged)
Labs — HbA1c (≥9% blocking, ≥8% moderate), Hgb (<8 blocking, <11 moderate), eGFR (<30 high, <60 moderate), INR (>1.5 blocking), platelets (<100k blocking)

Each rule's mitigation note is hand-written and references the standard pre-op protocol for that condition.
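The lab bucket lends itself to a small threshold table. The sketch below is one possible encoding of the lab thresholds listed above, not the actual rule code. Assumptions: the lab keys, the function name assess_labs, the source.ref format, the units (Hgb in g/dL, platelets per microliter), and the choice to give blocking rules a "high" severity are all hypothetical. Mitigation is left as None because the real notes are hand-written.

```python
from typing import Callable, Dict, List, Tuple

# (lab key, predicate, severity, is_blocking, label).
# Stricter thresholds come first: only the first matching rule per lab fires.
# Severity "high" for blocking rules is an assumption of this sketch.
LabRule = Tuple[str, Callable[[float], bool], str, bool, str]

LAB_RULES: List[LabRule] = [
    ("hba1c", lambda v: v >= 9.0, "high", True, "HbA1c >= 9%"),
    ("hba1c", lambda v: v >= 8.0, "moderate", False, "HbA1c >= 8%"),
    ("hgb", lambda v: v < 8.0, "high", True, "Hgb < 8"),
    ("hgb", lambda v: v < 11.0, "moderate", False, "Hgb < 11"),
    ("egfr", lambda v: v < 30.0, "high", False, "eGFR < 30"),
    ("egfr", lambda v: v < 60.0, "moderate", False, "eGFR < 60"),
    ("inr", lambda v: v > 1.5, "high", True, "INR > 1.5"),
    ("platelets", lambda v: v < 100_000, "high", True, "Platelets < 100k"),
]


def assess_labs(labs: Dict[str, float]) -> List[dict]:
    """Return one risk per lab whose value crosses a threshold."""
    risks: List[dict] = []
    matched = set()
    for key, predicate, severity, is_blocking, label in LAB_RULES:
        if key in matched or key not in labs:
            continue
        if predicate(labs[key]):
            matched.add(key)
            risks.append({
                "factor": label,
                "severity": severity,
                "relevance": f"{key} = {labs[key]}",
                "mitigation": None,  # real notes are hand-written per rule
                "source": {"type": "lab", "ref": f"lab:{key}"},  # hypothetical ref format
                "is_blocking": is_blocking,
            })
    return risks
```

Keeping the thresholds in a flat table means a new lab rule is a one-line addition, and the ordering makes the precedence between "blocking" and "moderate" thresholds explicit.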

Rationale

Determinism matters for blocking decisions. When is_blocking=True pauses forwarding to providers, the patient needs to understand exactly why. A rule with a clear source.ref ("Warfarin → high bleeding risk → bridging plan required") is more defensible — and more auditable for clinical review — than "the AI decided this was risky". LLM stochasticity is the wrong tool for the part of the system that gates clinical decisions.
Source provenance is harder with an LLM. The rule-based version points at the exact medication row, comorbidity, or lab observation that triggered each risk. Achieving the same with an LLM requires a structured-output prompt with explicit citation fields, which is doable but adds complexity for a workflow where ~80% of the real risk surface is well covered by rules.
Cost. ~$0 vs ~$0.01–$0.05 per case. At POC scale (~100 cases/mo) the cost difference is small, but at the matching-trigger scale (every EHR rebuild = potentially 5–10 evaluations per case journey), the cost of an LLM call on each rebuild quickly outweighs the marginal accuracy gain.
Latency. <5ms vs ~1-2s. The risk assessor runs synchronously at the end of run_ehr_builder and feeds the matching gate. Adding an LLM call here would push EHR rebuild p99 above the SSE timeout budget.
Mirrors an existing pattern. app/agents/lab_analyzer.py (Session 21) is already rule-based for the same reason: it detects comorbidities from lab values via deterministic thresholds. The comorbidity_llm_shadow flag exists for a future LLM upgrade. The risk assessor will follow the same pattern: ship rule-based, add an LLM shadow comparison post-Series A when we have ≥1000 graded cases.
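The cost argument above can be made concrete with a back-of-the-envelope calculation. This sketch assumes the $0.01–$0.05 figure applies to each LLM call (that is, to each rebuild) and reuses the ~100 cases/mo and 5–10 rebuilds per case figures from the text. The function name is hypothetical.

```python
def monthly_llm_cost(cases_per_month: int, rebuilds_per_case: int,
                     cost_per_call_usd: float) -> float:
    """Estimated monthly spend if every EHR rebuild triggers one LLM call."""
    return cases_per_month * rebuilds_per_case * cost_per_call_usd


# Low end: 5 rebuilds per case at $0.01 per call.
low = monthly_llm_cost(100, 5, 0.01)
# High end: 10 rebuilds per case at $0.05 per call.
high = monthly_llm_cost(100, 10, 0.05)

print(f"Estimated LLM spend: ${low:.2f} to ${high:.2f} per month")
```

Under these assumptions the range works out to roughly $5 to $50 per month at POC volume, which scales linearly with case count.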

Alternatives considered

Build the Risk Agent as originally proposed. Adds an LLM call to every EHR rebuild, hurts latency and cost, and is harder to audit. Rejected for the reasons above. Reconsider post-Series A.
Embed risk extraction inside the Clinical Context Agent. Would conflate clinical entity extraction with risk assignment in the same prompt. Empirically the Clinical Context Agent already misses structured fields when overloaded, and adding risk reasoning would make this worse. The chat extractor exists as a separate call for the same reason.
Skip risk assessment entirely for the POC. Tempting, but the EHR drawer's RiskAssessmentSection already exists in the frontend contract, the matching pipeline benefits from the is_blocking signal, and clinical advisors expect to see this for any patient-facing health platform. Not viable.

Consequences

The Risk Agent row in CLAUDE.md Section 11.2.5 is superseded by this service. The original "Deferred to Post-POC" note is updated to reference this ADR.
Adding a new rule = ~5 lines in the appropriate bucket of risk_assessor.py + a unit test. No prompt engineering, no evaluation set required.
Rules drift from real-world clinical guidance over time. We must schedule a clinical advisor review of the rule set annually (next review: 2027-Q1, owner: Dr. Shrikanth Naidu).
Edge cases the rules don't catch (drug-drug interactions, novel comorbidity combinations, rare procedure-specific risks) can still surface in two ways: (a) the Match Agent re-ranking step on top-5 providers and (b) future LLM shadow mode (see "Reconsidering" below).
The risk assessor must remain pure-function and side-effect-free so it can be re-run on demand without DB writes (e.g., during unit tests, during EHR drawer rendering).
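One way to guard the pure-function requirement is a small test helper that checks an assessor neither mutates its input nor returns different results across calls. The helper check_pure and the stand-in assess_age below are hypothetical. The stand-in only encodes the age thresholds from the rule set, and marking age risks as non-blocking is an assumption of the sketch.

```python
import copy
from typing import Any, Callable, Dict, List

Snapshot = Dict[str, Any]


def check_pure(assess: Callable[[Snapshot], List[dict]], snapshot: Snapshot) -> None:
    """Raise AssertionError if assess mutates its input or is non-deterministic."""
    before = copy.deepcopy(snapshot)
    first = assess(snapshot)
    second = assess(snapshot)
    if snapshot != before:
        raise AssertionError("assessor mutated its input snapshot")
    if first != second:
        raise AssertionError("assessor returned different results for the same input")


def assess_age(snapshot: Snapshot) -> List[dict]:
    """Stand-in rule: age >= 70 is moderate, age >= 80 is high."""
    age = snapshot.get("age")
    if age is None or age < 70:
        return []
    severity = "high" if age >= 80 else "moderate"
    # is_blocking=False for age is an assumption of this sketch.
    return [{"factor": f"Age {age}", "severity": severity, "is_blocking": False}]


check_pure(assess_age, {"age": 82})
```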

Reconsidering

Build the LLM-powered Risk Agent when any of the following are true:

We have ≥1000 real patient cases and a clinical advisor has graded rule-based vs LLM-based risk lists side-by-side
A clinical advisor flags a rule miss that the rule set fundamentally cannot catch (e.g., drug interactions outside the lookup tables)
We start handling more complex procedure categories (transplants, oncology, multi-system disease) where the rule space explodes
Post-Series A budget allows ~$50/mo additional LLM cost for clinical reasoning

When that happens, the upgrade path is shadow mode: add a risk_assessor_llm_shadow flag (mirroring comorbidity_llm_shadow in lab_analyzer.py:110) that, when enabled, calls Sonnet on every EHR rebuild, logs its risk list to the events table alongside the rule-based output, and lets us compare the two for ~500 cases before flipping the authoritative source.
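A minimal sketch of what such a shadow comparison might look like follows. Everything here is hypothetical: the function assess_with_shadow, the injected shadow_assessor and log_event callables, and the event payload fields are illustrative names, not the existing lab_analyzer.py mechanism. The key property is that the rule-based output stays authoritative and a shadow failure never changes it.

```python
from typing import Any, Callable, Dict, List

Risk = Dict[str, Any]
Assessor = Callable[[Dict[str, Any]], List[Risk]]


def assess_with_shadow(
    snapshot: Dict[str, Any],
    rule_assessor: Assessor,
    shadow_assessor: Assessor,          # would wrap the LLM call; injected here
    log_event: Callable[[Dict[str, Any]], None],
    shadow_enabled: bool,
) -> List[Risk]:
    """Return the rule-based risks; optionally log a shadow comparison."""
    rule_risks = rule_assessor(snapshot)
    if shadow_enabled:
        try:
            shadow_risks = shadow_assessor(snapshot)
            rule_factors = {r["factor"] for r in rule_risks}
            shadow_factors = {r["factor"] for r in shadow_risks}
            log_event({
                "type": "risk_assessor_llm_shadow",
                "rule_based": rule_risks,
                "shadow": shadow_risks,
                "only_in_rules": sorted(rule_factors - shadow_factors),
                "only_in_shadow": sorted(shadow_factors - rule_factors),
            })
        except Exception as exc:
            # A shadow failure is logged but must not affect the result.
            log_event({"type": "risk_assessor_llm_shadow_error", "error": repr(exc)})
    return rule_risks
```

This sketch runs the shadow call inline for simplicity. Given the latency concern in the rationale, a real version would more likely run it off the request path.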