
2026-04-07 — Risk assessor, conversation flow gates v2, full EHR drawer, voice fixes

Author: Srikanth Donthi (CPO/CTO) + Claude Code session

Theme: Patient-facing conversation polish, EHR rich UI, two prod outages caught and fixed, and a three-layer plan to fix the broken conversation flow (Layer 1 shipped; Layers 2 and 3 documented for later)

Headline numbers

  • 13 PRs merged (8 backend, of which 2 are docs; 5 frontend)
  • ~50 unit tests added (24 gates_v2 + 26 risk_assessor)
  • 8 new Vitest + 5 new Playwright tests on the frontend
  • 3 conversation-flow remediation layers documented (1 shipped, 2 parked with full specs ready to execute)
  • 2 prod outages caught and fixed within a single session (SSE tenant routing, voice prompt template leak)

Shipped to production

Backend (curaway-ai/curaway-backend)

| PR | Title | What it does |
|---|---|---|
| #64 | feat(sse): require Clerk JWT on SSE endpoints | Adds verify_sse_token + claims_to_tenant_id helpers; documents/stream, cases chat/stream and cases messages/stream are all gated; adds the on-site tests notice rich card on the forwarding step |
| #65 | fix(voice): stop agent assuming "abroad" + clinical labeling | Removed the leaky "[symptom] bad enough to need surgery" template; added a NEVER ASSUME "ABROAD" rule; 4 new forbidden phrases in voice_rules.yaml |
| #66 | hotfix(sse): query tenant_id wins over Clerk org_id | Prod outage fix for INTAKE_PATIENT_NOT_FOUND_001. Inverted the precedence in claims_to_tenant_id so the caller-supplied tenant_id wins; the Clerk org_id is used only when the caller supplies no value. Patient rows are partitioned under tenant-apollo-001, not under Clerk org IDs (yet) |
| #67 | fix(agent): records-first uses real Neo4j tests + validity windows | Wired case_service.get_procedure_requirements() into the records-first LLM call; new _format_required_records_block helper; strengthened the records_first phase prompt to forbid invented validity periods |
| #68 | feat(ehr): rule-based pre-operative risk assessor | New app/services/risk_assessor.py with rules for age, comorbidities, anticoagulants, immunosuppressants, NSAIDs, HbA1c, Hgb, eGFR, INR and platelets. An is_blocking flag marks risks that should pause forwarding. Wired into ehr_builder_agent. 26 unit tests |
| #69 | docs: conversation flow gates v2 — steer + feature spec | Layer 1 design + implementation spec |
| #70 | fix(orchestrator): conversation flow gates v2 (Layer 1) | New 5-condition _intake_complete_v2 gate, is_explicit_advance_phrase helper, completeness_for_matching lowered 0.5 → 0.4, allow_when_intake_complete, gates_v2 flag (default on), 24 unit tests, 3 new gates_v2.* events for observability |
| #71 | docs: conversation flow layers 2 + 3 — steer + feature specs | Orchestrator planner refactor + synchronous chat extractor (design + spec only) |
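The precedence fix in #66 is small but easy to get backwards. Below is a minimal sketch of the resolution order. It assumes the verified claims arrive as a plain dict and that the Clerk organization ID sits under an "org_id" key. The function signature and the claim key are both assumptions; the real helper may differ.

```python
from typing import Any, Mapping, Optional


def claims_to_tenant_id(
    claims: Mapping[str, Any], caller_tenant_id: Optional[str] = None
) -> Optional[str]:
    """Resolve the tenant for an SSE request (sketch of the #66 precedence).

    The caller-supplied tenant_id (e.g. the query value) wins, because patient
    rows are still partitioned under internal tenants such as tenant-apollo-001
    rather than Clerk org IDs. The Clerk org_id is only a fallback.
    """
    if caller_tenant_id:
        return caller_tenant_id
    # Assumed claim key; Clerk tokens may expose the org under a different name.
    org_id = claims.get("org_id")
    return org_id or None
```

Under the pre-#66 order, a request that carried both values resolved to the Clerk org ID. No patient rows live under that ID, which produced the INTAKE_PATIENT_NOT_FOUND_001 failures.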

Frontend (curaway-ai/curaway-frontend)

| PR | Title | What it does |
|---|---|---|
| #13 | fix(sse): pass Clerk JWT on SSE URLs + render on-site tests notice | Appends ?token=<clerk-jwt> to all 3 EventSource URLs in ConversationApp.tsx + UploadRecords.tsx; the confirmation rich card renders the on_site_tests_notice sub-card; new e2e/sse-auth.spec.ts |
| #14 | feat(ehr): full EHR view drawer | New FullEHRDrawer (1184 LOC, single cohesive file): overlay drawer with 15 collapsible sections, sidebar, focus trap, Escape to close, body scroll lock and prefers-reduced-motion support. Lab Results renders LOINC codes, reference ranges, interpretation flags and auto-detected condition callouts. ICD/SNOMED + AI confidence badges, FHIR validation badges, source provenance tags. PostHog events. "View Full Record" CTA on EHRPanel, gated by VITE_FULL_EHR_VIEW |
| #15 | test(ehr): vitest + playwright for FullEHRDrawer | 8 Vitest component tests + 5 Playwright e2e tests (3 always-on; 2 auth-gated that skip cleanly) |
| #16 | fix(ui): hide empty EHR sections + drop duplicate ring + Summary expand | Drawer header drops the duplicate completeness ring; empty sections are hidden by default behind a "Show N empty sections" toggle; the Summary panel's "+N more" became a chevron button |
| #17 | feat(ehr): surface risk factors + hide About team section | EHR panel risk section (top 3 + BLOCKING badge); Summary panel 1-line teaser (red when blocking); About page "Meet Our Team" hidden via {false && ...} |
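The risk section surfaced in #17 consumes the output of the rule-based assessor from backend #68. Below is a minimal sketch of that shape. The field names, thresholds and choice of which rules block are hypothetical and purely illustrative; they are not the production rules and not clinical guidance. The sketch shows a list of risk factors carrying is_blocking, plus a "top 3, blocking first" selection like the one the panel displays.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RiskFactor:
    code: str
    message: str
    is_blocking: bool = False  # blocking risks should pause forwarding


def assess_risks(facts: Dict[str, Any]) -> List[RiskFactor]:
    """Apply simple threshold rules (all cutoffs here are illustrative only)."""
    risks: List[RiskFactor] = []
    age = facts.get("age")
    if age is not None and age >= 70:
        risks.append(RiskFactor("AGE", f"Age {age} above age threshold"))
    if facts.get("on_anticoagulant"):
        risks.append(RiskFactor("ANTICOAG", "Active anticoagulant", True))
    hba1c = facts.get("hba1c")
    if hba1c is not None and hba1c > 8.0:
        risks.append(RiskFactor("HBA1C", f"HbA1c {hba1c}% above threshold", True))
    inr = facts.get("inr")
    if inr is not None and inr > 1.5:
        risks.append(RiskFactor("INR", f"INR {inr} above threshold", True))
    return risks


def top_risks(risks: List[RiskFactor], n: int = 3) -> List[RiskFactor]:
    """Blocking risks first; sorted() is stable, so rule order is kept within each group."""
    return sorted(risks, key=lambda r: not r.is_blocking)[:n]
```

The Summary teaser's "red when blocking" state then reduces to `any(r.is_blocking for r in risks)`.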

Operational findings (no code change)

Clerk dual-email investigation

  • Verified Vercel and Railway are both on pk_live_Y2xlcmsuY3VyYXdheS5haSQ (clerk.curaway.ai) — production keys are correctly configured.
  • Backend (curaway_src) has zero Clerk admin / webhook handlers — only JWT verification. Nothing in the code path can double-trigger an email.
  • Root cause is in the Clerk dashboard configuration, not the code. Ranked the top 4 likely causes; #1 is "verification code + magic link both enabled".
  • Action item parked for SD: walk through the dashboard against the four ranked causes.

Audit of 4 prior items

| Item | Status |
|---|---|
| report_date extraction | ⚠ Backend wired (clinical_extraction.py:31, 214, 221) and the validator consumes it correctly. Not verified end-to-end with a fresh upload; Haiku may still occasionally return null. |
| Patient name decryption in EHR rebuild | ✅ Wired in ehr_rebuild_service.py:69-71, but ⚠ lazy: the name backfills only on the next chat turn, so existing untouched cases still show the old/null name. No backfill job. |
| First-name personalization sparingly | ⚠ Prompt rule lives in llm_conversation.py:124-130. No test validates that the LLM follows it; subjective check only. |
| creatingCaseRef cleared on first message | ✅ Wired in ConversationApp.tsx:1375/1385/1612. Not manually verified in production. |

Documented and parked for later

Conversation flow remediation — Layer 2 (PR #71)

Orchestrator planner refactor. Replace the if/elif tree in case_orchestrator.handle_message with a single Haiku call per turn that picks from a 12-action enum. Shadow-mode rollout (same pattern as matching strategies). ~$0.001/turn cost. New files: app/agents/planner.py, app/agents/planner_state.py. 4 test files. Pre-condition: Layer 1 deployed and observed for ≥1 full week.
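In shadow mode the planner runs on every turn, but the legacy if/elif routing still decides what happens; only disagreements get logged. Below is a minimal sketch under stated assumptions. Both routers are plain callables, and the enum lists a few hypothetical members standing in for the real 12-action enum defined in the Layer 2 spec.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("planner.shadow")


class PlannerAction(str, Enum):
    # Hypothetical subset; the Layer 2 spec defines the real 12 actions.
    ASK_INTAKE_QUESTION = "ask_intake_question"
    REQUEST_RECORDS = "request_records"
    START_MATCHING = "start_matching"
    ANSWER_QUESTION = "answer_question"


def route_turn(
    message: str,
    legacy_router: Callable[[str], PlannerAction],
    planner: Callable[[str], PlannerAction],
    shadow: bool = True,
) -> PlannerAction:
    """Pick this turn's action; in shadow mode the planner is only compared and logged."""
    legacy_action = legacy_router(message)
    try:
        planned = planner(message)
    except Exception:  # a planner failure must never break the turn
        logger.exception("planner failed; using legacy action")
        return legacy_action
    if planned != legacy_action:
        logger.info(
            "planner disagreement: legacy=%s planner=%s",
            legacy_action.value,
            planned.value,
        )
    return legacy_action if shadow else planned
```

Flipping `shadow` to False is the cutover. The disagreement log collected beforehand is what tells you whether that cutover is safe.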

Why parked: Layer 1 may be sufficient on its own. Want to measure Layer 1's impact in production before deciding whether the planner refactor is still warranted.

Conversation flow remediation — Layer 3 (PR #71)

Synchronous chat extractor. Move the extractor from the deferred lane back onto the routing critical path. Run it in parallel with the input classifier via asyncio.gather. The turn then waits for the slower of the two calls rather than their sum, so the net added latency is ~0 ms as long as the extractor finishes no later than the classifier. Adds an agent.reask event to quantify the drop in re-asks. New chat_extractor_sync flag (default true). Pre-condition: Layer 1. Independent of Layer 2; the two can ship in either order.
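The latency argument can be illustrated with stand-in coroutines. classify_input and extract_chat_fields are hypothetical names, and the sleeps simulate model latency. asyncio.gather runs both concurrently and returns their results in argument order.

```python
import asyncio
import time
from typing import Any, Dict, Tuple


async def classify_input(message: str) -> str:
    await asyncio.sleep(0.20)  # stand-in for the input classifier call
    return "intake_answer"


async def extract_chat_fields(message: str) -> Dict[str, Any]:
    await asyncio.sleep(0.15)  # stand-in for the chat extractor call
    return {"procedure": "knee replacement"} if "knee" in message else {}


async def pre_route(message: str) -> Tuple[str, Dict[str, Any]]:
    # Both calls run concurrently; results come back in argument order.
    label, fields = await asyncio.gather(
        classify_input(message), extract_chat_fields(message)
    )
    return label, fields


if __name__ == "__main__":
    start = time.perf_counter()
    label, fields = asyncio.run(pre_route("I need a knee replacement"))
    # Elapsed time tracks the slower call (~0.2 s), not the 0.35 s sum.
    print(label, fields, f"{time.perf_counter() - start:.2f}s")
```

If the extractor ever becomes the slower call, the added latency is the gap between the two, which is what the ~0 ms estimate assumes away.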

Why parked: Same — measure Layer 1 first.

Audit follow-ups (not yet shipped)

  1. Playwright e2e for creatingCaseRef — auth-gated, asserts no flicker on first message
  2. Backend regression test for report_date extraction — feed a fixture with a known date, assert report_date is non-null. Catches Haiku regressions when prompts get shuffled.
  3. One-shot SQL backfill for patient name decryption on existing cases: UPDATE ehr_snapshot ... for every case where the name is null but patient.full_name_encrypted is set
  4. Skipped intentionally — first-name personalization has no cheap deterministic test; manual Langfuse review of 20 transcripts is the right tool
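Follow-up 2 could be a small pytest case. Below is a sketch that assumes an extraction entry point which takes document text and returns a dict with a report_date key. extract_clinical_fields is a hypothetical regex stand-in for the Haiku-backed call in clinical_extraction.py. A real regression test would invoke that call, or a recorded response, against the fixture. The fixture and its date are made up.

```python
import re
from datetime import date
from typing import Any, Dict, Optional

# Hypothetical fixture with a known report date.
FIXTURE_REPORT = """\
CITY DIAGNOSTICS LAB
Report Date: 2026-03-18
HbA1c: 6.9 %
"""


def extract_clinical_fields(text: str) -> Dict[str, Any]:
    """Hypothetical stand-in; swap in the real extraction call in the actual test."""
    match = re.search(r"Report Date:\s*(\d{4}-\d{2}-\d{2})", text)
    report_date: Optional[date] = date.fromisoformat(match.group(1)) if match else None
    return {"report_date": report_date}


def test_report_date_is_extracted() -> None:
    result = extract_clinical_fields(FIXTURE_REPORT)
    # The regression this guards against: report_date silently coming back null.
    assert result["report_date"] is not None
    assert result["report_date"] == date(2026, 3, 18)
```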

Other parked items

  • Authenticated Playwright tests for the Full EHR drawer — 2 tests skip cleanly until E2E_STORAGE_STATE + E2E_AUTH_CASE_ID env vars are wired in CI
  • Spec D6 in Layer 1 (quick_questions_asked flip ordering) — implicitly fixed by gates_v2 design (intake_answer_count only increments on real answers), so the explicit code change was skipped to reduce blast radius
  • End-to-end orchestrator test for gates_v2: needs a full DB fixture. Helper-level tests cover the core logic; revisit if production behaviour surprises us
  • Rich card per-component file split for FullEHRDrawer — spec called for 17 separate component files; consolidated into 1 cohesive file per CLAUDE.md "minimum complexity" rule. Can split later if maintenance pressure builds.

Verification still needed

Not blocking, but worth doing in the next session:

  1. Layer 1 in production — visit /app/case/e241e34f-6ef8-4eee-8d54-cbb02451100c, send "find providers now", confirm matching triggers immediately
  2. Run pytest tests/test_gates_v2.py -v in Railway to confirm 24/24 green (skipped locally — no venv)
  3. Run pytest tests/test_risk_assessor.py -v in Railway to confirm 26/26 green (same reason)
  4. Watch Langfuse for gates_v2.* events and risk_factors data over the next 24 h
  5. Manually test EHR drawer on a real authenticated case after Vercel deploy of #17
  6. Double-check the dual-email Clerk dashboard against the 4 ranked causes
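Step 1 exercises the explicit-advance path added in #70. Below is a minimal sketch of what a helper like is_explicit_advance_phrase might do. It assumes a small normalized phrase list. Apart from "find providers now", which comes from step 1, the phrases are hypothetical examples, and the real helper's matching rules may differ.

```python
import re
from typing import FrozenSet

# Hypothetical list; only "find providers now" comes from the verification step.
ADVANCE_PHRASES: FrozenSet[str] = frozenset(
    {"find providers now", "find providers", "show me hospitals", "skip the questions"}
)


def _normalize(text: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def is_explicit_advance_phrase(message: str) -> bool:
    """True when the message is, or starts with, an explicit request to move on."""
    normalized = _normalize(message)
    return any(
        normalized == phrase or normalized.startswith(phrase + " ")
        for phrase in ADVANCE_PHRASES
    )
```

Matching only whole messages or prefixes, rather than any substring, keeps sentences like "I can't find providers near me" from triggering matching by accident.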