2026-04-07: Risk assessor, conversation flow gates v2, full EHR drawer, voice fixes
Author: Srikanth Donthi (CPO/CTO) + Claude Code session
Theme: Patient-facing conversation polish, EHR rich UI, two prod outages caught and fixed, and a three-layer plan to fix the broken conversation flow (Layer 1 shipped; Layers 2 and 3 documented for later)
Headline numbers
- 15 PRs merged (8 backend, 5 frontend, 2 docs)
- ~50 unit tests added (24 `gates_v2` + 26 `risk_assessor`)
- 8 new Vitest + 5 new Playwright tests on the frontend
- 3 conversation-flow remediation layers documented (1 shipped, 2 parked with full specs ready to execute)
- 2 prod outages caught and fixed within a single session (SSE tenant routing, voice prompt template leak)
Shipped to production
Backend (curaway-ai/curaway-backend)
| PR | Title | What it does |
|---|---|---|
| #64 | feat(sse): require Clerk JWT on SSE endpoints | verify_sse_token + claims_to_tenant_id helpers; documents/stream, cases chat/stream, cases messages/stream all gated; on-site tests notice rich card on the forwarding step |
| #65 | fix(voice): stop agent assuming "abroad" + clinical labeling | Removed leaky [symptom] bad enough to need surgery template; added NEVER ASSUME "ABROAD" rule; 4 new forbidden phrases in voice_rules.yaml |
| #66 | hotfix(sse): query tenant_id wins over Clerk org_id | Prod outage fix — INTAKE_PATIENT_NOT_FOUND_001. Inverted precedence in claims_to_tenant_id so caller fallback wins; Clerk org_id only used when no caller value supplied. Patient rows are partitioned under tenant-apollo-001, not under Clerk org IDs (yet) |
| #67 | fix(agent): records-first uses real Neo4j tests + validity windows | Wired case_service.get_procedure_requirements() into the records-first LLM call; new _format_required_records_block helper; strengthened records_first phase prompt to forbid invented validity periods |
| #68 | feat(ehr): rule-based pre-operative risk assessor | New app/services/risk_assessor.py with rules for age, comorbidities, anticoagulants, immunosuppressants, NSAIDs, HbA1c, Hgb, eGFR, INR, platelets. is_blocking flag for risks that should pause forwarding. Wired into ehr_builder_agent. 26 unit tests. |
| #69 | docs: conversation flow gates v2 — steer + feature spec | Layer 1 design + implementation spec |
| #70 | fix(orchestrator): conversation flow gates v2 (Layer 1) | New _intake_complete_v2 5-condition gate, is_explicit_advance_phrase helper, lowered completeness_for_matching 0.5 → 0.4, allow_when_intake_complete, gates_v2 flag (default on), 24 unit tests, 3 new gates_v2.* events for observability |
| #71 | docs: conversation flow layers 2 + 3 — steer + feature specs | Orchestrator planner refactor + synchronous chat extractor — design + spec only |
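
The precedence change in #66 can be sketched as follows. This is a minimal illustration, not the shipped helper: only the name `claims_to_tenant_id`, the `org_id` claim, and the `tenant-apollo-001` partition come from the table above, while the signature, the parameter name `caller_tenant_id`, and the sample claim values are assumptions.

```python
from typing import Mapping, Optional


def claims_to_tenant_id(
    claims: Mapping[str, object],
    caller_tenant_id: Optional[str] = None,  # assumed parameter name
) -> Optional[str]:
    """Resolve the tenant for an SSE request.

    Precedence after the hotfix: a caller-supplied tenant_id wins, and the
    Clerk org_id claim is used only when the caller supplied no value.
    """
    if caller_tenant_id:  # non-empty caller value takes priority
        return caller_tenant_id
    org_id = claims.get("org_id")
    return org_id if isinstance(org_id, str) and org_id else None


# Hypothetical claims payload for illustration.
claims = {"sub": "user_123", "org_id": "org_abc"}
print(claims_to_tenant_id(claims, "tenant-apollo-001"))  # caller value wins
print(claims_to_tenant_id(claims))                       # falls back to org_id
```

With the pre-hotfix ordering, the first call would have resolved to the Clerk org ID, and a patient lookup under that tenant would miss rows stored under `tenant-apollo-001`.
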
Frontend (curaway-ai/curaway-frontend)
| PR | Title | What it does |
|---|---|---|
| #13 | fix(sse): pass Clerk JWT on SSE URLs + render on-site tests notice | ?token=<clerk-jwt> on all 3 EventSource URLs in ConversationApp.tsx + UploadRecords.tsx; confirmation rich card renders on_site_tests_notice sub-card; new e2e/sse-auth.spec.ts |
| #14 | feat(ehr): full EHR view drawer | New FullEHRDrawer (1184 LOC, single cohesive file) — overlay drawer with 15 collapsible sections, sidebar, focus trap, escape, body scroll lock, prefers-reduced-motion. Lab Results renders LOINC + reference range + interpretation flags + auto-detected condition callouts. ICD/SNOMED + AI confidence badges, FHIR validation badges, source provenance tags. PostHog events. "View Full Record" CTA on EHRPanel, gated by VITE_FULL_EHR_VIEW |
| #15 | test(ehr): vitest + playwright for FullEHRDrawer | 8 vitest component tests + 5 playwright e2e (3 always-on, 2 auth-gated skip cleanly) |
| #16 | fix(ui): hide empty EHR sections + drop duplicate ring + Summary expand | Drawer header drops the duplicate completeness ring; empty sections hidden by default with "Show N empty sections" toggle; Summary panel "+N more" became a chevron button |
| #17 | feat(ehr): surface risk factors + hide About team section | EHR panel risk section (top 3 + BLOCKING badge); Summary panel 1-line teaser (red when blocking); About page "Meet Our Team" hidden via {false && ...} |
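
The "top 3 + BLOCKING badge" display in #17 depends on the shape of the assessor output from #68. The sketch below shows one way a rule-based assessor with an `is_blocking` flag could look. Everything except the `is_blocking` name and the list of inputs is an assumption: the class, the function names, the severity scale, and especially the thresholds are illustrative placeholders, not the shipped rules and not clinical guidance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RiskFactor:
    code: str
    message: str
    severity: int      # hypothetical scale: higher means more severe
    is_blocking: bool  # True means forwarding should pause


def assess_risks(
    age: Optional[int],
    labs: Dict[str, float],
    on_anticoagulant: bool,
) -> List[RiskFactor]:
    """Apply a few illustrative rules; thresholds are placeholders only."""
    risks: List[RiskFactor] = []
    if age is not None and age >= 75:
        risks.append(RiskFactor("AGE", "Advanced age", 1, False))
    if on_anticoagulant:
        risks.append(RiskFactor("ANTICOAG", "On anticoagulant therapy", 2, False))
    hba1c = labs.get("hba1c")
    if hba1c is not None and hba1c >= 9.0:
        risks.append(RiskFactor("HBA1C", f"HbA1c {hba1c:.1f}% is elevated", 3, True))
    inr = labs.get("inr")
    if inr is not None and inr > 1.5:
        risks.append(RiskFactor("INR", f"INR {inr:.1f} is elevated", 3, True))
    # Blocking risks first, then by descending severity.
    return sorted(risks, key=lambda r: (not r.is_blocking, -r.severity))


def top_risks(risks: List[RiskFactor], limit: int = 3) -> List[RiskFactor]:
    """Return the slice a panel would show (top 3 by default)."""
    return risks[:limit]


def has_blocking(risks: List[RiskFactor]) -> bool:
    """True when any risk should pause forwarding."""
    return any(r.is_blocking for r in risks)


risks = assess_risks(age=80, labs={"hba1c": 9.5, "inr": 1.0}, on_anticoagulant=True)
print([r.code for r in top_risks(risks)], has_blocking(risks))
```

Sorting blocking risks ahead of the rest means a truncated "top 3" view never hides the reason forwarding is paused.
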
Operational findings (no code change)
Clerk dual-email investigation
- Verified Vercel and Railway are both on `pk_live_Y2xlcmsuY3VyYXdheS5haSQ` (clerk.curaway.ai): production keys are correctly configured.
- Backend (`curaway_src`) has zero Clerk admin / webhook handlers, only JWT verification. Nothing in the code path can double-trigger an email.
- Root cause is in the Clerk dashboard, not the code. Top 4 likely causes ranked, with #1 being "verification code + magic link both enabled".
- Action item parked for SD: walk through the dashboard against the four ranked causes.
Audit of 4 prior items
| Item | Status |
|---|---|
| `report_date` extraction | ⚠ Backend wired (clinical_extraction.py:31, 214, 221) and validator consumes it correctly. Not verified end-to-end with a fresh upload; Haiku may still occasionally return null. |
| Patient name decryption in EHR rebuild | ✅ Wired in ehr_rebuild_service.py:69-71, ⚠ lazy. Backfills on next chat turn. Existing untouched cases still show old/null name. No backfill job. |
| First-name personalization sparingly | ⚠ Prompt rule lives in llm_conversation.py:124-130. No test validates the LLM follows it — subjective check only. |
| `creatingCaseRef` cleared on first message | ✅ Wired in ConversationApp.tsx:1375/1385/1612. Not manually verified in production. |
Documented and parked for later
Conversation flow remediation: Layer 2 (PR #71)
Orchestrator planner refactor. Replace the if/elif tree in
case_orchestrator.handle_message with a single Haiku call per turn that
picks from a 12-action enum. Shadow-mode rollout (same pattern as matching
strategies). ~$0.001/turn cost. New files: app/agents/planner.py,
app/agents/planner_state.py. 4 test files. Pre-condition: Layer 1
deployed and observed for ≥1 full week.
Why parked: Layer 1 may be sufficient on its own. Want to measure Layer 1's impact in production before deciding whether the planner refactor is still warranted.
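
A shadow-mode rollout of the planner could look roughly like the sketch below: the existing decision stays authoritative, the planner runs alongside it, and disagreements are only logged. The `Action` members, the `decide_with_shadow` function, and the logger name are all hypothetical; the real enum is described as having 12 actions, and only three illustrative ones appear here.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("planner.shadow")  # hypothetical logger name


class Action(str, Enum):
    # Hypothetical subset of the planned 12-action enum.
    ASK_INTAKE_QUESTION = "ask_intake_question"
    REQUEST_RECORDS = "request_records"
    RUN_MATCHING = "run_matching"


def decide_with_shadow(
    message: str,
    legacy: Callable[[str], Action],
    planner: Callable[[str], Action],
) -> Action:
    """Return the legacy decision; run the planner only to compare and log."""
    legacy_action = legacy(message)
    try:
        planned = planner(message)
    except Exception:
        # A planner failure must never affect the user-facing path.
        logger.exception("planner failed in shadow mode")
        return legacy_action
    if planned != legacy_action:
        logger.info(
            "planner disagreement: legacy=%s planner=%s",
            legacy_action.value,
            planned.value,
        )
    return legacy_action


# Stubs standing in for the if/elif tree and the Haiku-backed planner.
chosen = decide_with_shadow(
    "find providers now",
    legacy=lambda m: Action.ASK_INTAKE_QUESTION,
    planner=lambda m: Action.RUN_MATCHING,
)
print(chosen.value)
```

The logged disagreement rate gives a measurable basis for deciding whether to promote the planner, which fits the "measure first" reasoning above.
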
Conversation flow remediation: Layer 3 (PR #71)
Synchronous chat extractor. Move the extractor from the deferred lane
back onto the routing critical path. Run in parallel with the input
classifier via asyncio.gather so net latency cost is ~0ms. Adds an
agent.reask event for quantitatively measuring the re-ask drop. New
chat_extractor_sync flag (default true). Pre-condition: Layer 1.
Independent of Layer 2 — can ship in any order.
Why parked: Same reason as Layer 2: measure Layer 1's impact in production first.
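
The parallel pattern described for Layer 3 can be sketched with `asyncio.gather`. The coroutine names, return shapes, and sleep durations are stand-ins invented for illustration; only the use of `asyncio.gather` to run the extractor alongside the input classifier comes from the description above.

```python
import asyncio
import time


async def classify_input(message: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for the classifier call
    return "answer"


async def extract_fields(message: str) -> dict:
    await asyncio.sleep(0.05)  # stand-in for the extractor call
    return {"procedure": "knee replacement"} if "knee" in message else {}


async def route_turn(message: str):
    # Both coroutines run concurrently, so the total wait is roughly the
    # slower of the two rather than their sum.
    label, fields = await asyncio.gather(
        classify_input(message),
        extract_fields(message),
    )
    return label, fields


if __name__ == "__main__":
    start = time.perf_counter()
    label, fields = asyncio.run(route_turn("I need a knee replacement"))
    elapsed = time.perf_counter() - start
    print(label, fields, f"{elapsed:.2f}s")
```

Because `gather` waits for the slower coroutine, the "~0ms" net cost holds only while the extractor is no slower than the classifier; any excess shows up directly on the critical path.
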
Audit follow-ups (not yet shipped)
- Playwright e2e for `creatingCaseRef`: auth-gated, asserts no flicker on first message
- Backend regression test for `report_date` extraction: feed a fixture with a known date, assert `report_date` is non-null. Catches Haiku regressions when prompts get shuffled.
- One-shot SQL backfill for patient name decryption on existing cases: `UPDATE ehr_snapshot ...` for cases where `name` is null but `patient.full_name_encrypted` is set
- Skipped intentionally: first-name personalization has no cheap deterministic test; manual Langfuse review of 20 transcripts is the right tool
Other parked items
- Authenticated Playwright tests for the Full EHR drawer: 2 tests skip cleanly until `E2E_STORAGE_STATE` + `E2E_AUTH_CASE_ID` env vars are wired in CI
- Spec D6 in Layer 1 (`quick_questions_asked` flip ordering): implicitly fixed by `gates_v2` design (`intake_answer_count` only increments on real answers), so the explicit code change was skipped to reduce blast radius
- End-to-end orchestrator test for `gates_v2`: needs a full DB fixture; helper-level tests cover the core logic; revisit if production behaviour surprises us
- Rich card per-component file split for `FullEHRDrawer`: spec called for 17 separate component files; consolidated into 1 cohesive file per CLAUDE.md "minimum complexity" rule. Can split later if maintenance pressure builds.
Verification still needed
Not blocking, but worth doing in the next session:
- Layer 1 in production: visit `/app/case/e241e34f-6ef8-4eee-8d54-cbb02451100c`, send "find providers now", confirm matching triggers immediately
- Run `pytest tests/test_gates_v2.py -v` in Railway to confirm 24/24 green (skipped locally, no venv)
- Run `pytest tests/test_risk_assessor.py -v` in Railway to confirm 26/26 green (same reason)
- Watch Langfuse for `gates_v2.*` events and `risk_factors` data over the next 24 h
- Manually test EHR drawer on a real authenticated case after Vercel deploy of #17
- Double-check the dual-email Clerk dashboard against the 4 ranked causes
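
The first check in the list above exercises the explicit-advance path from #70. A minimal sketch of what a helper like `is_explicit_advance_phrase` might do is shown here. Only the helper name and the phrase "find providers now" come from this log; the other phrases, the normalization step, and the substring matching are assumptions, and the real 5-condition gate is not reproduced.

```python
import re

# Hypothetical phrase list; only "find providers now" appears in this log.
ADVANCE_PHRASES = (
    "find providers now",
    "show me providers",
    "skip the questions",
)


def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def is_explicit_advance_phrase(message: str) -> bool:
    """True when the patient explicitly asks to move on to matching."""
    normalized = _normalize(message)
    return any(phrase in normalized for phrase in ADVANCE_PHRASES)


print(is_explicit_advance_phrase("Find providers, NOW!"))
print(is_explicit_advance_phrase("My knee hurts when I climb stairs"))
```

Normalizing before matching keeps the check tolerant of casing and punctuation, so the manual test should behave the same for "Find providers now!" as for the exact phrase.
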