2026-04-07 — Risk assessor, conversation flow gates v2, full EHR drawer, voice fixes

Author: Srikanth Donthi (CPO/CTO) + Claude Code session
Theme: Patient-facing conversation polish, EHR rich UI, two prod outages caught and fixed, and a three-layer plan to fix the broken conversation flow (Layer 1 shipped; Layers 2 and 3 documented for later)

Headline numbers

13 PRs merged (8 backend, of which 2 are docs-only; 5 frontend)
50 backend unit tests added (24 gates_v2 + 26 risk_assessor)
8 new Vitest + 5 new Playwright tests on the frontend
3 conversation-flow remediation layers documented (1 shipped, 2 parked with full specs ready to execute)
2 prod outages caught and fixed within a single session (SSE tenant routing, voice prompt template leak)

Shipped to production

Backend (`curaway-ai/curaway-backend`)

PR	Title	What it does
#64	feat(sse): require Clerk JWT on SSE endpoints	`verify_sse_token` + `claims_to_tenant_id` helpers; `documents/stream`, `cases chat/stream`, `cases messages/stream` all gated; the rich card on the forwarding step now carries an on-site tests notice
#65	fix(voice): stop agent assuming "abroad" + clinical labeling	Removed leaky `[symptom] bad enough to need surgery` template; added `NEVER ASSUME "ABROAD"` rule; 4 new forbidden phrases in `voice_rules.yaml`
#66	hotfix(sse): query tenant_id wins over Clerk org_id	Prod outage fix — `INTAKE_PATIENT_NOT_FOUND_001`. Inverted precedence in `claims_to_tenant_id` so caller fallback wins; Clerk org_id only used when no caller value supplied. Patient rows are partitioned under `tenant-apollo-001`, not under Clerk org IDs (yet)
#67	fix(agent): records-first uses real Neo4j tests + validity windows	Wired `case_service.get_procedure_requirements()` into the records-first LLM call; new `_format_required_records_block` helper; strengthened `records_first` phase prompt to forbid invented validity periods
#68	feat(ehr): rule-based pre-operative risk assessor	New `app/services/risk_assessor.py` with rules for age, comorbidities, anticoagulants, immunosuppressants, NSAIDs, HbA1c, Hgb, eGFR, INR, platelets. `is_blocking` flag for risks that should pause forwarding. Wired into `ehr_builder_agent`. 26 unit tests.
#69	docs: conversation flow gates v2 — steer + feature spec	Layer 1 design + implementation spec
#70	fix(orchestrator): conversation flow gates v2 (Layer 1)	New `_intake_complete_v2` 5-condition gate, `is_explicit_advance_phrase` helper, lowered `completeness_for_matching` 0.5 → 0.4, `allow_when_intake_complete`, `gates_v2` flag (default on), 24 unit tests, 3 new `gates_v2.*` events for observability
#71	docs: conversation flow layers 2 + 3 — steer + feature specs	Orchestrator planner refactor + synchronous chat extractor — design + spec only
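
The precedence fix in #66 can be sketched as follows. This is a minimal illustration, not the shipped helper: the function signature, the `caller_tenant_id` parameter name, and the shape of the claims mapping are assumptions; only the rule itself (a caller-supplied tenant wins, and the Clerk `org_id` is used only when the caller supplied nothing) comes from the PR description.

```python
from typing import Mapping, Optional


def claims_to_tenant_id(
    claims: Mapping[str, object],
    caller_tenant_id: Optional[str] = None,  # hypothetical parameter name
) -> Optional[str]:
    """Resolve the tenant for an SSE request (illustrative sketch only)."""
    # The caller-supplied value (for example a tenant_id query parameter)
    # wins, because patient rows are partitioned under tenant IDs such as
    # "tenant-apollo-001" rather than under Clerk org IDs.
    if caller_tenant_id:
        return caller_tenant_id
    # Fall back to the Clerk org_id only when the caller sent no value.
    org_id = claims.get("org_id")
    return org_id if isinstance(org_id, str) and org_id else None
```

With this ordering, a request that carries both a tenant query value and a Clerk org claim resolves to the query value, which is the behaviour the hotfix restored.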

Frontend (`curaway-ai/curaway-frontend`)

PR	Title	What it does
#13	fix(sse): pass Clerk JWT on SSE URLs + render on-site tests notice	`?token=<clerk-jwt>` on all 3 EventSource URLs in `ConversationApp.tsx` + `UploadRecords.tsx`; confirmation rich card renders `on_site_tests_notice` sub-card; new `e2e/sse-auth.spec.ts`
#14	feat(ehr): full EHR view drawer	New `FullEHRDrawer` (1184 LOC, single cohesive file) — overlay drawer with 15 collapsible sections, sidebar, focus trap, escape, body scroll lock, prefers-reduced-motion. Lab Results renders LOINC + reference range + interpretation flags + auto-detected condition callouts. ICD/SNOMED + AI confidence badges, FHIR validation badges, source provenance tags. PostHog events. "View Full Record" CTA on `EHRPanel`, gated by `VITE_FULL_EHR_VIEW`
#15	test(ehr): vitest + playwright for FullEHRDrawer	8 vitest component tests + 5 playwright e2e (3 always-on, 2 auth-gated skip cleanly)
#16	fix(ui): hide empty EHR sections + drop duplicate ring + Summary expand	Drawer header drops the duplicate completeness ring; empty sections hidden by default with "Show N empty sections" toggle; Summary panel "+N more" became a chevron button
#17	feat(ehr): surface risk factors + hide About team section	EHR panel risk section (top 3 + BLOCKING badge); Summary panel 1-line teaser (red when blocking); About page "Meet Our Team" hidden via `{false && ...}`

Operational findings (no code change)

Clerk dual-email investigation

Verified Vercel and Railway are both on pk_live_Y2xlcmsuY3VyYXdheS5haSQ (clerk.curaway.ai) — production keys are correctly configured.
Backend (curaway_src) has zero Clerk admin / webhook handlers — only JWT verification. Nothing in the code path can double-trigger an email.
Root cause is in the Clerk dashboard, not the code. Top 4 likely causes ranked, with #1 being "verification code + magic link both enabled".
Action item parked for SD: walk through the dashboard against the four ranked causes.

Audit of 4 prior items

Item	Status
`report_date` extraction	⚠ Backend wired (`clinical_extraction.py:31, 214, 221`) and validator consumes it correctly. Not verified end-to-end with a fresh upload — Haiku may still occasionally return null.
Patient name decryption in EHR rebuild	✅ Wired in `ehr_rebuild_service.py:69-71`, ⚠ lazy. Backfills on next chat turn. Existing untouched cases still show old/null name. No backfill job.
First-name personalization sparingly	⚠ Prompt rule lives in `llm_conversation.py:124-130`. No test validates the LLM follows it — subjective check only.
`creatingCaseRef` cleared on first message	✅ Wired in `ConversationApp.tsx:1375/1385/1612`. Not manually verified in production.

Documented and parked for later

Conversation flow remediation — Layer 2 (PR #71)

Orchestrator planner refactor. Replace the if/elif tree in `case_orchestrator.handle_message` with a single Haiku call per turn that picks one action from a 12-action enum. Rollout is in shadow mode (the same pattern used for matching strategies), at an estimated cost of ~$0.001 per turn. New files: `app/agents/planner.py` and `app/agents/planner_state.py`, plus 4 test files. Pre-condition: Layer 1 deployed and observed for ≥1 full week.
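
A shadow-mode rollout of this kind could look roughly like the sketch below. Everything named here is hypothetical: the `Action` members are a three-item stand-in (the spec's 12-action enum is not reproduced), and `decide_with_shadow`, `legacy_decide`, and `planner_decide` are illustrative names rather than anything in the codebase. The point is only the pattern: the legacy decision is always served, and the planner's pick is logged for comparison.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("planner.shadow")


class Action(str, Enum):
    # Hypothetical subset; the real enum in the spec has 12 actions.
    ASK_INTAKE_QUESTION = "ask_intake_question"
    REQUEST_RECORDS = "request_records"
    RUN_MATCHING = "run_matching"


def decide_with_shadow(
    message: str,
    legacy_decide: Callable[[str], Action],
    planner_decide: Callable[[str], Action],
) -> Action:
    """Serve the legacy decision; run the planner only for comparison."""
    served = legacy_decide(message)
    try:
        shadow = planner_decide(message)
    except Exception:
        # In shadow mode a planner failure must never affect the live turn.
        logger.exception("planner shadow call failed")
        return served
    if shadow != served:
        logger.info(
            "planner disagreement: legacy=%s planner=%s",
            served.value,
            shadow.value,
        )
    return served
```

Counting the logged disagreements over the observation window would give one concrete input to the decision on whether the planner should take over routing.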

Why parked: Layer 1 may be sufficient on its own. Want to measure Layer 1's impact in production before deciding whether the planner refactor is still warranted.

Conversation flow remediation — Layer 3 (PR #71)

Synchronous chat extractor. Move the extractor from the deferred lane back onto the routing critical path, running it in parallel with the input classifier via `asyncio.gather` so the net latency cost is ~0 ms. Adds an `agent.reask` event to measure the drop in re-asks quantitatively. New `chat_extractor_sync` flag (default true). Pre-condition: Layer 1. Independent of Layer 2, so the two can ship in either order.
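
The parallel call described above can be sketched with `asyncio.gather`. The two coroutines are stand-ins with simulated latency, and the names `classify_input`, `extract_fields`, and `route_turn` are assumptions for illustration, not the real agent functions.

```python
import asyncio
import time


async def classify_input(message: str) -> str:
    # Stand-in for the input classifier; the sleep simulates model latency.
    await asyncio.sleep(0.05)
    return "answer"


async def extract_fields(message: str) -> dict:
    # Stand-in for the chat extractor; the sleep simulates model latency.
    await asyncio.sleep(0.05)
    return {"raw": message}


async def route_turn(message: str) -> tuple[str, dict]:
    # Both coroutines are scheduled together, so the turn waits roughly for
    # the slower call rather than for the sum of the two.
    label, fields = await asyncio.gather(
        classify_input(message),
        extract_fields(message),
    )
    return label, fields


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(route_turn("I take warfarin daily")))
    print(f"elapsed: {time.perf_counter() - start:.3f}s")
```

The near-zero added latency holds only while the extractor is no slower than the classifier; if extraction takes longer, the difference lands on the critical path.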

Why parked: Same reason as Layer 2: measure Layer 1's impact in production first.

Audit follow-ups (not yet shipped)

Playwright e2e for creatingCaseRef — auth-gated, asserts no flicker on first message
Backend regression test for report_date extraction — feed a fixture with a known date, assert report_date is non-null. Catches Haiku regressions when prompts get shuffled.
One-shot SQL backfill for patient name decryption on existing cases — UPDATE ehr_snapshot ... for cases where name is null but patient.full_name_encrypted is set
Skipped intentionally — first-name personalization has no cheap deterministic test; manual Langfuse review of 20 transcripts is the right tool

Other parked items

Authenticated Playwright tests for the Full EHR drawer — 2 tests skip cleanly until E2E_STORAGE_STATE + E2E_AUTH_CASE_ID env vars are wired in CI
Spec D6 in Layer 1 (quick_questions_asked flip ordering) — implicitly fixed by gates_v2 design (intake_answer_count only increments on real answers), so the explicit code change was skipped to reduce blast radius
End-to-end orchestrator test for gates_v2 — needs full DB fixture, helper-level tests cover the core logic; revisit if production behaviour surprises us
Rich card per-component file split for FullEHRDrawer — spec called for 17 separate component files; consolidated into 1 cohesive file per CLAUDE.md "minimum complexity" rule. Can split later if maintenance pressure builds.

Verification still needed

Not blocking, but worth doing in the next session:

Layer 1 in production: visit `/app/case/e241e34f-6ef8-4eee-8d54-cbb02451100c`, send "find providers now", confirm matching triggers immediately
Run `pytest tests/test_gates_v2.py -v` in Railway to confirm 24/24 green (skipped locally because there is no venv)
Run `pytest tests/test_risk_assessor.py -v` in Railway to confirm 26/26 green (skipped locally for the same reason)
Watch Langfuse for `gates_v2.*` events and `risk_factors` data over the next 24 h
Manually test EHR drawer on a real authenticated case after Vercel deploy of #17
Walk through the Clerk dashboard against the 4 ranked causes of the dual-email issue