Streaming Enhancement — Feature Spec¶

Date: 2026-04-10 (Session 35) Status: Phase 1 complete, Phases 2-4 spec'd ADR: ADR-0015 Coordinator Interaction Model Companion steer: ai-steer/coordinator-interaction-steer.md

Summary¶

Real-time streaming for the Curaway conversational UI. The architecture is SSE-based (Session 31), with WebSocket planned for coordinator interaction (Phase 3). This spec covers all phases from current state to global production.

Current State (as of Session 35)¶

Component	Status	Detail
LLM token streaming	Enabled	`generate_response_streaming()` with `llm.astream()`
Redis pub/sub	Production	Upstash Redis list-based pattern (RPUSH/LPOP)
SSE — chat tokens	Enabled	`GET /cases/{id}/chat/stream` at 100ms polling
SSE — doc progress	Production	`GET /patients/{id}/documents/stream` at 1s polling
SSE — messages	Enabled	`GET /cases/{id}/messages/stream` at 3s polling
Frontend EventSource	Production	`ConversationApp.tsx` token + doc progress consumers
`sender_type` on Message	Deployed	`ai \\| patient \\| coordinator` — column added
Feature flag	Enabled	`enable_response_streaming: true`
Langfuse tracing	Compatible	Streaming traces captured via LangChain callback

Polling frequencies¶

Channel	Frequency	Rationale
Chat tokens	100ms	Tokens arrive every 50-100ms from Claude. Smooth UX critical.
Doc progress	1s	Pipeline steps take 2-30s. 1s catches transitions.
Messages	3s	With streaming, poll is just the final commit. 3s invisible.

Phase 1: Foundation (COMPLETE)¶

Goal: Real-time token streaming for AI ↔ Patient chat.

[x] generate_response_streaming() — astream-based token extraction
[x] Redis pub/sub — Upstash list pattern, non-blocking
[x] SSE endpoint — GET /cases/{id}/chat/stream
[x] Frontend EventSource consumer — parallel with POST
[x] Feature flag — enable_response_streaming
[x] Langfuse compatibility — streaming traces
[x] sender_type column on Message model
[x] Polling optimization — doc 500ms→1s, messages 2s→3s
[x] capture="environment" on file input (mobile camera)

Phase 2: Reliability & Polish (PRE-LAUNCH)¶

Goal: Production-hardened streaming for global audience. Timeline: Before first patient cohort. Effort: ~2-3 days

2.1 Upstash Redis Pro upgrade¶

Priority: CRITICAL — free tier (10K commands/day) exhausted in minutes with SSE polling. Pro ($10/mo) gives 10K commands/second.

[ ] Upgrade Upstash plan to Pro ($10/mo)
[ ] Verify connection string unchanged (Upstash in-place upgrade)
[ ] Monitor command rate via Upstash dashboard

2.2 Mobile SSE reliability¶

[ ] Add visibilitychange listener to reconnect EventSource when mobile tab returns to foreground (iOS Safari kills background SSE)
[ ] Implement exponential backoff on EventSource onerror (current: no retry logic, relies on browser auto-reconnect)
[ ] Test on iOS Safari, Android Chrome, Samsung Internet

File: src/pages/ConversationApp.tsx (EventSource setup ~line 1430)

2.3 Streaming cursor UX¶

[ ] Show "thinking..." indicator between POST send and first token (currently: streaming cursor starts immediately, may flash if orchestrator takes 1-2s before LLM call)
[ ] Smooth fade-in on first token instead of sudden appearance
[ ] Add elapsed timer after 5s ("Still working..." at 10s)

File: src/pages/ConversationApp.tsx (MessageBubble component)

2.4 Error recovery¶

[ ] If SSE connection drops mid-stream, show partial message with "Connection lost — tap to retry" instead of empty bubble
[ ] If POST response arrives after SSE stream completes, reconcile (deduplicate — don't show message twice)
[ ] If Redis is down, graceful fallback to non-streaming POST (already works — just needs test coverage)

Files: ConversationApp.tsx, cases.py SSE endpoint

2.5 Load testing¶

[ ] Simulate 50 concurrent SSE connections using k6 or locust
[ ] Measure: Redis commands/sec, Railway memory, response latency
[ ] Verify: no message loss, no duplicate messages, clean disconnect
[ ] Document results in docs/architecture/performance.md

Implementation checklist — model tiers¶

Opus: - [ ] Error recovery design (2.4) — edge cases, dedup logic - [ ] Load test analysis + capacity recommendations (2.5)

Sonnet: - [ ] Upstash upgrade verification (2.1) - [ ] Mobile SSE reconnection (2.2) - [ ] Streaming cursor UX (2.3) - [ ] Load test script writing (2.5)

Phase 3: Coordinator Chat (POST-SEED)¶

Goal: Human coordinators interact with patients in real-time. Timeline: After seed funding, when coordinator team onboards. Effort: ~2-3 weeks

3.1 WebSocket infrastructure¶

[ ] Add websockets to requirements.txt
[ ] Create app/routers/ws.py — WebSocket endpoint at WS /api/v1/cases/{case_id}/ws
[ ] Connection manager: track connected clients per case ({case_id: [patient_ws, coordinator_ws]})
[ ] Auth: Clerk JWT in first WS message (WebSocket doesn't support headers — token sent as JSON payload after connect)
[ ] Heartbeat: server pings every 30s, client responds with pong
[ ] Reconnection: client auto-reconnects with exponential backoff

3.2 Message routing¶

[ ] Add message.source field: chat | phone | email | whatsapp (distinct from sender_type which is who, not how)
[ ] Route messages to correct WebSocket connections based on sender_type (patient sees all; coordinator sees all + internal)
[ ] Internal notes: role=system, sender_type=coordinator — visible only to coordinators

3.3 Escalation flow¶

[ ] New intent category in message classifier: escalation (triggers: "talk to a person", "can someone help", "real person")
[ ] Escalation event emitted to coordinator dashboard
[ ] AI pause: when coordinator accepts, AI stops auto-responding
[ ] AI resume: coordinator explicitly releases the case
[ ] Feature flag: coordinator_enabled (default false)

3.4 Coordinator dashboard (thin v1)¶

[ ] Route: /admin/cases/{id} (separate from patient UI)
[ ] View: case summary + full chat thread + internal notes
[ ] Actions: reply to patient, add internal note, update case status
[ ] Auth: Clerk organization role check (coordinator role)
[ ] No full admin panel — just case-level interaction

3.5 Frontend — coordinator messages in patient chat¶

[ ] Render coordinator messages with staff badge (different from AI avatar)
[ ] "Priya from Curaway" header on coordinator messages
[ ] Typing indicator for coordinator ("Priya is typing...")

Implementation checklist — model tiers¶

Opus: - [ ] WebSocket infrastructure design (3.1) - [ ] Escalation flow + AI pause/resume logic (3.3) - [ ] Auth model for coordinator role (3.4)

Sonnet: - [ ] Connection manager (3.1) - [ ] Message routing (3.2) - [ ] Coordinator dashboard UI (3.4) - [ ] Staff badge rendering (3.5) - [ ] Tests

Phase 4: Multi-Channel (POST-SERIES A)¶

Goal: Unified case timeline across all communication channels. Timeline: After Series A, when patient volume justifies. Effort: ~4-6 weeks

4.1 Twilio integration¶

[ ] Phone: Twilio Voice webhook → call summary logged to case
[ ] WhatsApp: Twilio WhatsApp Business API → messages in conversation
[ ] SMS: Twilio SMS → messages in conversation
[ ] Bidirectional: coordinator can reply via WhatsApp from dashboard

4.2 Email integration¶

[ ] Inbound email forwarding → parse attachments → trigger OCR pipeline
[ ] Outbound email notifications (already built — email_service.py)
[ ] Email thread linked to case timeline

4.3 Unified timeline¶

[ ] Case timeline view mixing all channels: chat | phone_call | whatsapp | email | internal_note
[ ] Each entry: timestamp, sender, channel icon, content preview
[ ] Filterable by channel type

4.4 Push notifications¶

[ ] PWA support (service worker + manifest)
[ ] Web Push API for browser notifications
[ ] Notification preferences: email, push, SMS, WhatsApp

Capacity Planning¶

Phase	Monthly cost	Concurrent users	Bottleneck
Phase 1 (current)	$20	~7	Upstash Redis free tier
Phase 2 (Redis Pro)	$30	~500-1000	PostgreSQL pool
Phase 3 (coordinator)	$50	~1000	Railway compute
Phase 4 (multi-channel)	$200+	~5000	Twilio costs

References¶

ADR-0015: Coordinator Interaction Model
Session 31: Streaming implementation
Session 35: Architecture assessment, polling optimization
app/agents/llm_conversation.py — generate_response_streaming()
app/integrations/redis_pubsub.py — pub/sub infrastructure
app/routers/cases.py:935 — chat token SSE endpoint
app/routers/documents.py:286 — doc progress SSE endpoint