Streaming Enhancement: Feature Spec
Date: 2026-04-10 (Session 35)
Status: Phase 1 complete, Phases 2-4 spec'd
ADR: ADR-0015 Coordinator Interaction Model
Companion steer: ai-steer/coordinator-interaction-steer.md
Summary
Real-time streaming for the Curaway conversational UI. The architecture is SSE-based (Session 31), with WebSocket planned for coordinator interaction (Phase 3). This spec covers all phases from current state to global production.
Current State (as of Session 35)
| Component | Status | Detail |
|---|---|---|
| LLM token streaming | Enabled | generate_response_streaming() with llm.astream() |
| Redis pub/sub | Production | Upstash Redis list-based pattern (RPUSH/LPOP) |
| SSE — chat tokens | Enabled | GET /cases/{id}/chat/stream at 100ms polling |
| SSE — doc progress | Production | GET /patients/{id}/documents/stream at 1s polling |
| SSE — messages | Enabled | GET /cases/{id}/messages/stream at 3s polling |
| Frontend EventSource | Production | ConversationApp.tsx token + doc progress consumers |
| `sender_type` on Message | Deployed | ai \| patient \| coordinator (column added) |
| Feature flag | Enabled | enable_response_streaming: true |
| Langfuse tracing | Compatible | Streaming traces captured via LangChain callback |
Polling frequencies
| Channel | Frequency | Rationale |
|---|---|---|
| Chat tokens | 100ms | Tokens arrive every 50-100ms from Claude. Smooth UX critical. |
| Doc progress | 1s | Pipeline steps take 2-30s. 1s catches transitions. |
| Messages | 3s | With streaming, poll is just the final commit. 3s invisible. |
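The list-based polling pattern behind these channels can be sketched as a loop that drains a Redis list and emits SSE frames. This is an illustration only, not the code in `redis_pubsub.py`: the `ListQueue` stand-in, the `stream_tokens` name, the JSON-encoded frame body, the `[DONE]` sentinel, and the idle timeout are all assumptions made for the example.

```python
import asyncio
import json
from collections import deque
from typing import AsyncIterator, Optional


class ListQueue:
    """In-memory stand-in for a Redis list (RPUSH to append, LPOP to consume)."""

    def __init__(self) -> None:
        self._items: deque = deque()

    async def rpush(self, value: str) -> None:
        self._items.append(value)

    async def lpop(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


async def stream_tokens(
    queue: ListQueue,
    poll_interval: float = 0.1,   # 100ms, the chat-token frequency in the table
    idle_timeout: float = 30.0,   # hypothetical safety cutoff
    done_marker: str = "[DONE]",  # hypothetical end-of-stream sentinel
) -> AsyncIterator[str]:
    """Yield SSE frames until the sentinel arrives or the stream goes idle."""
    idle = 0.0
    while idle < idle_timeout:
        token = await queue.lpop()
        if token is None:
            # Nothing queued yet: wait one poll tick without blocking the event loop.
            await asyncio.sleep(poll_interval)
            idle += poll_interval
            continue
        idle = 0.0
        if token == done_marker:
            yield "event: done\ndata: {}\n\n"
            return
        # JSON-encode so a token containing newlines stays on one data line.
        yield f"data: {json.dumps(token)}\n\n"


async def demo() -> list:
    queue = ListQueue()
    for token in ["Hel", "lo", "[DONE]"]:
        await queue.rpush(token)
    return [frame async for frame in stream_tokens(queue)]


frames = asyncio.run(demo())
```

Each idle tick costs one `LPOP`, which is why the poll interval directly drives the Redis command rate.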
Phase 1: Foundation (COMPLETE)
Goal: Real-time token streaming for AI ↔ Patient chat.
- [x] `generate_response_streaming()`: astream-based token extraction
- [x] Redis pub/sub: Upstash list pattern, non-blocking
- [x] SSE endpoint: `GET /cases/{id}/chat/stream`
- [x] Frontend EventSource consumer: parallel with POST
- [x] Feature flag: `enable_response_streaming`
- [x] Langfuse compatibility: streaming traces
- [x] `sender_type` column on Message model
- [x] Polling optimization: doc 500ms→1s, messages 2s→3s
- [x] `capture="environment"` on file input (mobile camera)
Phase 2: Reliability & Polish (PRE-LAUNCH)
Goal: Production-hardened streaming for global audience. Timeline: Before first patient cohort. Effort: ~2-3 days
2.1 Upstash Redis Pro upgrade
Priority: CRITICAL — free tier (10K commands/day) exhausted in minutes with SSE polling. Pro ($10/mo) gives 10K commands/second.
- [ ] Upgrade Upstash plan to Pro ($10/mo)
- [ ] Verify connection string unchanged (Upstash in-place upgrade)
- [ ] Monitor command rate via Upstash dashboard
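A back-of-the-envelope estimate shows why the free tier runs out so quickly. The sketch below assumes one Redis command per poll tick and a single patient holding all three SSE channels open continuously. That makes it an upper bound, since the chat-token stream is presumably only open while a response is being generated.

```python
# Poll intervals in seconds, taken from the polling-frequency table.
POLL_INTERVALS = {"chat_tokens": 0.1, "doc_progress": 1.0, "messages": 3.0}
FREE_TIER_COMMANDS_PER_DAY = 10_000


def commands_per_second(intervals: dict, users: int = 1) -> float:
    """Redis commands/sec, assuming one command per poll tick per open channel."""
    return users * sum(1.0 / seconds for seconds in intervals.values())


def minutes_until_exhausted(daily_quota: int, rate_per_second: float) -> float:
    """Minutes of continuous polling before the daily quota is spent."""
    return daily_quota / rate_per_second / 60.0


rate = commands_per_second(POLL_INTERVALS)
minutes = minutes_until_exhausted(FREE_TIER_COMMANDS_PER_DAY, rate)
print(f"{rate:.1f} commands/sec per user")        # 11.3 commands/sec per user
print(f"{minutes:.1f} minutes to exhaust quota")  # 14.7 minutes to exhaust quota
```

Under these assumptions a single active user spends the whole daily quota in about 15 minutes, and almost all of that load comes from the 100ms chat-token channel.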
2.2 Mobile SSE reliability
- [ ] Add `visibilitychange` listener to reconnect EventSource when mobile tab returns to foreground (iOS Safari kills background SSE)
- [ ] Implement exponential backoff on EventSource `onerror` (current: no retry logic, relies on browser auto-reconnect)
- [ ] Test on iOS Safari, Android Chrome, Samsung Internet
File: src/pages/ConversationApp.tsx (EventSource setup ~line 1430)
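The retry schedule for the backoff item is independent of language, so it is sketched here in Python even though the real handler would live in `ConversationApp.tsx`. The function name, the 1s base, the 30s cap, and the optional jitter are illustrative assumptions, not values from this spec.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,   # hypothetical first retry delay, seconds
    cap: float = 30.0,   # hypothetical ceiling, seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before retry number `attempt` (0-based), doubling up to `cap`.

    With `rng` supplied, applies full jitter: a uniform draw in [0, delay],
    which spreads out clients that all lost the connection at the same moment.
    """
    delay = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, delay) if rng is not None else delay


schedule = [backoff_delay(n) for n in range(7)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

The attempt counter would reset to zero once a connection opens successfully, and a `visibilitychange` reconnect could skip the wait entirely, since the user is actively looking at the page.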
2.3 Streaming cursor UX
- [ ] Show "thinking..." indicator between POST send and first token (currently: streaming cursor starts immediately, may flash if orchestrator takes 1-2s before LLM call)
- [ ] Smooth fade-in on first token instead of sudden appearance
- [ ] Add elapsed timer after 5s ("Still working..." at 10s)
File: src/pages/ConversationApp.tsx (MessageBubble component)
2.4 Error recovery
- [ ] If SSE connection drops mid-stream, show partial message with "Connection lost — tap to retry" instead of empty bubble
- [ ] If POST response arrives after SSE stream completes, reconcile (deduplicate — don't show message twice)
- [ ] If Redis is down, graceful fallback to non-streaming POST (already works — just needs test coverage)
Files: ConversationApp.tsx, cases.py SSE endpoint
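One way to handle the POST/SSE reconciliation is to key every message by an id and let a committed copy replace a partial one. The sketch below shows that merge rule in Python. It assumes the SSE stream and the POST response carry the same `message_id`, which this spec does not confirm, and the `ChatMessage` shape is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    message_id: str  # hypothetical: same id on the SSE stream and the POST response
    content: str
    final: bool      # False while tokens are still streaming in


def reconcile(thread: list, incoming: ChatMessage) -> list:
    """Merge `incoming` into `thread` so no message is ever shown twice."""
    merged = []
    seen = False
    for existing in thread:
        if existing.message_id != incoming.message_id:
            merged.append(existing)
            continue
        seen = True
        # A committed copy always wins. Between two partials, keep the longer
        # one so a late frame never truncates text already on screen.
        longer_partial = not existing.final and len(incoming.content) >= len(existing.content)
        merged.append(incoming if incoming.final or longer_partial else existing)
    if not seen:
        merged.append(incoming)
    return merged


thread = [ChatMessage("m1", "Hello, I can he", final=False)]
thread = reconcile(thread, ChatMessage("m1", "Hello, I can help.", final=True))
# A stale SSE frame arriving after the POST response is ignored.
thread = reconcile(thread, ChatMessage("m1", "Hello, I can he", final=False))
```

The same rule covers the dropped-connection case: the partial message stays in the thread with `final=False`, which the UI can use to decide when to show the retry prompt.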
2.5 Load testing
- [ ] Simulate 50 concurrent SSE connections using k6 or locust
- [ ] Measure: Redis commands/sec, Railway memory, response latency
- [ ] Verify: no message loss, no duplicate messages, clean disconnect
- [ ] Document results in `docs/architecture/performance.md`
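The "no message loss, no duplicate messages" check can be automated by comparing the ids published during a run with the ids each simulated client received. A minimal sketch, assuming the load-test script records both lists. The `delivery_report` helper is hypothetical and works with either k6 or locust output.

```python
from collections import Counter


def delivery_report(sent_ids: list, received_ids: list) -> dict:
    """Compare ids published during a run with the ids one client received."""
    counts = Counter(received_ids)
    return {
        "lost": [mid for mid in sent_ids if counts[mid] == 0],
        "duplicated": sorted(mid for mid, n in counts.items() if n > 1),
        "unexpected": sorted(set(received_ids) - set(sent_ids)),
    }


report = delivery_report(["m1", "m2", "m3"], ["m1", "m3", "m3"])
print(report)  # {'lost': ['m2'], 'duplicated': ['m3'], 'unexpected': []}
```

A run passes when all three lists are empty for every simulated connection.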
Implementation checklist: model tiers
Opus:
- [ ] Error recovery design (2.4): edge cases, dedup logic
- [ ] Load test analysis + capacity recommendations (2.5)
Sonnet:
- [ ] Upstash upgrade verification (2.1)
- [ ] Mobile SSE reconnection (2.2)
- [ ] Streaming cursor UX (2.3)
- [ ] Load test script writing (2.5)
Phase 3: Coordinator Chat (POST-SEED)
Goal: Human coordinators interact with patients in real-time. Timeline: After seed funding, when coordinator team onboards. Effort: ~2-3 weeks
3.1 WebSocket infrastructure
- [ ] Add `websockets` to requirements.txt
- [ ] Create `app/routers/ws.py`: WebSocket endpoint at `WS /api/v1/cases/{case_id}/ws`
- [ ] Connection manager: track connected clients per case (`{case_id: [patient_ws, coordinator_ws]}`)
- [ ] Auth: Clerk JWT in first WS message (the browser WebSocket API cannot set custom headers, so the token is sent as a JSON payload after connect)
- [ ] Heartbeat: server pings every 30s, client responds with pong
- [ ] Reconnection: client auto-reconnects with exponential backoff
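The connection manager could look roughly like the sketch below. It depends only on an async `send_text` method, which FastAPI's `WebSocket` provides, so it runs here against a fake socket. The class and method names are illustrative, and auth, heartbeat, and per-role routing are deliberately left out.

```python
import asyncio
from collections import defaultdict
from typing import Protocol


class Socket(Protocol):
    """The one method this sketch needs from a WebSocket object."""

    async def send_text(self, data: str) -> None: ...


class ConnectionManager:
    """Tracks open sockets per case: {case_id: [patient_ws, coordinator_ws]}."""

    def __init__(self) -> None:
        self._by_case = defaultdict(list)

    def connect(self, case_id: str, ws: Socket) -> None:
        self._by_case[case_id].append(ws)

    def disconnect(self, case_id: str, ws: Socket) -> None:
        sockets = self._by_case.get(case_id, [])
        if ws in sockets:
            sockets.remove(ws)
        if not sockets:
            self._by_case.pop(case_id, None)  # drop empty cases to avoid leaks

    async def broadcast(self, case_id: str, payload: str) -> int:
        """Send to every socket on the case and return the delivery count."""
        delivered = 0
        for ws in list(self._by_case.get(case_id, [])):
            try:
                await ws.send_text(payload)
                delivered += 1
            except Exception:
                # Treat any send failure as a dead connection and forget it.
                self.disconnect(case_id, ws)
        return delivered


class FakeSocket:
    """Test double standing in for a real WebSocket."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.sent = []

    async def send_text(self, data: str) -> None:
        if self.broken:
            raise ConnectionError("socket closed")
        self.sent.append(data)


async def demo() -> tuple:
    manager = ConnectionManager()
    patient, coordinator = FakeSocket(), FakeSocket(broken=True)
    manager.connect("case-1", patient)
    manager.connect("case-1", coordinator)
    first = await manager.broadcast("case-1", "hello")   # broken socket is pruned
    second = await manager.broadcast("case-1", "again")
    return first, second, patient.sent


result = asyncio.run(demo())
```

An in-process dictionary like this only works while a single server instance holds all connections. Running more than one instance would need a shared channel between them, which is outside this sketch.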
3.2 Message routing
- [ ] Add `message.source` field: `chat | phone | email | whatsapp` (distinct from `sender_type`, which is who, not how)
- [ ] Route messages to correct WebSocket connections based on `sender_type` (patient sees all; coordinator sees all + internal)
- [ ] Internal notes: `role=system, sender_type=coordinator`, visible only to coordinators
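The visibility rule for internal notes reduces to a small predicate. A sketch under stated assumptions: the `Message` dataclass is a simplified stand-in for the real model, the `"assistant"` role value and the viewer names are guesses, and only the `role=system, sender_type=coordinator` marker comes from this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str         # "system" marks internal notes; other values are assumed
    sender_type: str  # "ai" | "patient" | "coordinator" (who sent it)
    source: str       # "chat" | "phone" | "email" | "whatsapp" (how it arrived)
    content: str


def is_internal_note(message: Message) -> bool:
    """Internal notes are marked role=system, sender_type=coordinator."""
    return message.role == "system" and message.sender_type == "coordinator"


def visible_to(message: Message, viewer: str) -> bool:
    """Coordinators see everything; patients see everything except internal notes."""
    if viewer == "coordinator":
        return True
    return not is_internal_note(message)


note = Message("system", "coordinator", "chat", "Patient prefers morning calls")
reply = Message("assistant", "coordinator", "chat", "Hi, this is Priya from Curaway")
```

Here `note` is hidden from the patient while `reply` reaches both sides. Filtering on the server before the send means an internal note never reaches the patient's socket.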
3.3 Escalation flow
- [ ] New intent category in message classifier: `escalation` (triggers: "talk to a person", "can someone help", "real person")
- [ ] Escalation event emitted to coordinator dashboard
- [ ] AI pause: when coordinator accepts, AI stops auto-responding
- [ ] AI resume: coordinator explicitly releases the case
- [ ] Feature flag: `coordinator_enabled` (default false)
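The pause and resume rules form a small state machine, sketched below. The substring match is a naive stand-in for the real message classifier, and `CaseControl` and its method names are hypothetical. Only the trigger phrases, the flag name, and the accept/release semantics come from the list above.

```python
from dataclasses import dataclass

ESCALATION_TRIGGERS = ("talk to a person", "can someone help", "real person")


def is_escalation(text: str) -> bool:
    """Naive substring stand-in for the classifier's `escalation` intent."""
    lowered = text.lower()
    return any(trigger in lowered for trigger in ESCALATION_TRIGGERS)


@dataclass
class CaseControl:
    coordinator_enabled: bool = False  # feature flag, default false
    escalation_pending: bool = False
    ai_paused: bool = False

    def on_patient_message(self, text: str) -> None:
        # With the flag off, escalation phrases are ignored entirely.
        if self.coordinator_enabled and is_escalation(text):
            self.escalation_pending = True  # the dashboard event would fire here

    def coordinator_accepts(self) -> None:
        self.escalation_pending = False
        self.ai_paused = True  # AI stops auto-responding

    def coordinator_releases(self) -> None:
        self.ai_paused = False  # explicit hand-back to the AI

    def ai_should_respond(self) -> bool:
        return not self.ai_paused


case = CaseControl(coordinator_enabled=True)
case.on_patient_message("Can I talk to a person please?")
pending_before_accept = case.escalation_pending and case.ai_should_respond()
case.coordinator_accepts()
paused = not case.ai_should_respond()
case.coordinator_releases()
```

In this sketch the AI keeps answering between the escalation request and the coordinator's acceptance, so the patient is never left without a response while waiting.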
3.4 Coordinator dashboard (thin v1)
- [ ] Route: `/admin/cases/{id}` (separate from patient UI)
- [ ] View: case summary + full chat thread + internal notes
- [ ] Actions: reply to patient, add internal note, update case status
- [ ] Auth: Clerk organization role check (coordinator role)
- [ ] No full admin panel — just case-level interaction
3.5 Frontend: coordinator messages in patient chat
- [ ] Render coordinator messages with staff badge (different from AI avatar)
- [ ] "Priya from Curaway" header on coordinator messages
- [ ] Typing indicator for coordinator ("Priya is typing...")
Implementation checklist: model tiers
Opus:
- [ ] WebSocket infrastructure design (3.1)
- [ ] Escalation flow + AI pause/resume logic (3.3)
- [ ] Auth model for coordinator role (3.4)
Sonnet:
- [ ] Connection manager (3.1)
- [ ] Message routing (3.2)
- [ ] Coordinator dashboard UI (3.4)
- [ ] Staff badge rendering (3.5)
- [ ] Tests
Phase 4: Multi-Channel (POST-SERIES A)
Goal: Unified case timeline across all communication channels. Timeline: After Series A, when patient volume justifies. Effort: ~4-6 weeks
4.1 Twilio integration
- [ ] Phone: Twilio Voice webhook → call summary logged to case
- [ ] WhatsApp: Twilio WhatsApp Business API → messages in conversation
- [ ] SMS: Twilio SMS → messages in conversation
- [ ] Bidirectional: coordinator can reply via WhatsApp from dashboard
4.2 Email integration
- [ ] Inbound email forwarding → parse attachments → trigger OCR pipeline
- [ ] Outbound email notifications (already built: `email_service.py`)
- [ ] Email thread linked to case timeline
4.3 Unified timeline
- [ ] Case timeline view mixing all channels: `chat | phone_call | whatsapp | email | internal_note`
- [ ] Each entry: timestamp, sender, channel icon, content preview
- [ ] Filterable by channel type
4.4 Push notifications
- [ ] PWA support (service worker + manifest)
- [ ] Web Push API for browser notifications
- [ ] Notification preferences: email, push, SMS, WhatsApp
Capacity Planning
| Phase | Monthly cost | Concurrent users | Bottleneck |
|---|---|---|---|
| Phase 1 (current) | $20 | ~7 | Upstash Redis free tier |
| Phase 2 (Redis Pro) | $30 | ~500-1000 | PostgreSQL pool |
| Phase 3 (coordinator) | $50 | ~1000 | Railway compute |
| Phase 4 (multi-channel) | $200+ | ~5000 | Twilio costs |
References
- ADR-0015: Coordinator Interaction Model
- Session 31: Streaming implementation
- Session 35: Architecture assessment, polling optimization
- `app/agents/llm_conversation.py`: `generate_response_streaming()`
- `app/integrations/redis_pubsub.py`: pub/sub infrastructure
- `app/routers/cases.py:935`: chat token SSE endpoint
- `app/routers/documents.py:286`: doc progress SSE endpoint