Skip to content

Streaming Enhancement — Feature Spec

Date: 2026-04-10 (Session 35) Status: Phase 1 complete, Phases 2-4 spec'd ADR: ADR-0015 Coordinator Interaction Model Companion steer: ai-steer/coordinator-interaction-steer.md


Summary

Real-time streaming for the Curaway conversational UI. The architecture is SSE-based (Session 31), with WebSocket planned for coordinator interaction (Phase 3). This spec covers all phases from current state to global production.


Current State (as of Session 35)

Component Status Detail
LLM token streaming Enabled generate_response_streaming() with llm.astream()
Redis pub/sub Production Upstash Redis list-based pattern (RPUSH/LPOP)
SSE — chat tokens Enabled GET /cases/{id}/chat/stream at 100ms polling
SSE — doc progress Production GET /patients/{id}/documents/stream at 1s polling
SSE — messages Enabled GET /cases/{id}/messages/stream at 3s polling
Frontend EventSource Production ConversationApp.tsx token + doc progress consumers
sender_type on Message Deployed ai \| patient \| coordinator — column added
Feature flag Enabled enable_response_streaming: true
Langfuse tracing Compatible Streaming traces captured via LangChain callback

Polling frequencies

Channel Frequency Rationale
Chat tokens 100ms Tokens arrive every 50-100ms from Claude. Smooth UX critical.
Doc progress 1s Pipeline steps take 2-30s. 1s catches transitions.
Messages 3s With streaming, poll is just the final commit. 3s invisible.

Phase 1: Foundation (COMPLETE)

Goal: Real-time token streaming for AI ↔ Patient chat.

  • [x] generate_response_streaming() — astream-based token extraction
  • [x] Redis pub/sub — Upstash list pattern, non-blocking
  • [x] SSE endpoint — GET /cases/{id}/chat/stream
  • [x] Frontend EventSource consumer — parallel with POST
  • [x] Feature flag — enable_response_streaming
  • [x] Langfuse compatibility — streaming traces
  • [x] sender_type column on Message model
  • [x] Polling optimization — doc 500ms→1s, messages 2s→3s
  • [x] capture="environment" on file input (mobile camera)

Phase 2: Reliability & Polish (PRE-LAUNCH)

Goal: Production-hardened streaming for global audience. Timeline: Before first patient cohort. Effort: ~2-3 days

2.1 Upstash Redis Pro upgrade

Priority: CRITICAL — free tier (10K commands/day) exhausted in minutes with SSE polling. Pro ($10/mo) gives 10K commands/second.

  • [ ] Upgrade Upstash plan to Pro ($10/mo)
  • [ ] Verify connection string unchanged (Upstash in-place upgrade)
  • [ ] Monitor command rate via Upstash dashboard

2.2 Mobile SSE reliability

  • [ ] Add visibilitychange listener to reconnect EventSource when mobile tab returns to foreground (iOS Safari kills background SSE)
  • [ ] Implement exponential backoff on EventSource onerror (current: no retry logic, relies on browser auto-reconnect)
  • [ ] Test on iOS Safari, Android Chrome, Samsung Internet

File: src/pages/ConversationApp.tsx (EventSource setup ~line 1430)

2.3 Streaming cursor UX

  • [ ] Show "thinking..." indicator between POST send and first token (currently: streaming cursor starts immediately, may flash if orchestrator takes 1-2s before LLM call)
  • [ ] Smooth fade-in on first token instead of sudden appearance
  • [ ] Add elapsed timer after 5s ("Still working..." at 10s)

File: src/pages/ConversationApp.tsx (MessageBubble component)

2.4 Error recovery

  • [ ] If SSE connection drops mid-stream, show partial message with "Connection lost — tap to retry" instead of empty bubble
  • [ ] If POST response arrives after SSE stream completes, reconcile (deduplicate — don't show message twice)
  • [ ] If Redis is down, graceful fallback to non-streaming POST (already works — just needs test coverage)

Files: ConversationApp.tsx, cases.py SSE endpoint

2.5 Load testing

  • [ ] Simulate 50 concurrent SSE connections using k6 or locust
  • [ ] Measure: Redis commands/sec, Railway memory, response latency
  • [ ] Verify: no message loss, no duplicate messages, clean disconnect
  • [ ] Document results in docs/architecture/performance.md

Implementation checklist — model tiers

Opus: - [ ] Error recovery design (2.4) — edge cases, dedup logic - [ ] Load test analysis + capacity recommendations (2.5)

Sonnet: - [ ] Upstash upgrade verification (2.1) - [ ] Mobile SSE reconnection (2.2) - [ ] Streaming cursor UX (2.3) - [ ] Load test script writing (2.5)


Phase 3: Coordinator Chat (POST-SEED)

Goal: Human coordinators interact with patients in real-time. Timeline: After seed funding, when coordinator team onboards. Effort: ~2-3 weeks

3.1 WebSocket infrastructure

  • [ ] Add websockets to requirements.txt
  • [ ] Create app/routers/ws.py — WebSocket endpoint at WS /api/v1/cases/{case_id}/ws
  • [ ] Connection manager: track connected clients per case ({case_id: [patient_ws, coordinator_ws]})
  • [ ] Auth: Clerk JWT in first WS message (WebSocket doesn't support headers — token sent as JSON payload after connect)
  • [ ] Heartbeat: server pings every 30s, client responds with pong
  • [ ] Reconnection: client auto-reconnects with exponential backoff

3.2 Message routing

  • [ ] Add message.source field: chat | phone | email | whatsapp (distinct from sender_type which is who, not how)
  • [ ] Route messages to correct WebSocket connections based on sender_type (patient sees all; coordinator sees all + internal)
  • [ ] Internal notes: role=system, sender_type=coordinator — visible only to coordinators

3.3 Escalation flow

  • [ ] New intent category in message classifier: escalation (triggers: "talk to a person", "can someone help", "real person")
  • [ ] Escalation event emitted to coordinator dashboard
  • [ ] AI pause: when coordinator accepts, AI stops auto-responding
  • [ ] AI resume: coordinator explicitly releases the case
  • [ ] Feature flag: coordinator_enabled (default false)

3.4 Coordinator dashboard (thin v1)

  • [ ] Route: /admin/cases/{id} (separate from patient UI)
  • [ ] View: case summary + full chat thread + internal notes
  • [ ] Actions: reply to patient, add internal note, update case status
  • [ ] Auth: Clerk organization role check (coordinator role)
  • [ ] No full admin panel — just case-level interaction

3.5 Frontend — coordinator messages in patient chat

  • [ ] Render coordinator messages with staff badge (different from AI avatar)
  • [ ] "Priya from Curaway" header on coordinator messages
  • [ ] Typing indicator for coordinator ("Priya is typing...")

Implementation checklist — model tiers

Opus: - [ ] WebSocket infrastructure design (3.1) - [ ] Escalation flow + AI pause/resume logic (3.3) - [ ] Auth model for coordinator role (3.4)

Sonnet: - [ ] Connection manager (3.1) - [ ] Message routing (3.2) - [ ] Coordinator dashboard UI (3.4) - [ ] Staff badge rendering (3.5) - [ ] Tests


Phase 4: Multi-Channel (POST-SERIES A)

Goal: Unified case timeline across all communication channels. Timeline: After Series A, when patient volume justifies. Effort: ~4-6 weeks

4.1 Twilio integration

  • [ ] Phone: Twilio Voice webhook → call summary logged to case
  • [ ] WhatsApp: Twilio WhatsApp Business API → messages in conversation
  • [ ] SMS: Twilio SMS → messages in conversation
  • [ ] Bidirectional: coordinator can reply via WhatsApp from dashboard

4.2 Email integration

  • [ ] Inbound email forwarding → parse attachments → trigger OCR pipeline
  • [ ] Outbound email notifications (already built — email_service.py)
  • [ ] Email thread linked to case timeline

4.3 Unified timeline

  • [ ] Case timeline view mixing all channels: chat | phone_call | whatsapp | email | internal_note
  • [ ] Each entry: timestamp, sender, channel icon, content preview
  • [ ] Filterable by channel type

4.4 Push notifications

  • [ ] PWA support (service worker + manifest)
  • [ ] Web Push API for browser notifications
  • [ ] Notification preferences: email, push, SMS, WhatsApp

Capacity Planning

Phase Monthly cost Concurrent users Bottleneck
Phase 1 (current) $20 ~7 Upstash Redis free tier
Phase 2 (Redis Pro) $30 ~500-1000 PostgreSQL pool
Phase 3 (coordinator) $50 ~1000 Railway compute
Phase 4 (multi-channel) $200+ ~5000 Twilio costs

References

  • ADR-0015: Coordinator Interaction Model
  • Session 31: Streaming implementation
  • Session 35: Architecture assessment, polling optimization
  • app/agents/llm_conversation.pygenerate_response_streaming()
  • app/integrations/redis_pubsub.py — pub/sub infrastructure
  • app/routers/cases.py:935 — chat token SSE endpoint
  • app/routers/documents.py:286 — doc progress SSE endpoint