
ADR-0004: SSE over Polling for Real-Time Chat

Status: Accepted
Date: 2026-03-24
Session: 23B

Context

Curaway's chat interface needs to deliver agent responses to the frontend in real time. When a patient sends a message, the backend orchestrator processes it (potentially calling multiple sub-agents and external APIs), and the response must stream back to the browser as it is generated. The delivery mechanism must work with Railway's infrastructure, the Next.js frontend on Vercel, and handle cross-origin requests.

Decision

Use Server-Sent Events (SSE) backed by Upstash Redis pub/sub for real-time message delivery. Implement a delayed refetch mechanism as a pragmatic fallback when SSE encounters CORS or connectivity issues.
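For readers unfamiliar with the SSE wire format, the sketch below shows how one response chunk could be serialized as an event frame. The helper name `formatSseEvent` and its options are assumptions made for illustration, not taken from Curaway's codebase; only the field names (`id`, `event`, `data`) and the blank-line terminator come from the SSE format itself.

```typescript
// Hypothetical helper: serialize one chunk as a Server-Sent Events frame.
interface SseFrameOptions {
  event?: string; // optional event name; the browser defaults to "message"
  id?: string; // optional event ID; browsers send it back as Last-Event-ID on reconnect
}

function formatSseEvent(data: string, options: SseFrameOptions = {}): string {
  const lines: string[] = [];
  if (options.id !== undefined) lines.push(`id: ${options.id}`);
  if (options.event !== undefined) lines.push(`event: ${options.event}`);
  // Each line of the payload needs its own "data:" field.
  for (const line of data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

On the browser side, an `EventSource` pointed at the streaming endpoint would receive each frame's `data` payload in its `message` handler.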

Rationale

  • Simplicity. SSE is a native browser API (EventSource) that requires no additional client libraries. It handles reconnection automatically. For Curaway's use case (server-to-client streaming), SSE is the right level of abstraction.
  • One-way streaming fits the model. Chat responses flow from server to client. The client sends messages via regular HTTP POST. There is no need for the bidirectional channel that WebSocket provides.
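  • Redis pub/sub for scale. Upstash Redis is used as the message broker. The backend pushes response chunks via RPUSH to a conversation-specific key, and the SSE endpoint reads them via LPOP. Strictly speaking, this is a list-based queue rather than Redis's PUBLISH/SUBSCRIBE commands. Because the list lives in Redis rather than in process memory, the pattern works across multiple Railway container instances.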
  • Pragmatic fallback. During development, SSE hit intermittent CORS issues when the Cloudflare proxy was enabled (see ADR-0008). A delayed refetch fallback, in which the client polls for the completed response after a short delay, ensures the user always gets their answer even if the real-time stream fails.
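The RPUSH/LPOP flow described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `InMemoryListStore` is an in-memory stand-in for Redis, and the key scheme, the `[DONE]` sentinel, and the function names are all assumptions introduced here.

```typescript
// In-memory stand-in for a Redis list. It only mimics the RPUSH and LPOP
// semantics used here; real code would call a Redis client instead.
class InMemoryListStore {
  private lists: Record<string, string[]> = {};

  // Append to the tail and return the new length, as RPUSH does.
  rpush(key: string, value: string): number {
    let list = this.lists[key];
    if (list === undefined) {
      list = [];
      this.lists[key] = list;
    }
    list.push(value);
    return list.length;
  }

  // Remove and return the head, or null when the list is empty or missing.
  lpop(key: string): string | null {
    const list = this.lists[key];
    if (list === undefined || list.length === 0) return null;
    return list.shift() as string;
  }
}

const END_OF_STREAM = "[DONE]"; // hypothetical sentinel, not from the ADR

// Hypothetical scheme for the conversation-specific key.
function chunkKey(conversationId: string): string {
  return `chat:${conversationId}:chunks`;
}

// Producer side: the orchestrator appends chunks in generation order.
function pushResponse(store: InMemoryListStore, conversationId: string, chunks: string[]): void {
  const key = chunkKey(conversationId);
  for (const chunk of chunks) store.rpush(key, chunk);
  store.rpush(key, END_OF_STREAM);
}

// Consumer side: the SSE endpoint pops until the sentinel or an empty list.
function drainChunks(
  store: InMemoryListStore,
  conversationId: string
): { chunks: string[]; done: boolean } {
  const key = chunkKey(conversationId);
  const chunks: string[] = [];
  let next = store.lpop(key);
  while (next !== null) {
    if (next === END_OF_STREAM) return { chunks, done: true };
    chunks.push(next);
    next = store.lpop(key);
  }
  return { chunks, done: false };
}
```

Because producer and consumer share only the key, they can run in different container instances. A real endpoint would poll or block for new chunks rather than return as soon as the list is empty.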

Alternatives Considered

| Alternative | Pros | Cons | Verdict |
|---|---|---|---|
| WebSocket | Full-duplex, low latency | Bidirectional capability is unnecessary overhead; more complex server setup; Railway WebSocket support requires specific configuration | Rejected |
| HTTP long polling | Simple to implement, works everywhere | Wasteful (repeated connections), higher latency, poor UX for streaming responses | Rejected |
| Firebase Realtime Database | Managed, real-time out of the box | Vendor lock-in to Google, another platform to manage, data residency concerns for healthcare | Rejected |
| Mercure | Standards-based (SSE + hub), built-in auth | Additional infrastructure (hub server), small community, unfamiliar to the team | Not evaluated |

Consequences

  • Positive: Real-time streaming gives users immediate feedback as the agent generates a response. Perceived latency is dramatically lower than waiting for a complete response.
  • Positive: SSE's built-in reconnection handles intermittent network issues gracefully.
  • Positive: Upstash Redis is serverless with pay-per-request pricing, keeping costs proportional to usage.
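  • Negative: When served over HTTP/1.1, browsers cap SSE at about 6 concurrent connections per domain, and the cap is shared across all open tabs. Over HTTP/2 the limit is negotiated between client and server and is much higher (100 streams by default). Not an issue at current scale (one conversation per tab).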
  • Negative: The delayed refetch fallback means some users may experience a non-streaming response when SSE fails. This is acceptable as a degraded-but-functional experience.
  • Accepted risk: Correct delivery depends on the RPUSH/LPOP list being consumed correctly. RPUSH preserves chunk order, but LPOP removes each chunk as it is read, so a consumer that disconnects mid-stream can lose chunks it popped but never delivered. The fallback refetch mitigates this.
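The accepted risk can be made concrete with a small simulation. It assumes, purely for illustration, that the completed response is also stored somewhere the fallback refetch can read; the `completed` record, the function names, and the sample text are hypothetical.

```typescript
// Destructive read: like LPOP, shift() removes the element it returns.
function lpop(list: string[]): string | null {
  return list.length > 0 ? (list.shift() as string) : null;
}

interface Delivery {
  delivered: string[]; // chunks that reached the browser
  lost: string[]; // chunks popped from the list but never sent
}

// Simulate a consumer whose connection drops after `failAfter` successful sends.
function consumeWithDisconnect(queue: string[], failAfter: number): Delivery {
  const delivered: string[] = [];
  const lost: string[] = [];
  let chunk = lpop(queue);
  while (chunk !== null) {
    if (delivered.length >= failAfter) {
      // The chunk is already gone from the list, so a reconnect cannot recover it.
      lost.push(chunk);
      break;
    }
    delivered.push(chunk);
    chunk = lpop(queue);
  }
  return { delivered, lost };
}

// Hypothetical fallback: read the completed response from a separate store.
function refetchCompleted(completed: Record<string, string>, conversationId: string): string | null {
  const text = completed[conversationId];
  return text === undefined ? null : text;
}

const queue = ["The ", "clinic ", "is ", "open."];
const completed: Record<string, string> = { c1: "The clinic is open." };

const result = consumeWithDisconnect(queue, 2);
// result.delivered is ["The ", "clinic "], result.lost is ["is "],
// and queue now holds only ["open."], so resuming the stream would skip "is ".
const recovered = refetchCompleted(completed, "c1");
// recovered is "The clinic is open."
```

The stream alone cannot reconstruct the message after the disconnect, which is why the refetch reads the whole completed response instead of resuming from the list.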