SSE Upload Progress Streaming

Status: Implemented (emit_progress + ProgressStep live)
Author: SD
Created: 2026-04-03
Branch: feat/sse-upload-progress


What

Replace the static "Preparing response" spinner during file upload and extraction with a real-time, granular progress stream. The user sees each processing step as it completes, giving them confidence the system is actively working — not hanging.

The backend SSE endpoint (/documents/stream) already exists. This spec covers wiring it to emit step-level events and building the frontend to consume and display them.

Why

  • Patient experience: Real patients uploading real medical reports need reassurance during processing. Silence during a 10-30s wait erodes trust — especially when dealing with sensitive health documents. Named steps ("Analyzing findings") signal competence and care.
  • Investor demo: The live workflow uses real patient data. Visible sub-steps let investors see the Clinical Context agent's pipeline depth (reports → ICD codes → FHIR) working on actual documents without requiring technical explanation.
  • Production readiness: SSE streaming through the agent pipeline validates that the architecture supports real-time observability — a requirement for production, not a nice-to-have.

Agents Involved

| Agent | Role in this flow |
|---|---|
| Orchestrator | Routes the uploaded document to Clinical Context |
| Clinical Context | Performs the multi-step extraction pipeline. Primary emitter of progress events. |

Sub-Steps to Surface

These are the user-visible labels mapped to backend processing stages:

| Order | User-facing label | Backend stage | Emitted when |
|---|---|---|---|
| 1 | Uploading document | File received by server | Multipart upload completes |
| 2 | Extracting content | Text/OCR extraction from PDF/image | Extraction function returns |
| 3 | Analyzing findings | Clinical NLP (condition identification) | Clinical Context begins ICD mapping |
| 4 | Building health record | ICD codes → FHIR resource construction | FHIR bundle assembled |
| 5 | Composing summary | LLM summarization of structured data | Summary generation starts |
| 6 | Ready | All processing complete | Final state pushed |

Design constraint: Steps must appear in this order. If a step completes in under ~300ms, the frontend should still keep it on screen for a minimum visible duration so the user perceives progress rather than a jump.
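The minimum-visible-duration rule comes down to a small piece of timing arithmetic. The sketch below shows only that arithmetic, in Python for illustration; the frontend would implement the equivalent in its own language. `display_schedule` and `MIN_VISIBLE_MS` are hypothetical names, and the 300ms value is the one given in this spec.

```python
from typing import List

MIN_VISIBLE_MS = 300  # minimum time each step stays on screen (from this spec)


def display_schedule(arrivals_ms: List[int],
                     min_visible_ms: int = MIN_VISIBLE_MS) -> List[int]:
    """Return the time (ms) at which each step becomes the visible step.

    A step is shown when its event arrives, unless the previous step has
    not yet been visible for min_visible_ms, in which case it waits.
    """
    shown: List[int] = []
    for arrival in arrivals_ms:
        earliest = shown[-1] + min_visible_ms if shown else arrival
        shown.append(max(arrival, earliest))
    return shown


# Three fast steps, a slow one, then two more fast steps.
print(display_schedule([0, 50, 60, 2000, 2010, 2020]))
# -> [0, 300, 600, 2000, 2300, 2600]
```

Fast steps are spread out at 300ms intervals, while a step that arrives late is shown immediately, so the UI never runs ahead of the backend.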


Tasks

Backend

  • [ ] B1: Define SSE event schema

  • Event type: progress
  • Payload: { step: number, label: string, status: "in_progress" | "complete" | "error", timestamp: string, detail?: string }
  • Assertion: GET /documents/stream with a valid upload returns events matching this schema. Validate with a JSON schema check against 3 sample uploads.
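As a minimal sketch of the B1 payload, the schema can be modelled as a dataclass and serialized into an SSE frame. The names `ProgressEvent`, `make_event`, and `to_sse` are hypothetical, and the ISO 8601 UTC timestamp format is an assumption, since the spec only says `timestamp: string`.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Literal, Optional

Status = Literal["in_progress", "complete", "error"]


@dataclass(frozen=True)
class ProgressEvent:
    step: int
    label: str
    status: Status
    timestamp: str  # assumed: ISO 8601 in UTC
    detail: Optional[str] = None


def make_event(step: int, label: str, status: Status,
               detail: Optional[str] = None) -> ProgressEvent:
    """Build an event stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return ProgressEvent(step, label, status, now, detail)


def to_sse(event: ProgressEvent) -> str:
    """Serialize as one SSE frame: an event name, a data line, a blank line."""
    # Drop the optional detail field when it is unset.
    payload = {k: v for k, v in asdict(event).items() if v is not None}
    return f"event: progress\ndata: {json.dumps(payload)}\n\n"
```

`json.dumps` escapes any newline inside a string value, so the payload always fits on a single `data:` line.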

  • [ ] B2: Emit progress events from Clinical Context agent

  • Add event emission at each stage boundary in the Clinical Context pipeline.
  • Each stage fires two events: in_progress when entering, complete when exiting.
  • Assertion: Langfuse trace for a live patient upload shows progress events covering all 6 steps in order, with monotonically increasing timestamps.
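One way to get the paired in_progress/complete events at each stage boundary is a context manager wrapped around the stage body. This is a sketch under assumptions: `stage`, `STEPS`, and the `Emit` callable are hypothetical, and the real `emit_progress` signature is not shown in this spec.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple

# Step labels taken from the "Sub-Steps to Surface" table.
STEPS = {
    1: "Uploading document",
    2: "Extracting content",
    3: "Analyzing findings",
    4: "Building health record",
    5: "Composing summary",
    6: "Ready",
}

Emit = Callable[[int, str, str], None]  # (step, label, status)


@contextmanager
def stage(step: int, emit: Emit) -> Iterator[None]:
    """Emit in_progress on entry and complete on a clean exit."""
    label = STEPS[step]
    emit(step, label, "in_progress")
    yield
    # Reached only if the body did not raise; failures are handled elsewhere.
    emit(step, label, "complete")


# Usage: record the events a full pipeline run would emit.
events: List[Tuple[int, str]] = []


def record(step: int, label: str, status: str) -> None:
    events.append((step, status))


for step in STEPS:
    with stage(step, record):
        pass  # real stage work would run here
```

Keeping the emission in one helper means no stage can emit its events out of order.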

  • [ ] B3: Error propagation

  • If any stage fails, emit a progress event with status: "error" and a user-safe detail message.
  • The stream must NOT hang on failure — always terminate with either step 6 (Ready) or an error event.
  • Assertion: Upload a malformed PDF → receive an error event within 5s → frontend shows actionable error, not infinite spinner.
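The guarantee that the stream always terminates can be sketched as a generator that stops after the first failure. `run_pipeline`, the `Stage` tuple, and `GENERIC_ERROR` are hypothetical names; timestamps are omitted for brevity.

```python
from typing import Callable, Iterator, List, Tuple

Stage = Tuple[int, str, Callable[[], None]]  # (step, label, work function)

GENERIC_ERROR = "We couldn't process this document. Please try again."


def run_pipeline(stages: List[Stage]) -> Iterator[dict]:
    """Yield progress events; end on the last stage's complete or on an error."""
    for step, label, work in stages:
        yield {"step": step, "label": label, "status": "in_progress"}
        try:
            work()
        except Exception:
            # Never forward the exception text: it may contain document content.
            yield {"step": step, "label": label, "status": "error",
                   "detail": GENERIC_ERROR}
            return  # terminate the stream; later stages do not run
        yield {"step": step, "label": label, "status": "complete"}
```

Because the generator returns right after the error event, the consumer sees a definite end of stream instead of waiting on stages that will never run.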

  • [ ] B4: Heartbeat / keepalive

  • If no progress event fires for >5s, emit a heartbeat event to keep the SSE connection alive and signal to the frontend that the backend hasn't crashed.
  • Assertion: Simulate a slow extraction (>8s). Connection stays open, heartbeat received.
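A heartbeat can be layered on top of the event source by waiting on a queue with a timeout. The sketch below assumes progress frames arrive on an `asyncio.Queue` and that `None` marks the end of the stream; `with_heartbeat` and the empty heartbeat payload are illustrative choices, not something this spec defines.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT_INTERVAL_S = 5.0  # from this spec: no progress event for >5s


async def with_heartbeat(queue: asyncio.Queue,
                         interval: float = HEARTBEAT_INTERVAL_S
                         ) -> AsyncIterator[str]:
    """Relay frames from the queue, emitting a heartbeat when it stays idle."""
    while True:
        try:
            frame = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # Nothing arrived within the interval: signal that we are alive.
            yield "event: heartbeat\ndata: {}\n\n"
            continue
        if frame is None:  # assumed sentinel: producer has finished
            return
        yield frame
```

The timeout restarts after every frame, so heartbeats appear only during real gaps such as a slow extraction.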

Frontend

  • [ ] F1: SSE client connection

  • Connect to /documents/stream via EventSource (or fetch-based SSE for better error handling) immediately after upload begins.
  • Handle reconnection on drop (max 3 retries, then show error state).
  • Assertion: Kill backend mid-stream → frontend retries → shows graceful error after 3 attempts.

  • [ ] F2: Progress UI component

  • Render a vertical step indicator (not a horizontal progress bar) showing all 6 steps.
  • Current step: animated/pulsing indicator.
  • Completed steps: checkmark + muted style.
  • Upcoming steps: greyed out.
  • Minimum visible duration per step: 300ms (even if backend completes faster).
  • Assertion: Upload a small document that processes in <2s → all 6 steps still visually appear in sequence, no "jump to done."

  • [ ] F3: Error state rendering

  • On receiving an error event, stop the progress indicator at the failed step.
  • Show the detail message from the error event.
  • Offer a "Try again" action that resets and re-uploads.
  • Assertion: Malformed PDF upload → progress stops at "Extracting content" → error message shown → "Try again" re-initiates upload.

  • [ ] F4: Completion transition

  • When step 6 (Ready) is received, transition from the progress view to the results view.
  • Brief pause (~500ms) on "Ready" before transitioning so the user registers completion.
  • Assertion: Successful upload → "Ready" shows with checkmark for 500ms → results view renders with extracted data.

Out of Scope

  • Progress for non-document interactions (chat, search). Those flows don't have the latency problem.
  • Percentage-based progress (0-100%). Step-based is more honest — we don't know true percentages for LLM steps.
  • WebSocket migration. SSE is sufficient for server-to-client unidirectional updates.

Privacy & Data Handling

Since this flow processes real patient medical documents:

  • SSE event payloads must NEVER include patient-identifiable information (no names, dates of birth, or diagnosis text in detail fields). Labels are generic processing stages only.
  • Error events must not leak document content, filenames containing patient info, or internal extraction output.
  • Langfuse traces may contain PHI — ensure trace access is restricted per existing access controls.
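One way to enforce the no-PHI rule for `detail` is to never pass free text through, and instead map internal error codes onto a fixed allow-list of messages. The codes and wording below are hypothetical examples, not values defined by this spec.

```python
from typing import Optional

# Allow-list of user-safe messages keyed by internal error code (hypothetical).
SAFE_DETAILS = {
    "unsupported_format": "This file type isn't supported. Please upload a PDF or image.",
    "unreadable_file": "We couldn't read this file. Please check it and try again.",
    "timeout": "Processing took too long. Please try again.",
}

FALLBACK_DETAIL = "Something went wrong while processing your document. Please try again."


def safe_detail(error_code: Optional[str]) -> str:
    """Return a fixed message; unknown codes or raw text get the fallback."""
    return SAFE_DETAILS.get(error_code or "", FALLBACK_DETAIL)
```

Since the output is always one of a few constant strings, a raw exception message or filename passed in by mistake cannot reach the client.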

Verification Checklist (Investor Demo)

Live workflow with real patient medical report upload:

  • [ ] Upload triggers visible step-by-step progress
  • [ ] Each step label appears in order with no skips
  • [ ] SSE event payloads contain zero patient-identifiable data
  • [ ] Total flow completes in <30s
  • [ ] Langfuse trace shows corresponding progress events aligned to agent execution spans
  • [ ] Error case (bad file) shows graceful failure, not a hang — no PHI in error messages
  • [ ] Investor sees the pipeline working end-to-end on real data without technical explanation