ADR-0010: Synchronous OCR before Orchestration

Status: Accepted
Date: 2026-03-25
Session: 23B

Context

A critical bug was reported: patients would upload a medical document in the chat, but the agent would either respond with "I don't see any attachment" or answer the message without referencing the document's content. The root cause was a race condition in the document processing pipeline.

The original architecture used QStash (a serverless message queue) to handle OCR asynchronously:

  1. Patient uploads a PDF via the chat endpoint.
  2. The chat endpoint enqueues an OCR job to QStash and immediately passes the message to the orchestrator agent.
  3. QStash triggers the OCR worker, which extracts text and stores it in the database.
  4. The orchestrator, meanwhile, tries to read the document text -- but the OCR job has not completed yet.

The result: the orchestrator sees a document record with no extracted text, and proceeds as if no document was uploaded.
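The ordering problem in steps 1-4 can be reproduced in a few lines. The following is a minimal simulation, not the production code: `ocr_worker`, `orchestrator`, and `chat_endpoint_async_ocr` are hypothetical stand-ins, a dict plays the role of the database, and a short sleep stands in for the queue round trip plus extraction time.

```python
import asyncio


async def ocr_worker(db: dict, doc_id: str, delay_s: float) -> None:
    """Stand-in for the queued OCR job: text lands only after a delay."""
    await asyncio.sleep(delay_s)  # simulated queue round trip + extraction
    db[doc_id] = "extracted text"  # placeholder content


async def orchestrator(db: dict, doc_id: str) -> str:
    """Stand-in for the agent: reads whatever text exists right now."""
    text = db.get(doc_id)
    return f"Analyzing: {text}" if text else "I don't see any attachment"


async def chat_endpoint_async_ocr(db: dict, doc_id: str) -> str:
    db[doc_id] = None  # the document record exists, but has no text yet
    # Step 2: enqueue the OCR job and orchestrate immediately.
    job = asyncio.create_task(ocr_worker(db, doc_id, delay_s=0.05))
    reply = await orchestrator(db, doc_id)
    await job  # let the worker finish so the demo exits cleanly
    return reply


db: dict = {}
reply = asyncio.run(chat_endpoint_async_ocr(db, "doc-1"))
print(reply)  # the orchestrator ran before the text was stored
```

By the time the worker writes the text, the reply has already been produced from an empty record, which matches the behavior patients reported.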

Decision

Run PyMuPDF text extraction synchronously, inline in the chat endpoint, before the message is passed to the orchestrator agent. Remove QStash from the OCR pipeline for chat-attached documents.
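A minimal sketch of what this ordering could look like follows. The function names, the `orchestrate` callable, and its `(message, document_text)` signature are assumptions made for illustration; only `fitz.open(stream=..., filetype="pdf")` and `page.get_text()` are actual PyMuPDF calls.

```python
from typing import Callable, Optional


def extract_text_pymupdf(pdf_bytes: bytes) -> str:
    """Extract the text layer of a PDF with PyMuPDF (no OCR fallback here)."""
    import fitz  # PyMuPDF; imported lazily so the sketch loads without it

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def handle_chat_message(
    message: str,
    pdf_bytes: Optional[bytes],
    orchestrate: Callable[[str, Optional[str]], str],  # hypothetical agent entry point
    extract: Callable[[bytes], str] = extract_text_pymupdf,
) -> str:
    """Hypothetical chat handler: extract first, then orchestrate."""
    # Extraction finishes before the orchestrator is called, so the text
    # cannot arrive "too late" the way it did with the queued job.
    document_text = extract(pdf_bytes) if pdf_bytes is not None else None
    return orchestrate(message, document_text)
```

Passing the extractor as a parameter is a design choice of this sketch, so the ordering can be exercised in tests without a real PDF.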

Rationale

  • Eliminates the race condition. By extracting text synchronously before the orchestrator runs, the document text is guaranteed to be available when the agent processes the message. This is the direct fix for the reported bug.
  • PyMuPDF is fast enough. Benchmarking showed that PyMuPDF extracts text from a typical medical PDF (5-10 pages) in under 1 second. This latency is small next to the 2-5 seconds the orchestrator itself takes to generate a response, so the user is unlikely to notice it.
  • Simpler architecture. Removing QStash from the critical path eliminates a moving part: no queue, no worker, no retry logic, no "is it done yet?" polling. The document processing pipeline becomes a straightforward function call.
  • Consistency with ADR-0003. This decision reinforces the PyMuPDF-first strategy established in ADR-0003 and extends it to the specific context of chat-attached documents.
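The sub-second figure above could be re-checked with a small timing harness like the one below. This is an illustrative sketch, not the benchmark that was actually run: `time_extraction` is a hypothetical helper, and the extractor is passed in as a callable so any implementation can be measured.

```python
import statistics
import time
from typing import Callable


def time_extraction(
    extract: Callable[[bytes], object], pdf_bytes: bytes, runs: int = 5
) -> float:
    """Return the median wall-clock seconds of `runs` extraction calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        extract(pdf_bytes)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

The median is used rather than the mean so that a single slow first run (cold caches, lazy imports) does not dominate the result.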

Alternatives Considered

| Alternative | Pros | Cons | Verdict |
|---|---|---|---|
| Wait for async OCR to complete | Keeps async architecture | Adds polling complexity, increases latency (must wait for QStash roundtrip + OCR), fragile timeout logic | Rejected |
| Pre-extract at upload confirm time | Text ready before chat message is sent | Adds latency to the upload step (user waits for extraction before they can type), complicates the upload flow | Rejected |
| Optimistic orchestration with retry | Orchestrator retries if text is missing | Complex retry logic, user sees "processing..." spinner, inconsistent agent behavior | Rejected |
| QStash with callback | QStash notifies the API when OCR is done, then orchestrator runs | Still asynchronous, adds callback complexity, user waits longer | Rejected |

Consequences

  • Positive: The "agent can't see attachment" bug caused by the race condition is resolved. Extraction now completes before the orchestrator processes a message, so the document text is available whenever extraction succeeds.
  • Positive: Simpler architecture with one fewer infrastructure dependency (QStash removed from this path).
  • Positive: Predictable latency: extraction time is added to the request, but for typical PDFs it is consistent and fast (< 1s).
  • Negative: The chat endpoint becomes slightly slower for messages with attachments (adds < 1s of synchronous extraction). This is acceptable given the orchestrator's own processing time.
  • Negative: For very large PDFs (50+ pages) or scanned documents requiring the OCR fallback chain, synchronous extraction could take several seconds. A future optimization could set a timeout and fall back to async processing for outliers.
  • Negative: QStash is still used for other background tasks (email notifications, analytics events). It is not fully removed from the system, just from the OCR critical path.
  • Accepted risk: Synchronous extraction in the request path means a PyMuPDF crash or hang would block the chat response. Mitigated by a 10-second timeout on the extraction call.
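The 10-second timeout mentioned in the accepted risk could be enforced roughly as follows. This is a sketch under stated assumptions: `extract_with_timeout` and the module-level thread pool are hypothetical, and the source does not say how the real timeout is implemented. One caveat applies to this thread-based approach: Python cannot kill a running thread, so a hung extraction stops blocking the response but keeps occupying a worker. A subprocess would be needed to terminate it outright.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional

EXTRACTION_TIMEOUT_S = 10.0  # the timeout named in this ADR

# Module-level pool: a per-call "with ThreadPoolExecutor()" would wait for
# the hung worker on exit and defeat the timeout.
_pool = ThreadPoolExecutor(max_workers=4)


def extract_with_timeout(
    extract: Callable[[bytes], str],
    pdf_bytes: bytes,
    timeout_s: float = EXTRACTION_TIMEOUT_S,
) -> Optional[str]:
    """Return extracted text, or None if extraction times out or raises."""
    future = _pool.submit(extract, pdf_bytes)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # only effective if the call has not started yet
        return None
    except Exception:
        return None  # a failed extraction should not fail the chat request
```

A `None` result is the point where a caller could apply the future optimization noted above and hand the document to asynchronous processing instead.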