
04 — Agent Pipeline Design

The agentic layer is the core differentiator. Without it, the platform is a CRUD app with a scoring formula. With it, the AI reads medical reports, interprets diagnoses, and explains treatment options in the patient's language.

Framework Stack

Framework | Role | When
LangGraph | Agent orchestration: multi-agent StateGraph with typed state | Every patient request
LangChain | Tool wrappers for LLM/DB/API calls | Called by LangGraph nodes
Langfuse | Production observability (traces, cost, latency) plus prompt management | Every LLM call

Agent Orchestrator

Endpoint: POST /api/v1/patients/{id}/chat

The single unified interface the frontend calls. A top-level LangGraph StateGraph classifies the patient's intent and routes the message to the matching agent.

Patient Intent | Routed To | Example
Medical report submission | Clinical Context Agent | "I have a knee X-ray report"
Treatment search | Match Agent | "Find me the best hospital"
Question about results | Explanation Agent | "Why was Apollo recommended?"
Getting started / general | Intake Agent | "I need help getting started"
Document upload notification | Document handler | (attachment metadata in request)
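
The routing table above can be sketched as a plain lookup from classified intent to handler name. This is a minimal sketch, not the project's implementation: the intent keys (`report_submission`, `treatment_search`, and so on) and the node names are hypothetical labels, and both the attachment precedence and the default-to-intake behavior for unknown intents are assumptions. In LangGraph, a function of this shape is the kind of callable typically used to choose the next node from a conditional edge.

```python
# Hypothetical intent labels and node names; the source only names the agents.
ROUTES = {
    "report_submission": "clinical_context_agent",
    "treatment_search": "match_agent",
    "results_question": "explanation_agent",
    "getting_started": "intake_agent",
    "document_upload": "document_handler",
}


def route(state: dict) -> str:
    """Return the name of the handler for the classified intent."""
    # A request carrying only attachment metadata goes to the document handler
    # (assumed precedence).
    if state.get("attachments") and not state.get("intent"):
        return ROUTES["document_upload"]
    # Unknown or missing intents fall back to intake (an assumption).
    return ROUTES.get(state.get("intent", ""), "intake_agent")
```

Keeping the mapping in one dictionary means adding an agent is a one-line change, and the router itself stays deterministic and easy to unit-test.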

Orchestrator State Schema

from typing import TypedDict


class OrchestratorState(TypedDict):
    patient_id: str
    tenant_id: str
    message: str
    conversation_history: list[dict]
    intent: str                      # classified intent
    attachments: list[dict]          # uploaded document references
    case_status: str                 # current workflow phase
    workflow_state: dict             # phase-specific state
    response: str                    # agent response to return
    suggested_actions: list[dict]    # UI action chips
    agent_name: str                  # which agent handled this

Agent Specifications

Clinical Context Agent

Purpose: Raw medical report text → validated FHIR R4 resources with ICD-10/SNOMED coding.

LangGraph Nodes:

  1. extract_clinical_entities: Claude Haiku parses raw text → structured conditions, procedures, medications, allergies
  2. map_to_medical_codes: Claude Haiku maps each entity to ICD-10/SNOMED with confidence scores
  3. generate_fhir_resources: Generates valid FHIR R4 JSON (Condition, Procedure, AllergyIntolerance, Observation)
  4. store_resources: Calls fhir_service.create_fhir_resource() with full R4 schema validation

State: { patient_id, tenant_id, raw_text, report_type, extracted_entities[], coded_entities[], fhir_resources[], stored_resource_ids[], errors[] }
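
The four nodes can be read as a linear pipeline over this shared state. The sketch below is illustrative only: the node bodies are stubs standing in for the LLM and `fhir_service` calls, the hard-coded entity and code are placeholder data, the generated dicts are not validated against the R4 schema, and the stop-on-first-error policy is an assumption.

```python
from typing import Callable

State = dict


def extract_clinical_entities(state: State) -> State:
    # Stub: a real node would call the LLM on state["raw_text"].
    entities = [{"type": "condition", "text": "knee osteoarthritis"}] if state["raw_text"] else []
    return {**state, "extracted_entities": entities}


def map_to_medical_codes(state: State) -> State:
    # Stub: attach a placeholder code and confidence to each entity.
    coded = [{**e, "code": "M17.11", "confidence": 0.9}
             for e in state["extracted_entities"]]
    return {**state, "coded_entities": coded}


def generate_fhir_resources(state: State) -> State:
    # Minimal Condition-shaped dicts; schema validation is out of scope here.
    resources = [{"resourceType": "Condition",
                  "subject": {"reference": f"Patient/{state['patient_id']}"},
                  "code": {"coding": [{"code": e["code"], "display": e["text"]}]}}
                 for e in state["coded_entities"]]
    return {**state, "fhir_resources": resources}


def store_resources(state: State) -> State:
    # Stub: pretend each resource was stored and received an id.
    ids = [f"res-{i}" for i, _ in enumerate(state["fhir_resources"])]
    return {**state, "stored_resource_ids": ids}


NODES: list[Callable[[State], State]] = [
    extract_clinical_entities, map_to_medical_codes,
    generate_fhir_resources, store_resources,
]


def run_pipeline(state: State) -> State:
    """Run the nodes in order, recording the first failure in state["errors"]."""
    state = {**state, "errors": []}
    for node in NODES:
        try:
            state = node(state)
        except Exception as exc:  # record and stop (assumed policy)
            state["errors"].append(f"{node.__name__}: {exc}")
            break
    return state
```

Each node returns a new dict with one more key filled in, which mirrors how the state fields listed above accumulate from `raw_text` to `stored_resource_ids`.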

Model: Claude Haiku 4.5 (~$0.01/report). Fallback model: GPT-4o mini.

Prompt: System prompt with 2–3 few-shot examples of real radiology reports mapped to ICD codes. Output schema enforced via structured JSON. Managed in Langfuse.

Agent fallback: the endpoint returns 202 Accepted, the raw text is stored, and extraction is queued for a QStash retry. The patient is never blocked.
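
A minimal sketch of that fallback path, under stated assumptions: `store_raw_text` and `enqueue_retry` are hypothetical injected callables standing in for the database write and the QStash publish (no QStash client API is used here), the raw text is persisted before extraction is attempted, and the response shape and the 200 on success are illustrative.

```python
from typing import Callable


def submit_report(
    raw_text: str,
    extract: Callable[[str], dict],
    store_raw_text: Callable[[str], str],
    enqueue_retry: Callable[[str], None],
) -> tuple[int, dict]:
    """Try extraction; on failure keep the raw text and queue a retry."""
    report_id = store_raw_text(raw_text)  # persist first so nothing is lost
    try:
        result = extract(raw_text)
    except Exception:
        enqueue_retry(report_id)          # extraction is retried asynchronously
        return 202, {"report_id": report_id, "status": "queued"}
    return 200, {"report_id": report_id, "status": "processed", **result}
```

Because the caller always gets a `report_id` back, the frontend can continue the intake flow while extraction completes in the background.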

Intake Agent

Purpose: Conversational patient onboarding. Records-first: extracts from documents before asking questions.

LangGraph Nodes:

  1. classify_intent: Routes patient message to appropriate handler
  2. collect_information: Extracts structured data from conversational input
  3. suggest_actions: Recommends next steps (upload X-ray, provide insurance, grant consent)
  4. update_progress: Advances intake status, updates patient profile

State: { patient_id, tenant_id, message, conversation_history[], intent, extracted_data{}, suggested_actions[], intake_progress: float }

Model: Claude Haiku 4.5. State is persisted in the events table (not held in memory). Fallback: standard form-based intake.
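
One way the `intake_progress` float and `suggested_actions` could be derived from `extracted_data` is sketched below. The checklist keys, the labels, and the equal weighting of steps are all hypothetical; the source only gives these three items as example next steps.

```python
# Hypothetical checklist; keys and labels are illustrative.
REQUIRED_STEPS = {
    "medical_report": "Upload your X-ray or medical report",
    "insurance": "Provide insurance details",
    "consent": "Grant consent",
}


def update_progress(extracted_data: dict) -> tuple[float, list[dict]]:
    """Return (intake_progress in [0, 1], suggested_actions for missing steps)."""
    missing = [key for key in REQUIRED_STEPS if not extracted_data.get(key)]
    progress = 1 - len(missing) / len(REQUIRED_STEPS)
    actions = [{"action": key, "label": REQUIRED_STEPS[key]} for key in missing]
    return progress, actions
```

Computing progress from the stored data, rather than tracking it separately, keeps the value consistent with a state that is rebuilt from persisted records on every request.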

Match Agent

Purpose: Wraps deterministic matching engine with AI pre/post-processing.

LangGraph Nodes:

  1. analyze_clinical_picture: Claude Sonnet reviews all FHIR resources, generates clinical summary with risk factors
  2. determine_requirements: Identifies needed specialties, procedures, accommodations
  3. run_weighted_scoring: Calls existing WeightedScoringV1 (or strategy from Flagsmith)
  4. rerank_edge_cases: Claude reviews top 5 for comorbidity risks, contraindications
  5. generate_explanations: Passes to Explanation Agent

Feature flag: agent_enhanced_matching. When disabled, steps 1, 2, 4, and 5 are skipped and only the deterministic scoring step runs, so there is zero regression against pre-agent matching.

Models: Claude Sonnet 4.6 for clinical analysis. Claude Haiku 4.5 for re-ranking.
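
The flag behavior can be sketched as follows. This is illustrative only: the step functions are injected placeholders, the `trace` key is added purely to make the executed steps visible, and the flag is passed in as a boolean rather than read through the Flagsmith SDK.

```python
from typing import Callable

Step = Callable[[dict], dict]

FULL_ORDER = [
    "analyze_clinical_picture",
    "determine_requirements",
    "run_weighted_scoring",
    "rerank_edge_cases",
    "generate_explanations",
]


def run_match(state: dict, steps: dict[str, Step], agent_enhanced_matching: bool) -> dict:
    """Run the deterministic scorer, adding the AI steps only when the flag is on."""
    # With the flag off, only step 3 runs (steps 1, 2, 4, 5 are skipped).
    order = FULL_ORDER if agent_enhanced_matching else ["run_weighted_scoring"]
    for name in order:
        state = steps[name](state)
        state.setdefault("trace", []).append(name)
    return state
```

Since the scoring step is the same callable in both modes, turning the flag off reproduces the pre-agent result exactly.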

Explanation Agent

Purpose: Natural language match reasoning in patient's preferred locale.

Model: Claude Haiku 4.5. Supports multilingual output via patient.preferred_locale. Fallback: Template-based string explanations (pre-agent behavior).
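
The template-based fallback might look like the sketch below. The template strings, the locale keys, and the match field names are hypothetical; only `preferred_locale` comes from the design above. Unsupported locales fall back to English here, which is also an assumption.

```python
# Hypothetical templates keyed by language code.
TEMPLATES = {
    "en": "{hospital} is the top match for {condition} (score {score:.0%}).",
    "es": "{hospital} es la mejor opción para {condition} (puntuación {score:.0%}).",
}


def explain(match: dict, preferred_locale: str) -> str:
    """Render a deterministic explanation, falling back from region to language to English."""
    language = preferred_locale.split("-")[0].lower()
    template = TEMPLATES.get(language, TEMPLATES["en"])
    return template.format(**match)
```

For example, `explain({"hospital": "Apollo Hospitals Chennai", "condition": "knee osteoarthritis", "score": 0.87}, "en-US")` renders the English template with the score shown as `87%`.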

Example output for Aisha (Arabic locale):

Based on your knee osteoarthritis diagnosis (ICD-10 M17.11), Apollo Hospitals Chennai is the strongest match. Their orthopedic department has performed over 3,000 knee replacements with a 95% success rate. The hospital has Arabic-speaking staff and halal dietary options. At approximately $6,000–$8,000 USD, they offer competitive pricing in the India corridor.

Three-Layer Guardrails

  1. Langfuse-managed system prompts: externalized, versioned, A/B testable. Define agent boundaries (DO / DON'T / REDIRECT framework).
  2. GPT-4o mini input classifier: lightweight pre-filter that classifies messages for safety, relevance, and routing before they reach the primary agent.
  3. Regex output validation: post-processing that catches PII leakage, hallucinated medical advice, and format violations.

All rules externalized to config/guardrails.yaml.
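
The third layer can be sketched as a set of named regular expressions run over the agent response. The patterns below are illustrative assumptions, not the contents of `config/guardrails.yaml`, and they are deliberately crude: regexes like these catch obvious cases only.

```python
import re

# Illustrative patterns only; real rules would be loaded from config.
OUTPUT_RULES = {
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "pii_phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
    "medical_advice": re.compile(r"\byou should (take|stop taking)\b", re.IGNORECASE),
}


def validate_output(text: str) -> list[str]:
    """Return the names of all rules the agent response violates."""
    return [name for name, pattern in OUTPUT_RULES.items() if pattern.search(text)]
```

An empty list means the response passes; any returned rule name can be logged and used to trigger the deterministic fallback response.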

Guardrail Categories

guardrails:
  blocked_intents:
    - medical_advice          # "Should I take ibuprofen?"
    - diagnosis_speculation   # "That sounds like arthritis"
    - off_topic              # "What's the weather in Bangkok?"
    - outcome_prediction     # "You'll probably be fine"
  redirect_intents:
    - emergency              # → "Please call your local emergency services"
    - existing_treatment     # → "Please consult your current physician"
  allowed_intents:
    - intake_information
    - document_upload
    - match_query
    - explanation_request
    - preference_update
    - procedure_question     # Factual: "What is TKR recovery time?"
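
How these categories might be applied at runtime is sketched below. The lists are mirrored in a Python dict for self-containment (in practice they would be loaded from `config/guardrails.yaml`). The `disposition` function, its return labels, and the default-deny policy for unlisted intents are assumptions; the redirect messages are taken from the comments above.

```python
from typing import Optional

GUARDRAILS = {
    "blocked_intents": {
        "medical_advice", "diagnosis_speculation", "off_topic", "outcome_prediction",
    },
    "redirect_intents": {
        "emergency": "Please call your local emergency services",
        "existing_treatment": "Please consult your current physician",
    },
    "allowed_intents": {
        "intake_information", "document_upload", "match_query",
        "explanation_request", "preference_update", "procedure_question",
    },
}


def disposition(intent: str) -> tuple[str, Optional[str]]:
    """Map a classified intent to (action, optional redirect message)."""
    if intent in GUARDRAILS["redirect_intents"]:
        return "redirect", GUARDRAILS["redirect_intents"][intent]
    if intent in GUARDRAILS["blocked_intents"]:
        return "block", None
    if intent in GUARDRAILS["allowed_intents"]:
        return "allow", None
    # Unlisted intents are refused (default-deny is an assumption).
    return "block", None
```

Checking redirects first ensures that an emergency message always receives the safety redirect, whatever else the classifier attaches to it.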

Fallback Philosophy

Every agent has a deterministic fallback, so an LLM failure never breaks the platform.

Agent | Fallback
Clinical Context | Store raw text, queue QStash retry. Patient not blocked.
Intake | Standard form-based intake (REST endpoints).
Match | Pure WeightedScoringV1 without AI enrichment.
Explanation | Template-based string explanations.
Orchestrator | Direct API calls to individual endpoints.

Prompt Strategy

  • All system prompts in app/agents/prompts/ as versioned Python modules
  • Every prompt includes 2–3 few-shot examples of real medical data
  • Output format enforced via JSON schema instructions
  • Prompt versions managed via Langfuse prompt management
  • Regression testing deferred to post-MVP (needs eval datasets)

Agent Observability

Layer | Tool | Tracks
Request-level | Events table | agent_name, model, tokens, latency, cost, success/failure, correlation_id
Trace-level | Langfuse | Full traces, nested spans, prompt/completion pairs, cost per journey
Eval-level | LangSmith (post-MVP) | Offline evals when enough data exists
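
A request-level event row could be built as in the sketch below. The field names follow the table above, but the split into input and output tokens, the `record_call` helper, and the per-million-token price parameters are assumptions; the prices are supplied by the caller and are not real model prices.

```python
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class AgentEvent:
    agent_name: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    success: bool
    correlation_id: str


def record_call(
    agent_name: str,
    model: str,
    call: Callable[[], tuple[int, int]],
    price_in_per_mtok: float,
    price_out_per_mtok: float,
    correlation_id: Optional[str] = None,
) -> dict:
    """Time `call()` and build an event row; `call` returns (input_tokens, output_tokens)."""
    start = time.perf_counter()
    try:
        tokens_in, tokens_out = call()
        success = True
    except Exception:
        tokens_in = tokens_out = 0
        success = False
    latency_ms = (time.perf_counter() - start) * 1000
    cost = (tokens_in * price_in_per_mtok + tokens_out * price_out_per_mtok) / 1_000_000
    event = AgentEvent(agent_name, model, tokens_in, tokens_out, latency_ms,
                       cost, success, correlation_id or str(uuid.uuid4()))
    return asdict(event)
```

Failures are recorded as rows with `success=False` rather than raised, so the events table reflects every attempt, including the ones that triggered a fallback.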