Conversation Flow Gates: Steer Document

Feature: Lower-the-gates conversation flow fix (Layer 1 of the broader conversation flow remediation plan)
Version: 1.0
Date: April 2026
Author: Srikanth Donthi (CPO/CTO)
Status: Implemented (PR #70, gates_v2 flag live)

1. Problem Statement

In production (https://app.curaway.ai), most conversations break before reaching the matching stage. The agent acknowledges what the patient said, then either re-asks the same questions on the next turn or fails to advance the workflow at all. After a document upload, the agent often does not probe further. The result is a halting flow in which matching is rarely triggered.

This document covers the first layer of a three-layer remediation plan scoped in the same conversation thread that produced this document. The other two layers (orchestrator planner refactor, synchronous chat extractor) will be addressed in follow-up sessions.

2. Root-Cause Analysis

We identified four overlapping failure modes in the current orchestrator and gating configuration:

2.1 Phase machine has too many escape hatches

case_orchestrator.handle_message is a long branching function that gates each turn on a flag soup: procedure_identified, intake_complete, records_requested, quick_questions_asked, ehr_constructed, medications_asked, min_info_for_matching, matching_complete, providers_selected, consent_given, forwarded. Several branches early-return after a single LLM call and flip flags as a side effect, so the next turn may land in a completely different branch. A natural follow-up patient utterance ("here's another doc", "I take metformin") frequently routes to the wrong sub-handler and the state machine stops advancing — even though each individual reply looks fine in isolation.

2.2 Intake completion gate is an AND of six conditions

The intake phase prompt (app/agents/llm_conversation.py:461-471) sets phase_complete=true only when all of the following hold:

age + gender confirmed
location confirmed
medications confirmed (or explicit "none")
allergies confirmed (or explicit "none")
basic medical history collected
at least one preference captured

In real conversations, at least one of these almost always fails to be set cleanly:

The patient never explicitly says "I have no allergies" — they keep talking, so allergies_confirmed_none is never written to metadata.
Demographics often arrive via uploaded documents, not chat, so the agent re-asks them.
"At least one preference" is easy for a patient to satisfy in conversation, but the chat extractor does not always capture it.

phase_complete is rarely true → matching is never triggered → the agent loops asking for the same things in slightly different forms.

2.3 Chat extractor runs after the routing decision

The chat extractor (Session 30) was added precisely to catch what the main LLM missed. But Session 31 moved it to the deferred / async lane via the enable_deferred_extraction flag. So the orchestrator's next-message routing decision uses case state from N turns ago — not the state that includes what the patient just said.

The patient says "I take metformin and lisinopril" → response goes out → extractor catches them 2-3 seconds later → but the next turn's routing already happened against the stale state. The agent re-asks for medications.
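
The ordering problem described above can be reproduced with a deliberately simplified model. The sketch below is not the orchestrator's code: CaseState, route, handle_turn, and run_deferred_extraction are hypothetical names, and the extractor is a toy that recognizes two hard-coded drug names. It only shows why routing before extraction leads to a re-ask.

```python
from dataclasses import dataclass, field


@dataclass
class CaseState:
    medications: list = field(default_factory=list)
    pending_messages: list = field(default_factory=list)  # awaiting deferred extraction


def extract_medications(message: str) -> list:
    """Toy extractor: recognizes two hard-coded drug names."""
    known = ("metformin", "lisinopril")
    return [drug for drug in known if drug in message.lower()]


def route(state: CaseState) -> str:
    """Pick the next sub-handler from the state visible right now."""
    return "continue_intake" if state.medications else "ask_medications"


def handle_turn(state: CaseState, message: str, *, synchronous: bool) -> str:
    if synchronous:
        # Extract first, so routing sees what the patient just said.
        state.medications.extend(extract_medications(message))
    else:
        # Deferred lane: only queue the message; extraction lands later.
        state.pending_messages.append(message)
    return route(state)


def run_deferred_extraction(state: CaseState) -> None:
    """Apply queued extractions, as the async lane would some time later."""
    for message in state.pending_messages:
        state.medications.extend(extract_medications(message))
    state.pending_messages.clear()
```

In the deferred mode, any turn routed before run_deferred_extraction has run still sees an empty medication list and asks again, which is the loop patients experience.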

2.4 No "I'm ready, advance me" affordance

The patient has no way to say "stop probing, look at what we have, find providers." The matching gate is purely server-side: it opens only when EHR completeness reaches the completeness_for_matching threshold (0.5) or all intake fields are confirmed. That bar is too high, and patients give up before they reach it.

3. Design Decision: Lower the Gates Surgically

Decision: Adjust gating thresholds and intake completion criteria so the state machine advances on realistic real-world signals, and add an explicit "advance now" signal patients can use. Do NOT refactor the orchestrator phase machine in this layer (that's Layer 2).

Rationale:

The phase machine works most of the time when the gates fire. The problem is the gates almost never fire. Lowering the gates is a 1-line change per gate; refactoring the orchestrator is 2-3 days.
Layer 1 is mostly removing code (or changing thresholds). Lowest risk, highest impact.
We can ship Layer 1 alone, observe the result for a session, then decide whether the planner refactor (Layer 2) is still warranted.

Rejected alternatives:

Refactor the orchestrator into a planner-driven loop (Layer 2). Correct long-term, but high risk and high effort. Ship Layer 1 first.
Make the chat extractor synchronous (Layer 3). Half-day fix, worth doing — but it doesn't help if the gate the extractor unlocks is too high anyway. Layer 1 has to come first.
Remove the gates entirely. Tempting but unsafe. Some minimum context is required before it makes sense to call the matching engine. Threshold-based gating is the right pattern — just at the right level.

4. New Gating Rules

4.1 Intake completion criteria: five things, with explicit-skip

Intake is complete when all of the following are true:

Procedure identified (case.procedure_code is set)
Either at least one document analyzed (analyzed_docs > 0) OR at least one round of intake Q/A completed (min_answers_for_completion substantive answers, currently 3)
Demographics: age and country (NOT city, NOT gender, NOT name — those are nice-to-have, not blocking)
Medications: confirmed (any list) OR medications_confirmed_none
Allergies: confirmed (any list) OR allergies_confirmed_none

That is the entire gate. Six conditions become five, and "explicit skip" is now first-class. The system prompt for the intake phase will be updated to set medications_confirmed_none=true / allergies_confirmed_none=true whenever the patient indicates they have none, AND a new confirm_no_remaining_intake orchestrator action will let the patient explicitly skip.
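
The five criteria above can be sketched as a single predicate. This is a minimal illustration, not the code in the orchestrator: the IntakeState dataclass and its field layout are assumptions made for the example, "confirmed (any list)" is read as a non-empty list, and the procedure code used in the usage note is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional

MIN_ANSWERS_FOR_COMPLETION = 3  # min_answers_for_completion, currently 3


@dataclass
class IntakeState:  # hypothetical container for the fields the gate reads
    procedure_code: Optional[str] = None
    analyzed_docs: int = 0
    substantive_answers: int = 0
    age: Optional[int] = None
    country: Optional[str] = None
    medications: list = field(default_factory=list)
    medications_confirmed_none: bool = False
    allergies: list = field(default_factory=list)
    allergies_confirmed_none: bool = False


def unmet_intake_criteria(s: IntakeState) -> list:
    """Return the names of the criteria that are still blocking completion."""
    checks = {
        "procedure": s.procedure_code is not None,
        "docs_or_answers": s.analyzed_docs > 0
        or s.substantive_answers >= MIN_ANSWERS_FOR_COMPLETION,
        # City, gender, and name are nice-to-have and deliberately not checked.
        "demographics": s.age is not None and s.country is not None,
        "medications": bool(s.medications) or s.medications_confirmed_none,
        "allergies": bool(s.allergies) or s.allergies_confirmed_none,
    }
    return [name for name, ok in checks.items() if not ok]


def is_intake_complete(s: IntakeState) -> bool:
    return not unmet_intake_criteria(s)
```

For example, IntakeState(procedure_code="PROC-1", analyzed_docs=1, age=54, country="IN", medications=["metformin"], allergies_confirmed_none=True) passes the gate even though gender, city, and preferences are unknown. Returning the unmet names rather than a bare boolean also makes it easy to log which criterion is blocking a case.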

4.2 Matching gate: drop completeness to 0.4, derive from intake state

The current matching gate is:

gating:
  ehr:
    completeness_for_matching: 0.5   # main gate
    completeness_with_meds: 0.4      # relaxed when meds asked
    matching_ready_threshold: 0.4    # used by patient_state

Three thresholds for the same concept is two too many. New shape:

gating:
  ehr:
    # Single matching gate. Match when EHR is at least this complete.
    # 0.4 == "we know enough to start showing options".
    completeness_for_matching: 0.4

    # Backwards-compat alias (same value) — both keys read the same number.
    completeness_with_meds: 0.4
    matching_ready_threshold: 0.4

Plus a new gate:

  matching:
    # Whether medications must be asked/confirmed before matching proceeds
    require_medications: true

    # NEW: when intake_complete is true, ALWAYS allow matching even if
    # the completeness score is below the threshold. Intake is the
    # authoritative signal that we have enough context.
    allow_when_intake_complete: true
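
One way to read the combined matching gate is sketched below. The function name may_start_matching and the medications_addressed parameter are hypothetical, and the precedence is an assumption: the intake override is checked first, and require_medications only constrains the completeness path.

```python
# Mirrors the YAML shape above as a plain dict (assumed already loaded).
GATING = {
    "ehr": {"completeness_for_matching": 0.4},
    "matching": {"require_medications": True, "allow_when_intake_complete": True},
}


def may_start_matching(
    completeness: float,
    intake_complete: bool,
    medications_addressed: bool,
    gating: dict = GATING,
) -> bool:
    """Decide whether the matching engine may be called for this case."""
    matching = gating["matching"]
    # Intake completion is the authoritative signal and bypasses the score.
    if matching["allow_when_intake_complete"] and intake_complete:
        return True
    # Assumption: on the score path, medications must be asked or confirmed.
    if matching["require_medications"] and not medications_addressed:
        return False
    return completeness >= gating["ehr"]["completeness_for_matching"]
```

Under these assumptions, a case with completeness 0.2 still proceeds once intake is complete, while a case at 0.9 with medications never addressed does not.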

4.3 Patient explicit-advance affordance

Add a one-shot orchestrator action: when the patient says any of ["find providers now", "show me providers", "i'm ready", "skip the rest", "proceed", "advance"] (word-boundary matched), the orchestrator:

Sets medications_confirmed_none=true if no meds captured
Sets allergies_confirmed_none=true if no allergies captured
Sets intake_complete=true
Routes directly to _handle_matching

This is a patient-driven gate override, not a server-side relaxation. It always requires the patient to explicitly ask. Logged as a decision_record event so we can see how often it's used in Langfuse / events table.
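
The phrase detection and the state overrides could look roughly like the sketch below. It assumes the case is available as a plain dict with medications and allergies lists at the top level; that layout, and the function names, are illustrative rather than taken from the codebase. Routing to _handle_matching and the decision_record logging are left to the caller.

```python
import re
from typing import Optional

ADVANCE_PHRASES = [
    "find providers now",
    "show me providers",
    "i'm ready",
    "skip the rest",
    "proceed",
    "advance",
]

# \b on both sides gives word-boundary matching: "proceed" does not
# match inside "proceeded", and "advance" does not match "advanced".
_ADVANCE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ADVANCE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def match_advance_phrase(message: str) -> Optional[str]:
    """Return the matched phrase in lowercase, or None."""
    normalized = message.replace("\u2019", "'")  # curly apostrophe to straight
    found = _ADVANCE_RE.search(normalized)
    return found.group(0).lower() if found else None


def apply_explicit_advance(case: dict, message: str) -> Optional[str]:
    """Apply the one-shot override in place; return the phrase that fired."""
    phrase = match_advance_phrase(message)
    if phrase is None:
        return None
    meta = case.setdefault("extra_metadata", {})
    workflow = case.setdefault("workflow_state", {})
    if not case.get("medications"):
        meta["medications_confirmed_none"] = True
    if not case.get("allergies"):
        meta["allergies_confirmed_none"] = True
    workflow["intake_complete"] = True
    return phrase
```

Word-boundary matching does not understand negation, so a message such as "I can't proceed yet" would still fire in this sketch; a production matcher would need an extra guard for that.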

4.4 Records-first early termination

When the patient says "I have no records" / "skip" / "let's proceed" while in the records-first phase, the orchestrator currently moves to _handle_intake which runs the quick-questions template. That's correct. But the quick-questions template currently sets quick_questions_asked=true regardless of whether the patient actually answered. New rule: only set quick_questions_asked=true after a follow-up turn that contains substantive answers. This avoids the "agent asked 3 things, patient ignored them, gate flipped anyway" loop.
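
A minimal sketch of the new rule follows. It assumes the number of substantive answers in a turn is computed elsewhere (the document does not define that heuristic) and introduces a hypothetical quick_questions_pending key to remember that the template was sent but not yet answered.

```python
def record_quick_questions_turn(
    workflow_state: dict, *, template_sent: bool, substantive_answers: int
) -> dict:
    """Return an updated copy of workflow_state after one turn."""
    updated = dict(workflow_state)
    if template_sent:
        # Sending the template only marks the questions as outstanding.
        updated["quick_questions_pending"] = True  # hypothetical key
    elif updated.get("quick_questions_pending") and substantive_answers > 0:
        # A follow-up turn actually answered something, so flip the gate now.
        updated["quick_questions_asked"] = True
        updated["quick_questions_pending"] = False
    return updated
```

If the patient ignores the questions, the flag stays unset and the orchestrator can ask again or fall back to another path instead of treating the step as done.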

5. Data Contract

No new database columns. All gating reads/writes go through:

config/guardrails.yaml (gating.* section) — single source of truth
app/services/gating_config.py — loads YAML with defaults
case.workflow_state JSONB (existing) — stores intake_complete, medications_asked, etc.
case.extra_metadata JSONB (existing) — stores medications_confirmed_none, allergies_confirmed_none
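
"Loads YAML with defaults" can be illustrated with a recursive merge over an already-parsed mapping. This is a sketch, not the contents of app/services/gating_config.py: DEFAULT_GATING and merge_with_defaults are hypothetical names, the default values are assumed to mirror section 4.2, and YAML parsing itself is left out so the example stays standard-library only.

```python
import copy
from typing import Optional

# Assumed defaults, mirroring the values in section 4.2.
DEFAULT_GATING = {
    "ehr": {
        "completeness_for_matching": 0.4,
        "completeness_with_meds": 0.4,
        "matching_ready_threshold": 0.4,
    },
    "matching": {
        "require_medications": True,
        "allow_when_intake_complete": True,
    },
}


def merge_with_defaults(loaded: Optional[dict], defaults: dict = DEFAULT_GATING) -> dict:
    """Overlay the parsed gating section on the defaults, key by key."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in (loaded or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged
```

With this shape, a guardrails.yaml that only overrides gating.ehr.completeness_for_matching still yields a complete configuration, and a missing or empty file falls back to the defaults.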

6. Feature Flag

Flag name: gates_v2
Default: true (enabled for everyone, since this is a bug fix)
Behavior when disabled: Falls back to the prior gating thresholds (completeness_for_matching: 0.5, completeness_with_meds: 0.4, six-AND intake gate).
Controlled via Flagsmith for fast rollback if the new gates cause an unexpected regression.
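
The fallback behavior amounts to choosing between two fixed rule sets based on one boolean. In the sketch below the flag value is passed in as a plain bool (how it is read from Flagsmith is not shown), and the intake_gate labels are hypothetical names for the two intake rules.

```python
PRIOR_GATES = {
    "completeness_for_matching": 0.5,
    "completeness_with_meds": 0.4,
    "intake_gate": "six_and",  # hypothetical label for the old six-AND gate
}
V2_GATES = {
    "completeness_for_matching": 0.4,
    "completeness_with_meds": 0.4,
    "intake_gate": "five_with_explicit_skip",  # hypothetical label for 4.1
}


def active_gates(gates_v2_enabled: bool) -> dict:
    """Return a copy of the rule set selected by the gates_v2 flag."""
    return dict(V2_GATES if gates_v2_enabled else PRIOR_GATES)
```

Returning a copy keeps callers from accidentally mutating the shared rule sets.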

7. Analytics

Add to the existing events table via decision_recorder:

Event	When	Properties
`gates_v2_intake_complete`	When intake_complete flips to true under v2 rules	`case_id`, `met_criteria` (list), `had_explicit_skip` (bool)
`gates_v2_matching_advanced`	When matching is triggered	`case_id`, `completeness`, `intake_complete`, `triggered_by` (gate/explicit)
`gates_v2_explicit_advance`	When patient uses the explicit advance phrase	`case_id`, `phrase_matched`

These let us measure whether the new gates are firing more often and whether the explicit-advance affordance is actually used.
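
The three payloads can be sketched as small builder functions. The event names and property names come from the table above; the {"event": ..., "properties": ...} envelope and the function names are assumptions, since the decision_recorder interface is not described here.

```python
def intake_complete_event(case_id: str, met_criteria: list, had_explicit_skip: bool) -> dict:
    return {
        "event": "gates_v2_intake_complete",
        "properties": {
            "case_id": case_id,
            "met_criteria": list(met_criteria),
            "had_explicit_skip": bool(had_explicit_skip),
        },
    }


def matching_advanced_event(
    case_id: str, completeness: float, intake_complete: bool, triggered_by: str
) -> dict:
    # triggered_by distinguishes the server-side gate from the patient override.
    if triggered_by not in ("gate", "explicit"):
        raise ValueError(f"unexpected triggered_by: {triggered_by!r}")
    return {
        "event": "gates_v2_matching_advanced",
        "properties": {
            "case_id": case_id,
            "completeness": completeness,
            "intake_complete": intake_complete,
            "triggered_by": triggered_by,
        },
    }


def explicit_advance_event(case_id: str, phrase_matched: str) -> dict:
    return {
        "event": "gates_v2_explicit_advance",
        "properties": {"case_id": case_id, "phrase_matched": phrase_matched},
    }
```

Validating triggered_by at build time keeps the gate-versus-explicit breakdown clean when the events are queried later.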

8. Out of Scope (This Layer)

Orchestrator planner refactor (Layer 2 — separate session)
Synchronous chat extractor (Layer 3 — separate session)
Frontend "Skip remaining intake" button (the explicit-advance phrase catches the same intent in chat for now; the button is a future polish)
Removing the records-first phase entirely (reasonable but bigger scope)
Multi-procedure case handling

9. Success Criteria

Within one week of deploy, measured via Langfuse + events table:

>60% of cases reach the matching stage within the same session (today: rough estimate ~25-35% based on user report).
Intake completion within 5 patient messages for cases that have at least one document upload (today: often 8-12 or never).
Zero increase in matching engine errors or low-quality matches — we're loosening the gate, not the matching logic itself.
Zero increase in voice/EQ rule violations (the new affordance doesn't change how the agent talks).

10. Rollback

Flag gates_v2 controls the entire change. Flip to false in Flagsmith and the orchestrator reverts to the prior thresholds within one cache TTL (60s). No code redeploy required.
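
The "within one cache TTL" behavior can be illustrated with a small time-based cache around whatever function reads the flag. CachedFlag is a hypothetical helper, not the Flagsmith client; the fetch callable stands in for the real lookup, and the injectable clock exists only to make the example testable.

```python
import time
from typing import Callable, Optional


class CachedFlag:
    """Cache a boolean flag and re-fetch it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bool] = None
        self._expires_at = 0.0

    def get(self) -> bool:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache is empty or stale: read the flag again and restart the TTL.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

With a 60-second TTL, a process that read the flag just before it was flipped keeps using the old value for at most 60 seconds, which matches the rollback window stated above.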