Conversation Flow Gates: Steer Document
Feature: Lower-the-gates conversation flow fix (Layer 1 of the broader conversation flow remediation plan)
Version: 1.0
Date: April 2026
Author: Srikanth Donthi (CPO/CTO)
Status: Implemented (PR #70, gates_v2 flag live)
1. Problem Statement
In production (https://app.curaway.ai), most conversations break before
reaching the matching stage. The agent acknowledges what the patient said,
then either re-asks the same questions on the next turn or fails to advance
the workflow at all. After a document upload, the agent often does not probe
further. The flow is not smooth — and matching is rarely triggered.
This document covers the first layer of a three-layer remediation plan that was scoped in the conversation thread that produced this doc. The other two layers (orchestrator planner refactor, synchronous chat extractor) will be addressed in follow-up sessions.
2. Root-Cause Analysis
We identified four overlapping failure modes in the current orchestrator and gating configuration:
2.1 Phase machine has too many escape hatches
case_orchestrator.handle_message is a long branching function that gates
each turn on a flag soup: procedure_identified, intake_complete,
records_requested, quick_questions_asked, ehr_constructed,
medications_asked, min_info_for_matching, matching_complete,
providers_selected, consent_given, forwarded. Several branches
early-return after a single LLM call and flip flags as a side effect, so the
next turn may land in a completely different branch. A natural follow-up
patient utterance ("here's another doc", "I take metformin") frequently
routes to the wrong sub-handler and the state machine stops advancing — even
though each individual reply looks fine in isolation.
2.2 Intake completion gate is an AND of six conditions
The intake phase prompt
(app/agents/llm_conversation.py:461-471) marks phase_complete=true
ONLY when all of:
- age + gender confirmed
- location confirmed
- medications confirmed (or explicit "none")
- allergies confirmed (or explicit "none")
- basic medical history collected
- at least one preference captured
In real conversation, at least one of these almost never gets cleanly set:
- The patient never explicitly says "I have no allergies" and simply keeps talking, so allergies_confirmed_none is never written to metadata.
- Demographics often arrive via uploaded documents, not chat, so the agent re-asks them.
- "At least one preference" is satisfied loosely, but the chat extractor doesn't always pull it.
phase_complete is rarely true → matching is never triggered → the agent
loops asking for the same things in slightly different forms.
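To make the failure concrete, here is a minimal sketch of a six-way AND gate. The flag names are hypothetical stand-ins for the six bullets above, not the actual metadata keys used in app/agents/llm_conversation.py.

```python
# Hypothetical flag names mirroring the six prompt conditions; the real
# metadata keys may differ.
SIX_CONDITIONS = (
    "age_gender_confirmed",
    "location_confirmed",
    "medications_confirmed",
    "allergies_confirmed",
    "medical_history_collected",
    "preference_captured",
)


def missing_conditions(state: dict) -> list[str]:
    """Return the conditions that are still blocking phase_complete."""
    return [name for name in SIX_CONDITIONS if not state.get(name, False)]


def phase_complete_v1(state: dict) -> bool:
    """Old-style gate: every one of the six conditions must be true."""
    return not missing_conditions(state)


# A patient who shared everything except an explicit "no allergies".
state = {name: True for name in SIX_CONDITIONS}
state["allergies_confirmed"] = False
```

With this state, phase_complete_v1(state) stays false and missing_conditions(state) names the single flag holding the whole phase back.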
2.3 Chat extractor runs after the routing decision
The chat extractor (Session 30) was added precisely to catch what the main
LLM missed. But Session 31 moved it to the deferred / async lane via the
enable_deferred_extraction flag. So the orchestrator's next-message
routing decision uses case state from N turns ago — not the state that
includes what the patient just said.
The patient says "I take metformin and lisinopril" → response goes out → extractor catches them 2-3 seconds later → but the next turn's routing already happened against the stale state. The agent re-asks for medications.
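The ordering problem can be shown with a toy model. Everything in this sketch (the router, the substring extractor, the state shape) is a simplified assumption for illustration and not the real orchestrator code.

```python
def route_next_turn(case_state: dict) -> str:
    """Toy router: re-ask for medications unless the state already has them."""
    return "continue" if case_state.get("medications") else "ask_medications"


def extract(message: str) -> dict:
    """Toy extractor: recognizes two medication names by substring."""
    known = ("metformin", "lisinopril")
    found = [name for name in known if name in message.lower()]
    return {"medications": found} if found else {}


message = "I take metformin and lisinopril"

# Deferred lane: routing reads the state before the extractor has written to it.
stale_state: dict = {}
deferred_route = route_next_turn(stale_state)
stale_state.update(extract(message))  # lands after the routing decision

# Synchronous lane: the extractor writes first, then routing reads.
fresh_state: dict = {}
fresh_state.update(extract(message))
sync_route = route_next_turn(fresh_state)
```

Both lanes end with the same state, but only the synchronous lane routes on it; the deferred lane re-asks for medications.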
2.4 No "I'm ready, advance me" affordance
The patient has no way to say "stop probing, look at what we have, find
providers." The matching gate is purely server-side
(completeness_for_matching = 0.5 or all intake fields confirmed). The gate
is too high — patients give up before they hit it.
3. Design Decision: Lower the Gates Surgically
Decision: Adjust gating thresholds and intake completion criteria so the state machine advances on realistic real-world signals, and add an explicit "advance now" signal patients can use. Do NOT refactor the orchestrator phase machine in this layer (that's Layer 2).
Rationale:
- The phase machine works most of the time when the gates fire. The problem is the gates almost never fire. Lowering the gates is a 1-line change per gate; refactoring the orchestrator is 2-3 days.
- Layer 1 is mostly removing code (or changing thresholds). Lowest risk, highest impact.
- We can ship Layer 1 alone, observe the result for a session, then decide whether the planner refactor (Layer 2) is still warranted.
Rejected alternatives:
- Refactor the orchestrator into a planner-driven loop (Layer 2). Correct long-term, but high risk and high effort. Ship Layer 1 first.
- Make the chat extractor synchronous (Layer 3). Half-day fix, worth doing — but it doesn't help if the gate the extractor unlocks is too high anyway. Layer 1 has to come first.
- Remove the gates entirely. Tempting but unsafe. Some minimum context is required before it makes sense to call the matching engine. Threshold-based gating is the right pattern — just at the right level.
4. New Gating Rules
4.1 Intake completion criteria: five things, with explicit-skip
Intake is complete when all of the following are true:
- Procedure identified (case.procedure_code is set)
- Either at least one document analyzed (analyzed_docs > 0) OR at least one round of intake Q/A completed (min_answers_for_completion substantive answers, currently 3)
- Demographics: age and country (NOT city, NOT gender, NOT name; those are nice-to-have, not blocking)
- Medications: confirmed (any list) OR medications_confirmed_none
- Allergies: confirmed (any list) OR allergies_confirmed_none
That is the entire gate. Six conditions become five, and "explicit skip"
is now first-class. The system prompt for the intake phase will be
updated to set medications_confirmed_none=true /
allergies_confirmed_none=true whenever the patient indicates they have
none, AND a new confirm_no_remaining_intake orchestrator action will let
the patient explicitly skip.
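The five criteria could be evaluated roughly as follows. This is a sketch under stated assumptions: IntakeSnapshot and its field names (including medications_confirmed and allergies_confirmed) are hypothetical, and the real values live across the case record, workflow_state and extra_metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeSnapshot:
    """Hypothetical flattened view of the fields the v2 intake gate reads."""
    procedure_code: Optional[str] = None
    analyzed_docs: int = 0
    substantive_answers: int = 0
    age: Optional[int] = None
    country: Optional[str] = None
    medications_confirmed: bool = False
    medications_confirmed_none: bool = False
    allergies_confirmed: bool = False
    allergies_confirmed_none: bool = False


def intake_criteria(snap: IntakeSnapshot, min_answers_for_completion: int = 3) -> dict[str, bool]:
    """Evaluate each of the five criteria separately, so callers can log which ones were met."""
    return {
        "procedure": bool(snap.procedure_code),
        "evidence": snap.analyzed_docs > 0
        or snap.substantive_answers >= min_answers_for_completion,
        "demographics": snap.age is not None and bool(snap.country),
        "medications": snap.medications_confirmed or snap.medications_confirmed_none,
        "allergies": snap.allergies_confirmed or snap.allergies_confirmed_none,
    }


def intake_complete_v2(snap: IntakeSnapshot, min_answers_for_completion: int = 3) -> bool:
    """Intake is complete only when all five criteria hold."""
    return all(intake_criteria(snap, min_answers_for_completion).values())
```

Returning the per-criterion dict rather than a bare boolean keeps the gate debuggable: the list of met criteria can be attached to an analytics event without re-deriving it.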
4.2 Matching gate: drop completeness to 0.4, derive from intake state
The current matching gate is:
    gating:
      ehr:
        completeness_for_matching: 0.5   # main gate
        completeness_with_meds: 0.4      # relaxed when meds asked
        matching_ready_threshold: 0.4    # used by patient_state
Three thresholds for the same concept is two too many. New shape:
    gating:
      ehr:
        # Single matching gate. Match when EHR is at least this complete.
        # 0.4 == "we know enough to start showing options".
        completeness_for_matching: 0.4
        # Backwards-compat aliases (same value): both keys read the same number.
        completeness_with_meds: 0.4
        matching_ready_threshold: 0.4
Plus a new gate:
    matching:
      # Whether medications must be asked/confirmed before matching proceeds
      require_medications: true
      # NEW: when intake_complete is true, ALWAYS allow matching even if
      # the completeness score is below the threshold. Intake is the
      # authoritative signal that we have enough context.
      allow_when_intake_complete: true
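Read together, the two fragments suggest a decision function along these lines. This is a sketch only: the in-memory dict shape, the function name, the reason strings, and the order in which require_medications is checked relative to the score are all assumptions.

```python
# Assumed in-memory shape of the two config fragments above; placing
# "matching" next to "ehr" is an assumption made for this sketch.
GATING = {
    "ehr": {"completeness_for_matching": 0.4},
    "matching": {"require_medications": True, "allow_when_intake_complete": True},
}


def matching_allowed(
    completeness: float,
    intake_complete: bool,
    medications_asked: bool,
    gating: dict = GATING,
) -> tuple[bool, str]:
    """Return (allowed, reason) for one evaluation of the matching gate."""
    matching = gating["matching"]
    # Intake completion is the authoritative signal and bypasses the score.
    if matching["allow_when_intake_complete"] and intake_complete:
        return True, "intake_complete"
    # Assumed ordering: the medications requirement is checked before the score.
    if matching["require_medications"] and not medications_asked:
        return False, "medications_not_asked"
    if completeness >= gating["ehr"]["completeness_for_matching"]:
        return True, "completeness"
    return False, "below_threshold"
```

Returning a reason alongside the boolean makes it cheap to record why matching did or did not advance on a given turn.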
4.3 Patient explicit-advance affordance
Add a one-shot orchestrator action: when the patient says any of
["find providers now", "show me providers", "i'm ready", "skip the rest",
"proceed", "advance"] (word-boundary matched), the orchestrator:
- Sets medications_confirmed_none=true if no meds captured
- Sets allergies_confirmed_none=true if no allergies captured
- Sets intake_complete=true
- Routes directly to _handle_matching
This is a patient-driven gate override, not a server-side relaxation.
It always requires the patient to explicitly ask. Logged as a decision_record
event so we can see how often it's used in Langfuse / events table.
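A word-boundary matcher for the phrase list, plus the state changes listed above, might look like the sketch below. The function names and the plain-dict state are hypothetical; routing to _handle_matching and recording the event are left to the caller.

```python
import re
from typing import Optional

ADVANCE_PHRASES = [
    "find providers now", "show me providers", "i'm ready",
    "skip the rest", "proceed", "advance",
]

# \b on both sides keeps "proceeding" or "advanced" from triggering the override.
_ADVANCE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ADVANCE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def match_advance_phrase(message: str) -> Optional[str]:
    """Return the matched phrase in lowercase, or None if the patient did not ask to advance."""
    normalized = message.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    found = _ADVANCE_RE.search(normalized)
    return found.group(0).lower() if found else None


def apply_explicit_advance(
    message: str,
    workflow_state: dict,
    extra_metadata: dict,
    medications: list,
    allergies: list,
) -> Optional[str]:
    """Apply the one-shot override in place; the caller then routes to matching."""
    phrase = match_advance_phrase(message)
    if phrase is None:
        return None
    if not medications:
        extra_metadata["medications_confirmed_none"] = True
    if not allergies:
        extra_metadata["allergies_confirmed_none"] = True
    workflow_state["intake_complete"] = True
    return phrase
```

Returning the matched phrase gives the caller the phrase_matched value it needs when logging the override.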
4.4 Records-first early termination
When the patient says "I have no records" / "skip" / "let's proceed" while
in the records-first phase, the orchestrator currently moves to
_handle_intake which runs the quick-questions template. That's correct.
But the quick-questions template currently sets quick_questions_asked=true
regardless of whether the patient actually answered. New rule: only set
quick_questions_asked=true after a follow-up turn that contains substantive
answers. This avoids the "agent asked 3 things, patient ignored them, gate
flipped anyway" loop.
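One way to express the new rule is to split "questions were sent" from "questions were answered". In this sketch the quick_questions_pending key and both function names are hypothetical, and how a turn's substantive-answer count is computed is deliberately left out.

```python
def on_quick_questions_sent(workflow_state: dict) -> None:
    """Record that the template went out, without flipping the gate yet."""
    workflow_state["quick_questions_pending"] = True  # hypothetical key


def on_follow_up_turn(workflow_state: dict, substantive_answers: int) -> None:
    """Flip quick_questions_asked only if the follow-up actually answered something."""
    if workflow_state.get("quick_questions_pending") and substantive_answers > 0:
        workflow_state["quick_questions_asked"] = True
        workflow_state["quick_questions_pending"] = False
```

A follow-up turn with zero substantive answers leaves the pending marker in place, so the gate stays closed until the patient actually responds.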
5. Data Contract
No new database columns. All gating reads/writes go through:
- config/guardrails.yaml (gating.* section): single source of truth
- app/services/gating_config.py: loads YAML with defaults
- case.workflow_state JSONB (existing): stores intake_complete, medications_asked, etc.
- case.extra_metadata JSONB (existing): stores medications_confirmed_none, allergies_confirmed_none
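"Loads YAML with defaults" can be read as a recursive merge of the parsed file over built-in defaults. The sketch below assumes the YAML has already been parsed into a dict; DEFAULT_GATING and merge_with_defaults are hypothetical names, not the contents of app/services/gating_config.py.

```python
import copy
from typing import Optional

# Assumed defaults, mirroring the new gating.ehr values in section 4.2.
DEFAULT_GATING = {
    "ehr": {
        "completeness_for_matching": 0.4,
        "completeness_with_meds": 0.4,
        "matching_ready_threshold": 0.4,
    },
}


def _merge(base: dict, override: dict) -> None:
    """Recursively copy override values into base, descending into nested dicts."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge(base[key], value)
        else:
            base[key] = value


def merge_with_defaults(loaded: Optional[dict]) -> dict:
    """Return defaults overlaid with whatever the parsed gating section provides."""
    merged = copy.deepcopy(DEFAULT_GATING)
    _merge(merged, loaded or {})
    return merged
```

The deep copy keeps the module-level defaults untouched, so a partial or missing gating section still yields a complete set of thresholds.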
6. Feature Flag
Flag name: gates_v2
Default: true (enabled for everyone — this is a bug fix)
Behavior when disabled: Falls back to the prior gating thresholds
(completeness_for_matching: 0.5, completeness_with_meds: 0.4, six-AND
intake gate). Controlled via Flagsmith for instant rollback if the new gates
cause an unexpected regression.
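The fallback amounts to choosing between two fixed rule sets. In this sketch the threshold numbers come from this document, while the dict layout, the intake_gate labels and the function name are illustrative assumptions.

```python
# Threshold values are taken from this document; the "intake_gate" labels
# are hypothetical names for the two intake rule sets.
PRIOR_GATES = {
    "completeness_for_matching": 0.5,
    "completeness_with_meds": 0.4,
    "intake_gate": "six_and",
}
V2_GATES = {
    "completeness_for_matching": 0.4,
    "completeness_with_meds": 0.4,
    "intake_gate": "five_with_explicit_skip",
}


def effective_gates(gates_v2_enabled: bool) -> dict:
    """Return a copy of the rule set selected by the gates_v2 flag."""
    return dict(V2_GATES if gates_v2_enabled else PRIOR_GATES)
```

Returning a copy means a caller that tweaks the result cannot accidentally change the shared rule sets.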
7. Analytics
Add to the existing events table via decision_recorder:
| Event | When | Properties |
|---|---|---|
| gates_v2_intake_complete | When intake_complete flips to true under v2 rules | case_id, met_criteria (list), had_explicit_skip (bool) |
| gates_v2_matching_advanced | When matching is triggered | case_id, completeness, intake_complete, triggered_by (gate/explicit) |
| gates_v2_explicit_advance | When patient uses the explicit advance phrase | case_id, phrase_matched |
These let us measure whether the new gates are firing more often and whether the explicit-advance affordance is actually used.
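As an illustration of the payload shapes in the table, the builders below produce plain dicts. The event/properties envelope and the function names are assumptions; the actual decision_recorder interface is not shown in this document.

```python
def intake_complete_event(case_id: str, met_criteria: list, had_explicit_skip: bool) -> dict:
    """Payload for gates_v2_intake_complete."""
    return {
        "event": "gates_v2_intake_complete",
        "properties": {
            "case_id": case_id,
            "met_criteria": list(met_criteria),
            "had_explicit_skip": had_explicit_skip,
        },
    }


def matching_advanced_event(
    case_id: str, completeness: float, intake_complete: bool, explicit: bool
) -> dict:
    """Payload for gates_v2_matching_advanced; triggered_by is "gate" or "explicit"."""
    return {
        "event": "gates_v2_matching_advanced",
        "properties": {
            "case_id": case_id,
            "completeness": completeness,
            "intake_complete": intake_complete,
            "triggered_by": "explicit" if explicit else "gate",
        },
    }


def explicit_advance_event(case_id: str, phrase_matched: str) -> dict:
    """Payload for gates_v2_explicit_advance."""
    return {
        "event": "gates_v2_explicit_advance",
        "properties": {"case_id": case_id, "phrase_matched": phrase_matched},
    }
```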
8. Out of Scope (This Layer)
- Orchestrator planner refactor (Layer 2 — separate session)
- Synchronous chat extractor (Layer 3 — separate session)
- Frontend "Skip remaining intake" button (the explicit-advance phrase catches the same intent in chat for now; the button is a future polish)
- Removing the records-first phase entirely (reasonable but bigger scope)
- Multi-procedure case handling
9. Success Criteria
Within one week of deploy, measured via Langfuse + events table:
- >60% of cases reach the matching stage within the same session (today: rough estimate ~25-35% based on user report).
- Intake completion within 5 patient messages for cases that have at least one document upload (today: often 8-12 or never).
- Zero increase in matching engine errors or low-quality matches — we're loosening the gate, not the matching logic itself.
- Zero increase in voice/EQ rule violations (the new affordance doesn't change how the agent talks).
10. Rollback
Flag gates_v2 controls the entire change. Flip to false in Flagsmith
and the orchestrator reverts to the prior thresholds within one cache TTL
(60s). No code redeploy required.
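The "within one cache TTL" behaviour can be pictured with a small time-bounded cache. This sketch does not use the Flagsmith SDK: CachedFlag, the injected fetch callable and the injectable clock are hypothetical and only illustrate why a flip takes up to 60 seconds to propagate.

```python
import time
from typing import Callable, Optional


class CachedFlag:
    """Cache a boolean flag and refresh it at most once per TTL window."""

    def __init__(
        self,
        fetch: Callable[[], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bool] = None
        self._expires_at = 0.0

    def get(self) -> bool:
        """Return the cached value, refetching once the TTL has elapsed."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A flag flipped right after a refresh keeps serving the old value until the window expires, which is the 60-second upper bound on rollback latency described above.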