Orchestrator Planner: Steer Document
Feature: Replace the orchestrator's hand-coded phase machine with an
LLM-driven planner (Layer 2 of the conversation flow remediation plan)
Version: 1.0
Date: April 2026
Author: Srikanth Donthi (CPO/CTO)
Status: Design — Not Yet Approved for Implementation
Depends on: gates_v2 (Layer 1) deployed and observed for at least
one full session of real traffic
1. Problem Statement
case_orchestrator.handle_message is a long branching function that gates
each turn on a flag soup: procedure_identified, intake_complete,
records_requested, quick_questions_asked, ehr_constructed,
medications_asked, min_info_for_matching, matching_complete,
providers_selected, consent_given, forwarded. Several branches
early-return after a single LLM call and flip flags as side effects, so the
next turn often lands in a completely different branch than the one the
previous turn finished in.
A natural follow-up patient utterance ("here's another doc", "I take
metformin", "what next?") frequently routes to the wrong sub-handler. The
agent's reply looks fine in isolation but the state machine doesn't
move forward, so matching is never triggered. Layer 1 (gates_v2,
PR #70) loosened the gates to mask the symptom, but the underlying
brittleness remains: every new flag added compounds the routing surface
area.
This is fundamentally a control flow problem, not a prompt problem. It cannot be fully solved by tweaking thresholds — at some point the routing logic itself has to change.
2. Design Decision: One Planner Call Per Turn
Decision: Replace the if/elif tree in handle_message with a single
LLM call that, given the full case state, picks the next action. The
orchestrator becomes a thin dispatcher: read state → call planner → run
the chosen action handler → record the decision → return.
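A minimal sketch of that dispatcher shape, assuming a synchronous planner call and a handler registry keyed by action name. `CaseState`, `plan`, `record_decision` and the two stub handlers are hypothetical stand-ins, not the real case_orchestrator.py API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CaseState:
    # Hypothetical stand-in for the structured case snapshot.
    case_id: str
    procedure_identified: bool = False
    decisions: list = field(default_factory=list)


def plan(state: CaseState) -> str:
    # Placeholder for the single planner LLM call; returns one enum value.
    return "answer_question" if state.procedure_identified else "identify_procedure"


def identify_procedure(state: CaseState, message: str) -> str:
    return "Which procedure are you considering?"


def answer_question(state: CaseState, message: str) -> str:
    return f"Answering: {message}"


HANDLERS: dict[str, Callable[[CaseState, str], str]] = {
    "identify_procedure": identify_procedure,
    "answer_question": answer_question,
}


def record_decision(state: CaseState, action: str) -> None:
    # Stand-in for writing the decision to the events table / Langfuse trace.
    state.decisions.append(action)


def handle_message(state: CaseState, message: str) -> str:
    action = plan(state)                      # call planner
    reply = HANDLERS[action](state, message)  # run the chosen handler
    record_decision(state, action)            # record the decision
    return reply


state = CaseState(case_id="case-1")
print(handle_message(state, "I need a knee replacement"))
```

The dispatcher holds no routing logic of its own; every routing decision lives in `plan()`.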
Rationale:
- One LLM call has the whole picture. The current branches each see only the slice of state they were written to look at.
- The planner is debuggable — Langfuse already records every LLM call. We get a trace of "given this state, the planner picked X because Y" for every turn. The current branching logic is invisible to Langfuse.
- Adding a new action (e.g., "request_clarification", "ask_for_imaging") becomes one new value in the planner's action enum plus one new handler. No new branch in the orchestrator, no flag soup.
- The planner is one Haiku call (~$0.001/turn). At 1000 turns/day that is $1/day. Trivial cost.
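The cost arithmetic, as a small sketch. The token counts are the estimates from Section 8. The per-million-token prices in the example call are placeholders, not quoted pricing, and should be replaced with the current rates for whichever Haiku model is deployed.

```python
def per_turn_cost(input_tokens: int, output_tokens: int,
                  usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """USD cost of one planner call, given prices per million tokens."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def daily_cost(turns_per_day: int, usd_per_turn: float) -> float:
    return turns_per_day * usd_per_turn


# Placeholder prices (USD per million input / output tokens).
turn = per_turn_cost(200, 80, usd_per_mtok_in=1.0, usd_per_mtok_out=5.0)
print(f"${turn:.4f}/turn, ${daily_cost(1000, turn):.2f}/day")  # $0.0006/turn, $0.60/day
```

At these placeholder rates the 80 output tokens cost twice as much as the 200 input tokens. Keeping `reason` to the 1-2 sentences the contract specifies is therefore what keeps the call cheap.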
Rejected alternatives:
- LangGraph state machine. Already considered in Session 11. The problem with LangGraph for this layer is that the state graph itself becomes the new flag soup. We'd be writing nodes and edges instead of if/elif branches. Same complexity, different syntax. LangGraph wins when the graph structure is mostly stable and the work inside each node is heavy. Our case is the opposite: light work per node, many possible transitions, frequent reshuffling.
- Rule-based router with priority order. Closer to the current design but with an explicit ordered list of rules. Cleaner than the if/elif tree but still requires hand-tuning every time a new conversation pattern emerges.
- Reinforcement-learning router. Gold-plated. Need labelled trajectories first. Defer to post-Series A.
3. The Planner Contract
3.1 Inputs
The planner receives a structured snapshot of the case state. It does
NOT see raw prior messages — those go through the existing _llm_generate
sub-handlers. The planner is for routing, not for generating
patient-facing text. Inputs:
```python
PlannerInput = {
    "case_id": str,
    "procedure_identified": bool,
    "procedure_name": str | None,
    "intake_complete": bool,
    "ehr_completeness": float,             # 0.0 to 1.0
    "has_documents": bool,
    "documents_pending_processing": bool,
    "blocking_issues": list[str],          # from doc validator + risk_assessor
    "missing_critical_info": list[str],    # demographics, meds, allergies, etc.
    "matching_complete": bool,
    "providers_selected": bool,
    "consent_given": bool,
    "forwarded": bool,
    "last_user_message_summary": str,      # 1-line summary, NOT raw text
    "last_user_intent": str | None,        # from input classifier
    "turn_number": int,
}
```
Building this dict is mechanical — it's already what patient_state.py
produces. Layer 2 just plumbs it into the planner call.
3.2 Outputs
```python
from typing import Literal

PlannerOutput = {
    "next_action": Literal[
        "identify_procedure",
        "request_records",
        "process_uploaded_documents",
        "collect_intake_info",
        "advance_to_matching",
        "show_matches",
        "request_provider_selection",
        "request_consent",
        "forward_records",
        "answer_question",
        "handle_blocking_issue",
        "celebrate_journey_complete",
    ],
    "reason": str,                     # 1-2 sentence rationale (logged, not shown)
    "missing_for_advance": list[str],  # what's still needed if not advancing
    "confidence": float,               # 0.0 to 1.0
}
```
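If the `Literal` is the single source of truth for the action set, the structured-output schema can be derived from it, so the prompt and the code cannot drift apart. A sketch using `typing.get_args`. The schema is plain JSON Schema. Passing it to the Haiku call, for example as a tool's `input_schema` in the Anthropic Messages API, is an assumption about the integration.

```python
import json
from typing import Literal, get_args

NextAction = Literal[
    "identify_procedure", "request_records", "process_uploaded_documents",
    "collect_intake_info", "advance_to_matching", "show_matches",
    "request_provider_selection", "request_consent", "forward_records",
    "answer_question", "handle_blocking_issue", "celebrate_journey_complete",
]

# JSON Schema for PlannerOutput; the enum is generated from the Literal above.
PLANNER_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "next_action": {"type": "string", "enum": list(get_args(NextAction))},
        "reason": {"type": "string"},
        "missing_for_advance": {"type": "array", "items": {"type": "string"}},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["next_action", "reason", "missing_for_advance", "confidence"],
}

print(json.dumps(PLANNER_OUTPUT_SCHEMA["properties"]["next_action"], indent=2))
```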
3.3 Action Handlers
Every action in the enum maps to exactly one existing code path in
case_orchestrator.py: either a handler function or an existing inline
block. The current code is reused as-is; only the dispatching logic
changes:
| Action | Existing handler |
|---|---|
| identify_procedure | _handle_procedure_identification |
| request_records | LLM call with records_first phase |
| process_uploaded_documents | _handle_attachment_response |
| collect_intake_info | _handle_intake |
| advance_to_matching | _handle_matching |
| show_matches | _handle_matching (returns existing matches) |
| request_provider_selection | hardcoded prompt (existing block) |
| request_consent | _handle_consent |
| forward_records | _handle_forwarding |
| answer_question | LLM call with general phase |
| handle_blocking_issue | LLM call with phase prompt that names the issue |
| celebrate_journey_complete | hardcoded final message (existing block) |
No new handlers in Layer 2. Adding them is strictly out of scope, because Layer 2 is about routing, not new capabilities.
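The one-to-one mapping in the table can be enforced mechanically. An action added to the enum without a handler then fails at import time instead of mid-conversation. A sketch: the `_stub` entries are hypothetical placeholders for the case_orchestrator.py code paths named in the table.

```python
from typing import Callable, Literal, get_args

NextAction = Literal[
    "identify_procedure", "request_records", "process_uploaded_documents",
    "collect_intake_info", "advance_to_matching", "show_matches",
    "request_provider_selection", "request_consent", "forward_records",
    "answer_question", "handle_blocking_issue", "celebrate_journey_complete",
]

Handler = Callable[[dict], str]


def _stub(code_path: str) -> Handler:
    # Hypothetical placeholder; the real registry would reference the
    # existing case_orchestrator.py functions and blocks directly.
    return lambda state: code_path


HANDLERS: dict[str, Handler] = {
    "identify_procedure": _stub("_handle_procedure_identification"),
    "request_records": _stub("llm:records_first"),
    "process_uploaded_documents": _stub("_handle_attachment_response"),
    "collect_intake_info": _stub("_handle_intake"),
    "advance_to_matching": _stub("_handle_matching"),
    "show_matches": _stub("_handle_matching"),
    "request_provider_selection": _stub("provider_selection_block"),
    "request_consent": _stub("_handle_consent"),
    "forward_records": _stub("_handle_forwarding"),
    "answer_question": _stub("llm:general"),
    "handle_blocking_issue": _stub("llm:blocking_issue"),
    "celebrate_journey_complete": _stub("journey_complete_block"),
}


def check_registry(handlers: dict) -> None:
    """Raise if the registry and the action enum disagree in either direction."""
    expected = set(get_args(NextAction))
    missing = expected - set(handlers)
    extra = set(handlers) - expected
    if missing or extra:
        raise RuntimeError(
            f"handler registry mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )


check_registry(HANDLERS)  # runs at import time
```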
3.4 Fallback
If the planner LLM call fails, errors out, returns an unparseable
response, or returns an action not in the enum, the orchestrator falls
back to the existing if/elif logic. The legacy code stays in place, and
the planner path is gated behind the planner_v1_enabled feature flag
(default false until we have observed the planner in shadow mode).
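A sketch of the fallback rule, assuming the planner returns its output as a JSON string. `parse_planner_output`, `choose_action` and the injected callables are hypothetical names. Every failure mode listed above (exception, unparseable response, unknown action) collapses to the same legacy path, and that is also the path taken when the flag is off.

```python
import json
from typing import Callable, Optional, Tuple

VALID_ACTIONS = frozenset({
    "identify_procedure", "request_records", "process_uploaded_documents",
    "collect_intake_info", "advance_to_matching", "show_matches",
    "request_provider_selection", "request_consent", "forward_records",
    "answer_question", "handle_blocking_issue", "celebrate_journey_complete",
})


def parse_planner_output(raw) -> Optional[str]:
    """Return next_action if raw is planner JSON with a known action, else None."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # unparseable response
    if not isinstance(data, dict):
        return None
    action = data.get("next_action")
    if isinstance(action, str) and action in VALID_ACTIONS:
        return action
    return None  # missing or out-of-enum action


def choose_action(flag_enabled: bool,
                  call_planner: Callable[[], str],
                  legacy_route: Callable[[], str]) -> Tuple[str, str]:
    """Return (action, source), where source is 'planner' or 'legacy'."""
    if not flag_enabled:
        return legacy_route(), "legacy"
    try:
        raw = call_planner()
    except Exception:  # timeout, API error, etc.
        return legacy_route(), "legacy"
    action = parse_planner_output(raw)
    if action is None:
        return legacy_route(), "legacy"
    return action, "planner"
```

Returning the source alongside the action means the planner.error rate can be computed from the same decision log.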
4. Shadow Mode Before Cutover
Before the planner replaces the if/elif tree, it runs in shadow mode:
- Existing logic decides the action (control)
- Planner runs in parallel and records its decision (shadow)
- Both decisions written to a new planner_shadow_decisions table or the existing events table with event_type=planner.shadow_decision
- We compare control vs shadow over ~500 real turns
If shadow agreement is >90% on the dominant actions and the disagreements look reasonable on a Langfuse spot-check, flip the flag to make the planner authoritative. If not, we tune the planner prompt and re-shadow.
This is the same pattern we use for matching strategies (Section 12 in CLAUDE.md). It's the right safety net for any control-flow change.
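The agreement check can be computed directly from the logged control/shadow pairs. A sketch, assuming each shadow record yields a `(control_action, shadow_action)` pair. "Dominant actions" is interpreted here as actions that make up at least a given share of control decisions; the 5% default is an assumption, not a number from this document.

```python
from collections import Counter
from typing import Iterable, Tuple


def agreement_by_action(records: Iterable[Tuple[str, str]],
                        min_share: float = 0.05) -> dict:
    """Agreement rate per dominant control action.

    An action is 'dominant' if it is at least min_share of control decisions.
    """
    pairs = list(records)
    totals = Counter(control for control, _ in pairs)
    agreed = Counter(control for control, shadow in pairs if control == shadow)
    n = len(pairs)
    return {
        action: agreed[action] / count
        for action, count in totals.items()
        if count / n >= min_share
    }


def passes_shadow_gate(records, threshold: float = 0.90,
                       min_share: float = 0.05) -> bool:
    """True if every dominant action agrees strictly above the threshold."""
    rates = agreement_by_action(records, min_share)
    return bool(rates) and all(rate > threshold for rate in rates.values())


log = ([("request_records", "request_records")] * 95
       + [("request_records", "answer_question")] * 5
       + [("answer_question", "answer_question")] * 2)
print(agreement_by_action(log))  # {'request_records': 0.95}
print(passes_shadow_gate(log))   # True
```

Checking per action rather than in aggregate matters because a single overall rate is dominated by the most frequent action. Good agreement on answer_question could hide poor agreement on a rarer but critical action such as advance_to_matching.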
5. Feature Flag
Flag name: planner_v1_enabled
Default: false
Rollout plan:
1. Ship the planner code + shadow mode wiring with the flag false
2. Manually flip a percentage rollout in Flagsmith (5% → 25% → 100%)
3. Hard cutover happens only after 95%+ shadow-mode agreement on a
week of traffic
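During the 5% → 25% → 100% ramp, a given case should stay on one side of the flag. Otherwise one conversation could switch between planner and legacy routing mid-case. Flagsmith performs percentage splits itself, so the code below only illustrates the sticky-bucketing idea. It assumes case_id is used as the rollout key and is not Flagsmith's algorithm.

```python
import hashlib


def rollout_bucket(key: str, salt: str = "planner_v1_enabled") -> int:
    """Map a key deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def planner_enabled_for(case_id: str, rollout_percent: int) -> bool:
    # A case's bucket never changes, so a case enabled at 5% stays enabled
    # at 25% and 100%: raising the percentage only adds cases.
    return rollout_bucket(case_id) < rollout_percent


print(planner_enabled_for("case-123", 25))
```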
6. Out of Scope (This Layer)
- New action types beyond the 12 listed above
- Multi-action sequencing in a single turn
- Replacing the sub-handlers themselves (those keep their current prompts and logic)
- Removing the legacy if/elif code (kept as the fallback)
- Frontend changes (planner is server-side only)
7. Success Criteria
Within two weeks of the cutover (planner authoritative, not shadow):
- Zero increase in the per-turn 5xx rate or LLM failure rate
- >=95% routing agreement with the human-spot-checked "correct action" on a 100-turn sample
- >=70% of cases reach matching (Layer 1 target was 60%)
- No new categories of agent failures introduced (we'll review all planner.error events at end of week 1)
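A sketch of checking the quantitative criteria at the end of the window. The inputs (spot-check labels, per-case "reached matching" counts, before/after error rates) are assumed to be aggregated from the events table and Langfuse. `CutoverWindow` and `success_report` are hypothetical names. The failure-category review stays a manual step.

```python
from dataclasses import dataclass


@dataclass
class CutoverWindow:
    # Hypothetical aggregates for the two-week post-cutover window.
    spot_check_correct: int    # planner action matched the human label
    spot_check_total: int      # e.g. the 100-turn sample
    cases_reached_matching: int
    cases_total: int
    rate_5xx_before: float     # per-turn rates; "before" is the pre-cutover baseline
    rate_5xx_after: float
    llm_fail_before: float
    llm_fail_after: float


def success_report(w: CutoverWindow) -> dict:
    """Evaluate the quantitative success criteria."""
    return {
        "routing_agreement_95": w.spot_check_total > 0
        and w.spot_check_correct / w.spot_check_total >= 0.95,
        "reach_matching_70": w.cases_total > 0
        and w.cases_reached_matching / w.cases_total >= 0.70,
        "no_5xx_increase": w.rate_5xx_after <= w.rate_5xx_before,
        "no_llm_failure_increase": w.llm_fail_after <= w.llm_fail_before,
    }


print(success_report(CutoverWindow(96, 100, 72, 100, 0.010, 0.010, 0.020, 0.015)))
```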
8. Cost & Latency
- One Haiku call per turn, ~200 input tokens / ~80 output tokens. ~$0.001 per turn.
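- Latency ~400ms. Runs in parallel with the existing input classifier
(which is also a Haiku call), so the marginal latency is roughly zero
if we await both with asyncio.gather. This holds only if the planner
does not wait for the current turn's classifier output: last_user_intent
in PlannerInput comes from the input classifier, so if the planner needs
it, the two calls run sequentially and the ~400ms is added.
- Shadow mode doubles this for the rollout window. Acceptable.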
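A sketch of the concurrency claim, assuming the planner is not waiting on this turn's classifier result. When both calls are awaited together, the turn pays roughly the slower of the two rather than their sum. `classify_input` and `call_planner` are hypothetical async stand-ins that sleep to simulate ~400ms Haiku calls.

```python
import asyncio
import time


async def classify_input(message: str) -> str:
    await asyncio.sleep(0.4)  # simulated Haiku call latency
    return "question"


async def call_planner(snapshot: dict) -> str:
    await asyncio.sleep(0.4)  # simulated Haiku call latency
    return "answer_question"


async def route_turn(message: str, snapshot: dict) -> tuple:
    # gather runs both coroutines concurrently and returns results in argument order.
    intent, action = await asyncio.gather(classify_input(message), call_planner(snapshot))
    return intent, action


start = time.perf_counter()
result = asyncio.run(route_turn("what next?", {"turn_number": 3}))
elapsed = time.perf_counter() - start
print(result, f"{elapsed:.2f}s")  # roughly 0.4s rather than 0.8s
```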
9. Rollback
The flag planner_v1_enabled is the kill switch. Setting it to false
in Flagsmith reverts to the legacy if/elif tree within one cache TTL
(60s). No code redeploy required.
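The 60-second bound follows from each server caching the flag value. A sketch of that behavior, assuming a per-process TTL cache in front of the flag lookup. `TTLFlag` and the `fetch` callable are hypothetical; this does not describe the Flagsmith SDK's own caching.

```python
import time
from typing import Callable, Optional


class TTLFlag:
    """Serve a cached flag value, re-fetching at most once per ttl seconds."""

    def __init__(self, fetch: Callable[[], bool], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # e.g. a wrapper around the Flagsmith client
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[bool] = None
        self._fetched_at = 0.0

    def enabled(self) -> bool:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Simulated clock and remote flag to show the worst-case revert delay.
fake_now = [0.0]
remote = {"planner_v1_enabled": True}
flag = TTLFlag(lambda: remote["planner_v1_enabled"], clock=lambda: fake_now[0])

print(flag.enabled())                 # True (fetched at t=0)
remote["planner_v1_enabled"] = False  # kill switch flipped
fake_now[0] = 59.0
print(flag.enabled())                 # True (still cached)
fake_now[0] = 60.0
print(flag.enabled())                 # False (TTL expired, re-fetched)
```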