v5 / v6 Conversation Prompt - Unified Plan

Authored: 2026-05-15 (consolidation pass after Phase 1 completion)
Authority: This doc supersedes scattered v5 / v6 tracking. When in doubt, refer back here.
Owner: SD + agent platform.
Status: Synthesis / audit only (no new architecture, no fresh ADRs, no implementation). Implementation runs from the per-phase plans (see §8).
Companion specs (still authoritative on their own scope):
docs/specs/conversation-v6-feature.md (canonical v6 spec, rev 6, 962 lines)
docs/specs/conversation-v5-feature.md (legacy v5 spec; superseded, but its rules carry forward; 471 lines)
docs/specs/v6-rubric-locked.md (9-axis grader rubric)
docs/specs/v6-stage-resolver-truth-table.md (truth table for §2.6 resolver)
docs/specs/v6-stages-extractors-matrix.md (stage × extractor wiring)
docs/specs/v6-rule-location-map.md (lockstep registry for §8.5 CI gate)
docs/specs/v6-trio-consistency-findings.md (cross-doc audit, 18 findings)

0. Why this doc exists

SD asked for a single document because the v5 → v6 transition is scattered across:

One canonical v6 spec (conversation-v6-feature.md) plus 6 companion spec files
Two production prompts in tree (conversation_v4.yaml, conversation_v4.1.yaml)
One v6 base scaffold (conversation_v6.yaml) — all TODO(phase-2a) markers
Phase 0 (merged) + Phase 1 (open, stacked) PR chain
8 open / closed GitHub issues (#491, #547, #550, #560, #642, #743, #836, #837)
Session-memory notes in ~/.claude/projects/-Users-srikanthdonthi-Code-Curaway/memory/

The risk this doc mitigates: a v5 rule, a hot-fix paragraph, or a verbatim phrase silently disappears during the v6 absorption because no single owner knows which file holds the canonical version today. The audit identifies inheritance contracts so Phase 2a can land without drift.

Key framing: v4.1 + v1.1 + #837 hot-fix is the inheritance point, not v4 alone. v6 ABSORBS, it does not replace.

1. Production baseline (as of 2026-05-15)

What is actually running in production today (verified against config/feature_flags.yaml on main and the four hot-fix issues):

Surface	Value / Path	Source
`prompt_version` default	`"v4"`	`config/feature_flags.yaml:192`
`triage_layer_context_version` default	`"v1"`	`config/feature_flags.yaml:469`
Valid prompt pairs	`(v4, v1)` and `(v4.1, v1.1)`	flag description at `config/feature_flags.yaml:194`
Mixed pair fallback	`(v4, v1)` + Telegram alert	flag description at `config/feature_flags.yaml:194`
`mso_patient_offer_enabled` default	`false` (per-tenant flip after live testing)	`config/feature_flags.yaml:339-342`
`prompt_arch` default (open PR #924)	`"v4"` (no v5 fallback by design)	PR #924 diff vs main
`prompt_arch_v6_tenant_allowlist` default (open PR #924)	`"[]"` (empty list — zero v6 traffic)	PR #924 diff vs main
#837 hot-fix paragraphs location	`config/prompts/base/conversation_v4.yaml:111-112` ONLY	grep verified 2026-05-15
#837 hot-fix paragraphs in v4.1	NOT PRESENT (drift — see §6 risk register)	grep verified 2026-05-15
Active layer contexts	`config/prompts/layer_contexts/intent_capture.yaml` (v1) + `intent_capture_v1.1.yaml` (v1.1)	filesystem
Active phase contexts (fallback)	`config/prompts/phase_contexts/v2/*.yaml` (intake, records_first, identify_procedure, document_review, general, recovery_offer, recovery_checkin)	filesystem
Examples	`config/prompts/examples/{locale}/*.yaml`	unchanged from v4 era
v6 scaffold in tree	`config/prompts/base/conversation_v6.yaml`, `config/prompts/stages.yaml`, `config/prompts/knowledge/*.yaml` (all `TODO(phase-2a)`)	PR #925 (merged into stack base)
v6 dispatcher live?	YES (PR #927 in stack, dormant — `compose_v6` raises `NotImplementedError`)	`app/agents/v6_dispatcher.py`
Default conversation flow	v4 base + v2 phase context + v1 layer context	`app/agents/conversation_prompt.py:140` `get_system_prompt`
MSO addendum production location today	`config/prompts/base/conversation_v4.yaml:204` `mso_offer_addendum:` (in body) and v4.1 baked in	spec §3.5.1 row 9

Practical summary: Production traffic on 2026-05-15 runs conversation_v4.yaml with phase_contexts/v2/*.yaml + layer_contexts/intent_capture.yaml, gated by prompt_version=v4 + triage_layer_context_version=v1. Tenants with mso_patient_offer_enabled=true get (v4.1, v1.1). The #837 hot-fix paragraphs only live in v4.yaml lines 111-112 today; v4.1 has the original 5-bullet SAFETY block at lines 108-113 without the new treatment-recommendation + scope-rejection bans. This is a real drift — see §6 G-1.
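The pairing rule in the table (two valid pairs, any mixed pair falls back to `(v4, v1)` with an alert) can be sketched as follows. This is a minimal illustration only: `resolve_pair` and the `alert` callback are hypothetical names, not the real `prompt_loader.resolve_versions` signature, and the alert text is invented.

```python
from typing import Callable, Tuple

# Pairs the flag description names as valid (config/feature_flags.yaml:194).
VALID_PAIRS = {("v4", "v1"), ("v4.1", "v1.1")}
FALLBACK_PAIR = ("v4", "v1")


def resolve_pair(
    prompt_version: str,
    layer_context_version: str,
    alert: Callable[[str], None],
) -> Tuple[str, str]:
    """Return a valid (prompt, layer-context) pair, falling back on a mismatch.

    `alert` stands in for the Telegram notification; it is a hypothetical hook.
    """
    pair = (prompt_version, layer_context_version)
    if pair in VALID_PAIRS:
        return pair
    # Mixed pair: notify, then serve the known-good baseline.
    alert(f"mixed prompt pair {pair}, falling back to {FALLBACK_PAIR}")
    return FALLBACK_PAIR


if __name__ == "__main__":
    sent = []
    print(resolve_pair("v4.1", "v1.1", sent.append))
    print(resolve_pair("v4.1", "v1", sent.append))
    print(sent)
```

Passing the alert as a callback keeps the pairing logic testable without a network dependency.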

2. Why v5 was folded into v6

Per docs/specs/conversation-v6-feature.md §0, two motivations forced the merge on 2026-05-12:

v5 work (#783) added 6 prompt rules to fix 7 logged P0/P1/P2 bugs (#491, #547, #550, #560, #642, #743, #546) but kept v4 architecture (phase × layer composition, two parallel injection taxonomies, addendum-as-third-mechanism).
A separate brainstorm surfaced that v4 architecture has structural drag: two overlapping concepts (phase / layer), three injection mechanisms (phase context / layer context / addendum), full base-prompt cloning per version, and ~35-40% redundant tokens per turn.

Doing both as one release (rules + architecture) avoids: two clinical advisor review cycles, two validation cycles (3 baselines + 3 after × 3 personas, twice), lockstep PRs during a v5-then-v6 transition window, and rule content being written twice (once into conversation_v5.yaml, once when restructuring into stages.yaml).

Concretely: there is no conversation_v5.yaml file, no v5 value for the prompt_version flag, no v5 row in prompt_loader.resolve_versions. The v5 spec at docs/specs/conversation-v5-feature.md is preserved as a rules reference, but its 6 rule additions land directly in conversation_v6.yaml per the §4 absorption mapping.

3. What's already implemented

Inventory by phase. Verified against PR list (gh pr list --state all) on 2026-05-15.

Phase 0 - Validation harness (MERGED)

PR	Title	Components shipped
#921	Phase 0 Steps 1 + 2 — locked rubric + LLM-grader scorer	`docs/specs/v6-rubric-locked.md`, `config/prompts/scorer/v6_compliance_scorer.yaml`, `app/services/prompt_compliance_scorer.py`, `tests/test_prompt_compliance_scorer.py` (17 deterministic tests, no LLM calls in CI), `scripts/test_compliance_scorer.py` (manual dogfood CLI)
#922	Phase 0 Step 3 — fixture corpus	15 conversations × 9-axis tagging in `tests/v6_fixtures/`; `tests/test_v6_fixture_corpus.py`
#923	Phase 0 Steps 4 + 5 + 6 — grader cache + 6 gates + CI	grader caching, determinism wrapper, cost guard; 6 deterministic gates (axes 1, 2, 6, 7, 8, flag YAML); `.github/workflows/v6-prompt-compliance.yml`

Live in production: ALL Phase 0 components are merged. None gate prod traffic — they gate PRs that touch prompt content.

Phase 1 - Safety net + scaffolding (OPEN, 6 stacked PRs)

PR	Title	Base branch	Status	Components shipped
#924	Step 1 — `prompt_arch` + tenant allowlist flags	`main`	OPEN	`config/feature_flags.yaml`: `prompt_arch=v4`, `prompt_arch_v6_tenant_allowlist="[]"`
#925	Scaffolding — stages.yaml + knowledge + stubs	`main`	OPEN	`conversation_v6.yaml` (94 lines, TODO markers), `stages.yaml` (117 lines, 12 stages with TODOs), `knowledge/{financial_options,post_travel_logistics,insurance_handling,procedure_clinical_facts/knee_replacement}.yaml`, `stage_resolver.py` (269 lines, §2.6 truth table implemented), `patient_context_builder.py` (155 lines, tenant assertion), `prompt_loader_v6.py` (62 lines, `compose_v6` raises NIE)
#926	Layer 2 — boot-time YAML artifact validator	`feat/v6-phase1-scaffolding`	OPEN	`app/services/v6_artifact_validator.py`, 14 tests
#927	Layer 3 — `prompt_arch` dispatcher + Langfuse tags	`feat/v6-phase1-yaml-validator`	OPEN	`app/agents/v6_dispatcher.py`, `get_system_prompt` branch, `_dispatch_tags` `contextvars.ContextVar` stash
#928	Layer 4 — fallback observability + cost-tracking scaffold	`feat/v6-phase1-dispatcher`	OPEN	`app/services/v6_fallback_monitor.py`, Telegram alert for 3 unexpected fallback reasons, `record_v6_turn_cost` stub
#929	Layer 5 — end-to-end safety-net smoke tests	`feat/v6-phase1-cost-monitoring`	OPEN	`tests/test_v6_phase1_safety_net_smoke.py` (8 scenarios)

Phases 2–9 - Content port + composer + production rollout (SHIPPED 2026-05-16)

Subagent-driven deployment campaign on top of the Phase 1 substrate. All 8 phases shipped in one session.

PR	Phase	Title	Components shipped
#933	Phase 1 (fixtures)	Absorption + verbatim + lockstep fixtures (G-3, G-4, G-5)	`tests/test_hotfix_837_absorption.py`, `tests/test_v5_rule_verbatim_preservation.py`, `tests/test_lockstep_consistency.py` (67 tests gating Phases 2-5 via `xfail(strict=True)` markers; markers cleared progressively)
#935	Phase 2	#837 backport into v4.1 + HARD BANS port into v6 (closes G-1)	`conversation_v4.1.yaml` SAFETY block + `conversation_v6.yaml` HARD BANS section (byte-identical to v4:111-112)
#937	Phase 3	Port v5 rules 2.1/2.3/2.5/2.6 + base sections	All `TODO(phase-2a)` markers cleared from `conversation_v6.yaml`: ROLE, COLLECT, VOICE+rule2.5 emotional-word list, NEVER, FORBIDDEN PHRASES, DOCUMENT-TRUST FRAMING, DEMOGRAPHIC GROUNDING, ONE QUESTION PER TURN (7-axis), REMEMBER
#938	Phase 4	stages.yaml content port (12 stages)	All `TODO(phase-2b)` markers cleared from `stages.yaml`. Recovery stages carry ADR-0018 §K escalation triggers + coordinator handoff flow
#939	Phase 5	Knowledge addendums (5 files)	CREATED `knowledge/mso_patient_offer.yaml` (relocated from v4.yaml:207-261). FILLED `financial_options`, `insurance_handling`, `post_travel_logistics`, `procedure_clinical_facts/knee_replacement`
#940	Phase 6+7	compose_v6 + 5 dataclass services (closes G-7)	`app/services/{case_summary,fhir_observation_summary,document_manifest,workflow_snapshot,patient_preferences}_service.py`, `knowledge_addendum_selector.py`, `prompt_loader_v6.py` real compose body, `patient_context_builder.py` real assembly. 194 tests added
#941	Phase 8	G-15..G-18 cleanups + Phase 6+7 polish	G-15 stage_resolver logging spec amendment, G-16 WorkflowState key remapping docstring, G-17 compose_v6 return-shape guard + alertable fallback reason, G-18 stage_resolver edge case test, knowledge selector lru_cache, cache_segments[0] prompt_version field, soft-return rationale docstrings on 4 services
#942	Phase 9	Production rollout — flip `prompt_arch=v6` + `allowlist=["*"]`	`config/feature_flags.yaml` defaults flipped. Internal-only prod (limited Curaway team); rollback by flipping default back to v4 (v4 path remains fully functional and is the dispatcher fallback target). Skipped: 24h dev test (per SD directive to enable fully on prod and test there)

Live in production (2026-05-16): v6 is the default architecture. prompt_arch_v6_tenant_allowlist=["*"] covers all internal tenants. Identity-aware overrides via Flagsmith remain available for per-tenant ramp-down if needed.
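How the two flags combine can be sketched as below. This assumes (not confirmed by the source) that v6 is served only when `prompt_arch` is `"v6"` AND the tenant is on the allowlist, that the allowlist flag is a JSON-encoded list, and that `"*"` is a wildcard. `select_arch` is a hypothetical helper name, and the fail-to-v4 behavior on malformed input is likewise an assumption.

```python
import json


def select_arch(prompt_arch: str, allowlist_json: str, tenant_id: str) -> str:
    """Pick "v6" only when the flag says v6 AND the tenant is allowlisted.

    Assumption: the allowlist flag is a JSON-encoded list of tenant ids, with
    "*" acting as a wildcard. Anything malformed resolves to "v4".
    """
    if prompt_arch != "v6":
        return "v4"
    try:
        allowlist = json.loads(allowlist_json)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError subclass.
        return "v4"
    if not isinstance(allowlist, list):
        return "v4"
    if "*" in allowlist or tenant_id in allowlist:
        return "v6"
    return "v4"


if __name__ == "__main__":
    print(select_arch("v4", '["*"]', "tenant-a"))  # Phase 1 default arch
    print(select_arch("v6", "[]", "tenant-a"))     # empty allowlist
    print(select_arch("v6", '["*"]', "tenant-a"))  # Phase 9 defaults
```

Under these assumptions, rolling back only requires flipping `prompt_arch` to `v4`; the allowlist can stay untouched.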

Async dispatcher conversion (deferred to follow-up): compose_v6 is async; dispatcher remains sync and catches unawaited coroutines via inspect.iscoroutine(). Conversion would touch >5 files outside the v6 chain (triage_agent.py + get_system_prompt callers). Tracked separately.
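The sync-side guard described above can be sketched like this. It is a simplified stand-in for the dispatcher, not the actual `v6_dispatcher.py` code; `call_compose_sync` and the `"compose_returned_coroutine"` reason string are hypothetical names.

```python
import inspect
from typing import Any, Callable, Optional, Tuple


def call_compose_sync(
    compose: Callable[[], Any],
) -> Tuple[Optional[Any], Optional[str]]:
    """Call `compose` from sync code; return (artifact, fallback_reason).

    If `compose` is an async function, calling it yields a coroutine object
    that this sync path cannot await. Close it so Python does not emit a
    "coroutine ... was never awaited" RuntimeWarning, then report a fallback.
    """
    result = compose()
    if inspect.iscoroutine(result):
        result.close()
        return None, "compose_returned_coroutine"  # hypothetical reason tag
    return result, None


if __name__ == "__main__":
    async def async_compose():
        return {"system": "prompt"}

    def sync_compose():
        return {"system": "prompt"}

    print(call_compose_sync(async_compose))
    print(call_compose_sync(sync_compose))
```

The guard keeps the v4 fallback reachable while the dispatcher is still sync; it does not run the coroutine, so no v6 prompt is produced on that path.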

CI status (2026-05-16): All v6 prompt-compliance gates are green except the LLM grader job, which requires ANTHROPIC_API_KEY in GitHub Actions secrets (G-2, pending SD action). This is non-blocking because the deterministic gates cover axes 1, 2, 6, 7, 8.

4. v5 rules absorption checklist

Tabulating the 6 v5 rules per conversation-v6-feature.md §4 mapping table + v6-rule-location-map.md §2.27. The "Production today" column is verified against config/prompts/base/conversation_v4.yaml + v4.1.yaml on 2026-05-15.

Rule	Origin issue(s)	Target in v6	Production status today	Verbatim fixture present?
2.1 Document-trust framing	#560	`conversation_v6.yaml` DOCUMENT-TRUST FRAMING section	NOT in v4 baseline; NOT in v4.1. The closest pre-v6 hit is `phase_contexts/v2/document_review.yaml` lines 6-26 which has the "NO MEDICAL INTERPRETATION" block, but the v5-spec 4-part framing + identity-clarification language is NOT in production today.	NO. `tests/test_v5_rule_verbatim_preservation.py` does not exist yet. Verbatim phrase list defined in `v6-rule-location-map.md §3.1`.
2.2a Treatment-recommendation ban	#642 (+ #837 hot-fix)	`conversation_v6.yaml` HARD BANS section	In `conversation_v4.yaml:111` (added by #837 merged 2026-05-12). NOT in `conversation_v4.1.yaml` — see §6 G-1.	NO. `tests/test_hotfix_837_absorption.py` does not exist yet.
2.2b Scope-rejection ban	#743 (+ #837 hot-fix)	`conversation_v6.yaml` HARD BANS section	In `conversation_v4.yaml:112`. NOT in `conversation_v4.1.yaml` — see §6 G-1.	NO. Same fixture file as 2.2a (does not exist).
2.3 Unverified demographic claim	#547	`conversation_v6.yaml` DEMOGRAPHIC GROUNDING section (REVISED rev 3 — moved from stage-scope to BASE because demographic fabrications can fire in any stage)	NOT in v4 baseline; NOT in v4.1. Closest production guard is the existing voice rules.	NO. Verbatim phrase: `"The report I'm reading lists the patient as X — is this for someone other than yourself?"` — must land in fixture.
2.4 Records-upload re-offer	B1-v4 finding (no GitHub issue; surfaced by manual v4 conversation audit)	`stages.yaml > discovery.guidance` AND `procedure_identification.guidance` + `re_offer_on_turn: [2, 3]` field	Partial in v4 (records-first emphasis in `phase_contexts/v2/records_first.yaml`) but the turn 2-3 cadence guarantee is NOT enforced today. Lingering-discovery cases can miss the re-offer entirely.	NO. Fixture should be a 5+ turn `discovery` stagnation case asserting re-offer language on turns 2 AND 3.
2.5 Emotional verbatim echo	B1 axis-3 finding (no GH issue)	`conversation_v6.yaml` VOICE RULES section	Partial in `conversation_v4.yaml:40` "NAME THE SPECIFIC HARD THING" (and v4.1 same line). v5 Rule 2.5 strengthens this with an explicit emotional-word list. The 7-word list (`exhausted, scared, desperate, overwhelmed, frustrated, worried, tired`) is not in production.	NO. Verbatim word list per `v6-rule-location-map.md §3.4`.
2.6 Multi-question axis discipline	#491, #550	`conversation_v6.yaml` ONE QUESTION PER TURN + `stages.yaml` per-stage `do_not: [stack-questions]` redundant placement	Partial: `conversation_v4.yaml:38` has "ONE QUESTION ONLY when the patient is emotional" + `intent_capture.yaml` has pacing rules. The v5-spec "SAME-TURN AXIS DISCIPLINE" enumeration (Laterality / Mechanism / Severity / Timeline / Prior treatment / Demographics / Records availability) is NOT in production today.	NO. Verbatim axis list + WRONG/RIGHT example pair per `v6-rule-location-map.md §2.27` v5.RULE.006.

Inheritance starting point for Phase 2a:

Rules 2.1, 2.3, 2.5, 2.6 — start from conversation_v4.yaml (NOT v4.1, because v4.1 differs from v4 only by the MSO addendum being baked in; the bulk of voice / safety / process content is identical).
Rules 2.2a, 2.2b — start from conversation_v4.yaml:111-112 (the #837 hot-fix paragraphs) and verbatim-port to conversation_v6.yaml HARD BANS. The exact byte-identical text is non-negotiable (see §6 G-1).
MSO addendum: port from conversation_v4.yaml:204 mso_offer_addendum: (the in-body version baked into v4.1) to config/prompts/knowledge/mso_second_opinion.yaml. Note that the Phase 5 row in §3 records the shipped file as knowledge/mso_patient_offer.yaml (PR #939). Gated by the SAME mso_patient_offer_enabled flag (spec §3.5.1 row 9). Phase 2b includes a regression test asserting the flag value is honored across the v4↔v6 toggle.

Key clarification: v4.1 is MSO-only; it's not a clinical-rules upgrade over v4. The "v4.1 / v1.1 pair" exists solely so tenants with mso_patient_offer_enabled=true get the MSO addendum without flag-conditional prompt assembly. Treating v4.1 as the inheritance point for clinical rules would be wrong — the clinical rules base is conversation_v4.yaml + #837 hot-fix paragraphs.

5. What's pending

Phase-by-phase per conversation-v6-feature.md §6 + reality on 2026-05-15:

Phase 2a - Migration: base prompt rules

Scope: Port v5 rules 2.1, 2.2 (verbatim from #837), 2.3, 2.5, 2.6 into conversation_v6.yaml base sections (replacing every TODO(phase-2a) marker).
Estimate: 1-2 days (Opus for content judgment, per conversation-v6-feature.md §6 row 2a).
Who: Opus author + Dr. Naidu reviewer.
Blockers:
tests/test_hotfix_837_absorption.py must land FIRST (see §6 G-3) — otherwise wording can drift during port without CI catching it.
tests/test_v5_rule_verbatim_preservation.py must land alongside the port (verbatim phrase fixtures per v6-rule-location-map.md §3).
tests/test_lockstep_consistency.py must land alongside the port (reads v6-rule-location-map.md, asserts every rule reaches its declared destination).
Dr. Naidu base-prompt-rules review gate (mandatory per spec §6 footnote — "All 4 windows MUST be locked on his calendar before Phase 0 starts" — confirm with SD whether this is locked).
LLM grader CI auth (G-2) — Phase 0 grader can't fail-close on prompt content if ANTHROPIC_API_KEY isn't wired.
Pre-flight check (per Phase 1 spec §3.5.1 row 8): #535 (Flagsmith identity bug) is CLOSED per gh issue view 535. Phase 1 unblock condition satisfied.

Phase 2b - Migration: stages.yaml content

Scope: Port phase + layer content into the 12 stages.yaml entries (replacing every TODO(phase-2b) marker — guidance, cards_to_use, advance_when, do_not, extractors_active). Lockstep — any voice-rule update to v6 also lands in v4.
Estimate: 2-3 days (Opus per spec §6 row 2b).
Who: Opus author + Dr. Naidu reviewer.
Blockers:
v6-stages-extractors-matrix.md must publish FIRST (per spec §6 row 2b — "blocking"). Status today: the matrix doc EXISTS at docs/specs/v6-stages-extractors-matrix.md (288 lines, draft 2026-05-12). Confirm Naidu has signed off on the matrix before Phase 2b starts, OR confirm it doesn't require his sign-off and only the stages.yaml content port does.
tests/test_extractor_prompts_pii_safe.py (CI gate per spec §3.4) must land. Its scope is ambiguous: it may already be covered by Phase 0 gates, or it may be a Phase 2b deliverable.
Dr. Naidu stages.yaml content gate (mandatory per spec §6 row 2b).
Phase 2a must merge first (Phase 2b depends on the base-prompt rule landing site).

Phase 3 - Admin UI extensions

Scope: prompt_arch selector in /admin/triage, stage debug endpoint at /api/v1/admin/cases/{case_id}/stage (with Depends(require_case_access)), knowledge addendum toggles.
Estimate: 1 day, Sonnet.
Blockers: Phase 2a + 2b merged (selector pointing at empty stages is useless).

Phase 4 - Frontend deep-link cards

Scope: RichCard.tsx extensions for view_payments, view_summary, view_consultations, stage_indicator; placeholder pages Payments.tsx, Summary.tsx, Consultations.tsx.
Estimate: 1 day, Sonnet.
Blockers: Phase 2b (stages declare cards_to_use).

Phase 5 - Extractor prompt updates

Scope: Replace "layer N" semantics in 5 extractor system prompts with stage-equivalent semantics (semantic-equivalent rewrite, not content change). 5 extractors: intent, medical, travel, logistics, financial. recovery_checkin_extractor (PR #832 / recovery_checkin_extractor.py) is downstream.
Estimate: 1-2 days, Opus for content (per spec §6 row 5).
Blockers: Phase 2b (stages.yaml extractors_active lists must be populated per the matrix); tests/test_extractor_layer_to_stage_rename.py (NEW per spec Appendix B) must accompany.

Phase 6 - Dual-shadow ramp 10%

Scope: Flip prompt_arch=v6 for 10% of tenants via prompt_arch_v6_tenant_allowlist. Side-by-side Langfuse trace comparison vs v4.
Estimate: 1 week observation calendar.
Acceptance criterion (new rev 5): Per-segment cache hit rate measured in Langfuse — Seg 2 ≥ 60%, Seg 3 ≥ 50% sustained 24h. Block ramp if either fails.
Blockers: Phases 2a-5 complete; cost dashboards green; Phase 6 acceptance criterion (cache hit rate) defined.
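The cache-hit acceptance criterion can be expressed as a small gate. The sketch below assumes "sustained 24h" means each of the last 24 hourly samples meets its threshold; the source does not define the sampling interval, and `ramp_allowed` plus the `seg2` / `seg3` keys are hypothetical names. Reading the rates out of Langfuse is out of scope here.

```python
from typing import Dict, Sequence

# Thresholds from the rev 5 acceptance criterion (Seg 2 >= 60%, Seg 3 >= 50%).
THRESHOLDS = {"seg2": 0.60, "seg3": 0.50}


def ramp_allowed(
    hourly_hit_rates: Dict[str, Sequence[float]], hours: int = 24
) -> bool:
    """Return True only if every segment held its threshold for `hours` samples.

    Assumption: "sustained 24h" means each of the last 24 hourly samples meets
    the threshold. Missing or short series block the ramp.
    """
    for segment, minimum in THRESHOLDS.items():
        samples = hourly_hit_rates.get(segment, ())
        if len(samples) < hours:
            return False
        if any(rate < minimum for rate in samples[-hours:]):
            return False
    return True


if __name__ == "__main__":
    healthy = {"seg2": [0.70] * 24, "seg3": [0.55] * 24}
    one_dip = {"seg2": [0.70] * 23 + [0.59], "seg3": [0.55] * 24}
    print(ramp_allowed(healthy), ramp_allowed(one_dip))
```

Treating missing data as a failure matches the "block ramp if either fails" wording: absence of evidence does not open the gate.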

Phase 7 - Manual validation cycle

Scope: 3 baselines + 3 after on 3 personas (caregiver/oncology, direct/ortho, exploratory). 9-axis scoring per turn. SD + Dr. Naidu sign off per persona.
Estimate: 1 day live testing.
Blockers: Phase 6 observation complete; Naidu calendar (4th of 4 mandatory windows per spec §6).

Phase 8 - Ramp to 50% then 100%

Scope: Stagger; 24h hold between bumps.
Estimate: 3 days.
Blockers: Phase 7 sign-off; no regressions in Langfuse + Metabase dashboards.

Phase 9 - 2-week observation

Scope: Real-traffic per-case audit on a sample per persona.
Estimate: 2 weeks calendar.
Blockers: Zero clinical-safety violations during ramp.

Phase 10 - Decommission

Scope: Delete phase_contexts/, layer_contexts/, base prompts v1-v4, _LAYER_TO_PHASE mapping, deprecated loader functions, deprecated tests.
Estimate: 1-2 days (CORRECTED rev 5 from 0.5d; requires a shadow-import audit on 8+ sites, including tests/test_intake_fix5.py, tests/test_conversation_prompt.py, tests/test_prompt_loader.py, tests/test_no_medical_advice.py:PATIENT_FACING_FILES, app/agents/conversation_prompt.py:_get_phase_contexts() callsites, app/services/prompt_loader.py:PHASE_DIR/LAYER_DIR constants).
Blockers: Phase 9 complete; all v4 paths confirmed unused via Langfuse; re-export shims (§1.3) deleted.

Aggregate calendar (per spec §6): ~6-7 weeks from Phase 0 start to Phase 9 complete. Phase 0 + Phase 1 are done (~10 calendar days elapsed). Net remaining: ~4-5 weeks if Naidu calendar locks cleanly.

6. Gaps + risks

Items scattered across issues / specs / memory that aren't formally tracked in the phase plan. Each has an ID for cross-reference.

G-1 - `conversation_v4.1.yaml` is MISSING the #837 hot-fix paragraphs (CRITICAL, folded into Phase 2a)

Evidence: Grep on config/prompts/base/conversation_v4.1.yaml for "we don't handle that", "outside our scope", "right next step", "treatment recommendation" returns ZERO matches. The same grep on conversation_v4.yaml returns lines 111-112.
Impact: Tenants on mso_patient_offer_enabled=true (pair (v4.1, v1.1)) have the un-patched SAFETY block today. The two P0s (#642 treatment recommendation, #743 scope rejection) that #837 closed for v4-tenants are STILL OPEN for v4.1-tenants.
Resolution (per SD 2026-05-16): Folded into Phase 2a kickoff rather than treated as a separate pre-Phase-2a backport PR. Rationale: Phase 2a's first PR already lands the verbatim absorption fixture (test_hotfix_837_absorption.py; see G-3) and ports the #837 paragraphs into conversation_v6.yaml HARD BANS. Bundling the v4.1 backport into the same PR means a single Dr. Naidu review touchpoint covers both the v4.1 patch AND the v6 absorption byte-for-byte. This accepts ~1 week of un-patched v4.1 traffic in exchange for not splitting Naidu's attention across two reviews of the same two paragraphs.
Phase 2a scope addendum: The first Phase 2a PR must:
Add tests/test_hotfix_837_absorption.py with byte-identical assertions against BOTH conversation_v4.1.yaml AND conversation_v6.yaml HARD BANS.
Patch conversation_v4.1.yaml to include the #837 treatment-recommendation + scope-rejection paragraphs verbatim from conversation_v4.yaml:111-112.
Port the same paragraphs into conversation_v6.yaml HARD BANS.
Single Dr. Naidu confirm covers all three (same paragraphs he already approved for v4).
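The absorption fixture's core check can be sketched as below. It is an illustration under stated assumptions, not the contents of the real test file: `hotfix_lines` and `missing_paragraphs` are hypothetical helpers, and the comparison ignores leading and trailing whitespace on the assumption that YAML nesting depth may differ between files while the sentence text may not.

```python
from typing import List


def hotfix_lines(source_text: str, first: int = 111, last: int = 112) -> List[str]:
    """Return the 1-indexed line range [first, last], stripped of indentation.

    Raises ValueError if the source is too short, so a truncated file cannot
    make the check pass vacuously.
    """
    lines = source_text.splitlines()
    if len(lines) < last:
        raise ValueError(f"source has {len(lines)} lines, expected at least {last}")
    return [line.strip() for line in lines[first - 1 : last]]


def missing_paragraphs(source_text: str, target_text: str) -> List[str]:
    """List hot-fix paragraphs from the source that the target lacks verbatim."""
    target_lines = {line.strip() for line in target_text.splitlines()}
    return [p for p in hotfix_lines(source_text) if p not in target_lines]


if __name__ == "__main__":
    # Synthetic stand-in for conversation_v4.yaml: 110 filler lines + 2 bans.
    source = "\n".join([f"filler {i}" for i in range(1, 111)] + ["  ban A", "  ban B"])
    print(missing_paragraphs(source, "intro\n    ban A\n"))
```

In a real fixture the two texts would be read from `conversation_v4.yaml` and from each target (`conversation_v4.1.yaml`, `conversation_v6.yaml`), with the test asserting that `missing_paragraphs` returns an empty list for both.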

G-2 - Phase 0 LLM-grader CI fails on PRs touching prompt content (HIGH)

Evidence: gh run list --workflow="v6 prompt compliance" shows 3 failure runs on feat/v6-phase1-scaffolding (2026-05-15 12:36, 12:40, 12:56). Deterministic gates pass; LLM grader job fails because ANTHROPIC_API_KEY is not set in GitHub Actions secrets for that workflow.
Impact: Phase 2a + Phase 2b PRs (which actually change prompt content) cannot pass the 9-axis CI grader — the grader can't run. SD will be tempted to admin-merge prompt changes.
Mitigation: Plumb ANTHROPIC_API_KEY into .github/workflows/v6-prompt-compliance.yml (single-line workflow secret add). One-shot fix; SD task.

G-3 - `tests/test_hotfix_837_absorption.py` does NOT exist (HIGH)

Evidence: ls tests/ | grep -iE "hotfix|absorb" returns nothing.
Spec reference: conversation-v6-feature.md §5 explicitly calls this fixture out as "NEW rev 5 per compliance review".
Impact: Without this fixture, Phase 2a port of the two #837 paragraphs into conversation_v6.yaml HARD BANS can drift in wording, weakening rule 2.2. The spec is explicit: "byte-identical".
Recommendation: Land this fixture as the FIRST work of Phase 2a (before any content port).

G-4 - `tests/test_v5_rule_verbatim_preservation.py` does NOT exist (HIGH)

Evidence: ls tests/ | grep -iE "verbatim|v5_rule" returns nothing.
Spec reference: conversation-v6-feature.md §4 + v6-rule-location-map.md §3 (40+ verbatim phrases enumerated).
Impact: Same drift risk as G-3 but for the broader v5 rule set (4-part doc-trust framing, demographic clarification, emotional word list, axis discipline list).
Recommendation: Land alongside test_hotfix_837_absorption.py as Phase 2a pre-work.

G-5 - `tests/test_lockstep_consistency.py` does NOT exist (MED)

Evidence: Not in tests/ directory.
Spec reference: conversation-v6-feature.md §8.5 + v6-rule-location-map.md §0.
Impact: The lockstep CI gate that reads v6-rule-location-map.md and asserts rules land at their declared destinations is missing. Without it, a rule dropped during Phase 2a/2b goes undetected and becomes a silent regression.
Recommendation: Land before Phase 2a starts (so the absorbing PR is the FIRST to be gated).

G-6 - `stages.yaml > extractors_active` is empty in scaffolding (MED)

Evidence: All 12 stages in config/prompts/stages.yaml have extractors_active: [] # TODO(phase-2b).
Spec reference: spec §3.4 + v6-stages-extractors-matrix.md §3 (30 run cells, 2 cond cells).
Impact: Phase 5 extractor work (compose_v6() reads extractors_active to know which extractors to spawn) cannot land before Phase 2b populates the lists.
Recommendation: This is the documented Phase 2b deliverable. No action — flagged here for visibility.

G-7 - `patient_context_builder` dataclass-producing services don't exist (MED)

Evidence: app/agents/patient_context_builder.py exists (155 lines, from PR #925) with the assembly interface, but it expects dataclasses CaseSummary, FhirObservationSummary, DocumentManifest, WorkflowSnapshot, PatientPreferences from owning-domain services. None of these dataclass-producing service functions exist yet on main.
Spec reference: conversation-v6-feature.md §2.4 revision rev 3 — "dataclass-producing services MUST use BaseRepository._scoped_query(tenant_id)".
Impact: compose_v6() cannot move past NotImplementedError without these services. This is a Phase 2a-2b dependency that's not currently broken out as its own work item.
Recommendation: Scope into a Phase 2a sub-task. Estimate: 1-2 days. Likely Sonnet (mechanical — wrap existing repository reads in dataclass-returning service functions).

G-8 - Naidu clinical sweep on #837 + #832: task #169 closed but mid-stream sweep not formally re-scheduled (MED)

Evidence: gh issue view 169 is closed (Phase 0 multi-tenancy work). No open issue tracks "Dr. Naidu mid-stream review of merged recovery prompts + #837 wording" specifically. Spec §6 calls out 4 separate Naidu gates but the calendar lock status is not in the doc.
Impact: Spec §6 is explicit: "If Dr. Naidu is unavailable >2 weeks for ANY of the 4 gates, the phase pauses." All 4 windows MUST be locked before Phase 0 starts. Phase 0 already shipped — confirm whether the windows are locked for 2a / 2b / 7.
Recommendation: SD confirms Naidu calendar status in writing before Phase 2a kickoff.

G-9 - Mid-conversation rollback test (spec §8.2.1): does it exist? (LOW)

Evidence: Spec §8.2.1 describes the expected behavior (mixed prompt_arch stamps within one conversation) but does not list a test file. No tests/test_mid_conversation_rollback_*.py in the tree.
Impact: Phase 6 dual-shadow ramp could trigger a mid-conversation arch flip and produce inconsistent traces. Without a fixture, the audit cannot prove the spec §8.2.1 behavior holds.
Recommendation: Add to Phase 5 / 6 work list. Estimate: 0.5 day.

G-10 - Identity-aware Flagsmith pass-through (#535) (RESOLVED)

Evidence: gh issue view 535 is CLOSED.
Status: Phase 1 prereq satisfied (per spec §3.5.1 row 8 — "#535 MUST be resolved before v6 Phase 1 starts"). No action.

G-11 - `addendum_priority_clinical_first.py` test (LOW)

Spec reference: §9 risk row + Appendix B testlist.
Status: Not in tree. Knowledge addendums in scaffolding (PR #925) lack priority: and category: fields. Phase 2b deliverable.

G-12 - 18 cross-spec inconsistencies tracked in `v6-trio-consistency-findings.md` (LOW-MED)

Evidence: docs/specs/v6-trio-consistency-findings.md enumerates 18 findings (F-01 through F-18), 5 MAJOR + 13 MINOR.
Major ones:
F-01: stage count mismatch in #859 OQ.02
F-02: intake referenced as stage (not in §1 list)
F-03: CI gate algorithm doesn't cross-read sibling specs
F-04: cross-spec links missing in #855 and #859
F-05: 17 raw open questions across 3 docs → Naidu burn risk
Recommendation: Squash MAJOR findings before Phase 2a starts. MINOR can defer.

G-13 - Dr. Naidu review gates not formally scheduled (HIGH)

Evidence: Spec §6 footnote (revised) lists 4 mandatory Naidu sign-offs (Phase −1 #837 mini, Phase 2a base rules, Phase 2b stages content, Phase 7 validation). No tracking issue or calendar artifact in the repo.
Recommendation: Create one tracking issue per Naidu gate; link from spec §6.

G-14 - `apps/patient-app/src/components/chat/rich_content_types.generated.json` manifest does NOT exist (LOW)

Spec reference: §3.9 — needed for the FE/BE drift CI gate.
Status: Scoped as "1 day work" in Phase 1, but not in any merged PR.
Recommendation: Land in Phase 4 (frontend phase) alongside the new RichCard.tsx entries.

G-15 - `stage_resolver.py` violates the companion-doc no-logging purity contract (LOW)

Evidence: app/services/stage_resolver.py:146,158,168 emit logger.warning / logger.debug calls. docs/specs/v6-stage-resolver-truth-table.md §1:23 states: "Pure function. No I/O, no LLM, no DB writes, no logging."
Impact: Practically harmless today: logger.warning is side-effecting but doesn't change the return value. However, it breaks property-test stability and the spec contract; a future implementer relying on the pure-function claim could be surprised.
Recommendation: Either tighten the spec to "no observable side effects on returned value" OR remove the loggers and surface malformed-state signals via the return value. Decide in Phase 2a kickoff.

G-16 - `WorkflowState` key-name remapping is silent (LOW)

Evidence: app/services/stage_resolver.py:72-79 silently maps spec field names (documents_uploaded, match_results_shown, provider_selected) → live model names (required_documents_uploaded, matching_complete, providers_selected). Test fixtures use the live names so the gap is invisible.
Impact: A future implementer following spec §2.6 literally will pass spec-named keys to WorkflowState({...}) and see all values silently default to False — every stage rule will fail to match → fallback to support on every turn.
Recommendation: Document the mapping in a top-of-class docstring on stage_resolver.py OR accept both names via a small adapter layer. Address before Phase 2a expands the truth-table surface.
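The "accept both names" option could look like the adapter below. The three key pairs come from the evidence above; everything else is a hypothetical sketch, including the function name `normalize_workflow_keys` and the choice to raise on conflicting values rather than pick one.

```python
from typing import Any, Dict, Mapping

# Spec field name -> live model field name (per stage_resolver.py:72-79).
SPEC_TO_LIVE = {
    "documents_uploaded": "required_documents_uploaded",
    "match_results_shown": "matching_complete",
    "provider_selected": "providers_selected",
}


def normalize_workflow_keys(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` keyed by live model names.

    Raises ValueError when a spec name and its live alias are both present
    with different values, since silently picking one would hide a bug.
    """
    out: Dict[str, Any] = {}
    for key, value in raw.items():
        live_key = SPEC_TO_LIVE.get(key, key)
        if live_key in out and out[live_key] != value:
            raise ValueError(f"conflicting values for {live_key!r}")
        out[live_key] = value
    return out


if __name__ == "__main__":
    print(normalize_workflow_keys({"documents_uploaded": True}))
    print(normalize_workflow_keys({"matching_complete": False}))
```

Running the adapter before constructing `WorkflowState` would make spec-named input behave the same as live-named input instead of defaulting every flag to False.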

G-17 - `compose_v6` returning `None` produces no `v6_fallback_reason` (MED)

Evidence: app/agents/v6_dispatcher.py:188-193 constructs DispatchResult(arch="v6", v6_artifact=artifact) without inspecting artifact. conversation_prompt.py:176 guards with if dispatch.arch == "v6" and dispatch.v6_artifact is not None — so None silently falls to v4 path with NO v6_fallback_reason trace tag. The Layer 4 monitor cannot alert on this incoherent state.
Impact: Phase 2a wiring may briefly produce malformed compose_v6 returns during incremental rollouts. Without a fallback reason tag, the silent v4 fallback is invisible in Langfuse.
Recommendation: In Phase 2a, validate compose_v6's return shape (dict with system: str, stage_id: str, cache_segments: list) and emit a new v6_fallback_reason="compose_returned_invalid" trace tag when the shape is wrong. Add compose_returned_invalid to ALERTABLE_FALLBACK_REASONS in the same PR.
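A minimal sketch of the recommended shape guard follows. The three required fields and the `compose_returned_invalid` tag come from the recommendation above; the helper name `fallback_reason_for` is hypothetical, and emitting the trace tag or updating `ALERTABLE_FALLBACK_REASONS` is left to the caller.

```python
from typing import Any, Optional

# Field name -> required type, per the recommended return shape.
REQUIRED_FIELDS = {"system": str, "stage_id": str, "cache_segments": list}


def fallback_reason_for(artifact: Any) -> Optional[str]:
    """Return None for a well-formed artifact, else the fallback reason tag."""
    if not isinstance(artifact, dict):
        return "compose_returned_invalid"
    for name, expected_type in REQUIRED_FIELDS.items():
        # A missing key yields None, which fails the isinstance check.
        if not isinstance(artifact.get(name), expected_type):
            return "compose_returned_invalid"
    return None


if __name__ == "__main__":
    good = {"system": "prompt text", "stage_id": "discovery", "cache_segments": []}
    print(fallback_reason_for(good))
    print(fallback_reason_for(None))
```

With this in place, a `None` return from `compose_v6` would produce an explicit reason tag rather than an untagged v4 fallback.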

G-18 - Stage-resolver rule fall-through edge case has no test (LOW)

Evidence: When intent_completion == 1.0 AND documents_uploaded == True AND medical_status.completion < 0.7, rules 3 and 4 both fail (rule 3 requires intent < 1.0, rule 4 requires not documents_uploaded). No subsequent rule matches → fallback to support. tests/test_stage_resolver.py does not exercise this combination.
Impact: The behavior here may be deliberate (records have been uploaded but the medical_status extractor hasn't caught up yet, so support is the correct conservative answer), but without a test it's not pinned. A future refactor could silently change the behavior.
Recommendation: Add a single test asserting this combination → "support". 10 minutes of work; do during Phase 2a kickoff.
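
The fall-through can be reproduced with a stand-in that encodes only the two guard conditions quoted in the Evidence. This is a sketch, not the resolver in app/services/stage_resolver.py: `first_matching_rule`, the rule labels, and the dict-shaped state are hypothetical, the remaining conditions of rules 3 and 4 are omitted, and every other rule is assumed not to match this state (as the Evidence states).

```python
FALLBACK_STAGE = "support"

# Only the guards quoted in the Evidence; the real rules carry more conditions.
GUARDS = [
    ("rule_3", lambda state: state["intent_completion"] < 1.0),
    ("rule_4", lambda state: not state["documents_uploaded"]),
]


def first_matching_rule(state):
    """Return the label of the first guard that passes, else the fallback stage."""
    for label, guard in GUARDS:
        if guard(state):
            return label
    return FALLBACK_STAGE


# The untested combination: intent complete, documents uploaded,
# medical_status completion still below 0.7.
edge_case = {
    "intent_completion": 1.0,
    "documents_uploaded": True,
    "medical_status_completion": 0.6,
}
```

The real test would call the production resolver with the equivalent state and assert the same "support" result.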

G-19 — FE TransportOfferCard follow-up items (MED)

Identified by post-merge code + test reviews 2026-05-16. Bundle into a single follow-up PR (fix/transport-offer-card-wiring) when transport endpoints near rollout.

patientAction is a no-op in RichCard.tsx:260-263. The card calls patientAction(bookingIdDraft, 'select') and patientAction('', 'decline_all') but the handler in RichCard is a stub. Same Phase D deferred state as RecoveryOfferCard; not a regression. Wire to ConversationApp → MessageThread → chat send-message flow before transport endpoints go live.
declineAll API silently swallows ALL errors at apps/patient-app/src/services/transportApi.ts:88-90. Narrow to 404 only; re-throw others so the component's error banner fires correctly.
/design-preview/transport is publicly reachable (apps/patient-app/src/App.tsx:149). Wrap in ProtectedRoute or import.meta.env.DEV guard for consistency with other design-preview routes.
RichCard.tsx transport_offer branch has zero integration tests. Add a test rendering <RichCard> with contentType='transport_offer' + a minimal fixture; assert TransportOfferCard mounts.
transportApi.ts has no dedicated test file. Unit-test toTransportOption (snake→camel) + declineAll 404-no-op + 500-rethrow behavior.
declineAll rejection path test missing. Mirror the existing selectOption error test.
Backend: cross-module private import. app/agents/v6_dispatcher.py:82-85 imports _resolve_prompt_arch and _resolve_v6_tenant_allowlist (both _ prefixed) from prompt_loader.py. Promote to public symbols OR relocate to a shared v6_config.py before Phase 2a expands the dispatch surface.
Vendor name PostHog property fixed inline in the curaway-health-navigator follow-up PR (vendor_id → vendor_name) — no Phase 2a tracking needed.

7. Inheritance map (CRITICAL — v4.1 / v4 + #837 as starting point)

For each v6 absorption section, this table lists the exact source text that must be preserved verbatim. This is the input contract for Phase 2a.

v6 destination section	Source (file + line range)	Verbatim requirement
`conversation_v6.yaml` HARD BANS — rule 2.2a (treatment recommendation ban)	`config/prompts/base/conversation_v4.yaml:111`	YES, byte-identical. Asserted by `tests/test_hotfix_837_absorption.py` (must be created). Source phrases: `"NEVER recommend a specific procedure, surgery, or course of treatment"`, `"the right next step is"`, `"why [procedure] makes sense for your case"`, `"That's a clinical decision your doctor or specialist makes"`, `"Surfacing what a document contains is allowed; choosing the procedure for the patient is not."`
`conversation_v6.yaml` HARD BANS — rule 2.2b (scope rejection ban)	`config/prompts/base/conversation_v4.yaml:112`	YES, byte-identical. Source phrases: `"NEVER reject a patient based on procedure type, condition, or specialty"`, `"Curaway coordinates care across all specialties"`, `"we don't handle that"`, `"this is outside our scope"`, `"Curaway isn't set up for"`, `"Let me flag this with our care team so we can connect you with the right specialist."`
`conversation_v6.yaml` DOCUMENT-TRUST FRAMING — rule 2.1	`docs/specs/conversation-v5-feature.md:63-98` (rule definition; never landed in any base prompt file)	Partial verbatim. Verbatim NEVER phrases: `"different from what your doctor told you"`, `"this is not [diagnosis]"`, `"the diagnosis is wrong"`, `"I'm seeing findings that contradict"`. Verbatim ALWAYS phrases: `"I want to make sure these have been factored in"`, `"could you check with the oncologist whether"`, `"Surfacing factual findings IS allowed"`. The 4-part framing pattern's structure can be modernized; the phrase list cannot.
`conversation_v6.yaml` DEMOGRAPHIC GROUNDING — rule 2.3	`docs/specs/conversation-v5-feature.md:131-155`	YES for the identity clarification phrase: `"The report I'm reading lists the patient as X — is this for someone other than yourself?"`. Surrounding guidance can be modernized.
`conversation_v6.yaml` VOICE RULES — rule 2.5	`config/prompts/base/conversation_v4.yaml:40` (existing "NAME THE SPECIFIC HARD THING") + `docs/specs/conversation-v5-feature.md:214-237` (v5 Rule 2.7 strengthening)	YES for the 7-word list: `exhausted`, `scared`, `desperate`, `overwhelmed`, `frustrated`, `worried`, `tired`. Must appear as a literal list inside the section anchored by `# V5-RULE-2.7-EMOTIONAL-VERBATIM`.
`conversation_v6.yaml` ONE QUESTION PER TURN — rule 2.6	`config/prompts/base/conversation_v4.yaml:38` (existing "ONE QUESTION ONLY") + `docs/specs/conversation-v5-feature.md:184-212` (v5 Rule 2.6 SAME-TURN AXIS)	YES for the 7-axis list: `Laterality`, `Mechanism`, `Severity`, `Timeline`, `Prior treatment`, `Demographics`, `Records availability`. WRONG/RIGHT example pair verbatim. Each stage in `stages.yaml` declares `do_not: [stack-questions]` (redundant placement appropriate per spec §4).
`conversation_v6.yaml` JSON RESPONSE FORMAT	`config/prompts/base/conversation_v4.yaml:186` envelope OR `config/prompts/base/conversation_v4.1.yaml:187` envelope (verified identical between v4 and v4.1 per spec §1.4)	YES, byte-identical. The `{"message": "...", "extracted_data": {...}, "detected_comorbidities": [...], "phase_complete": false, "suggested_next": null, "missing_critical_info": []}` envelope must appear unchanged. Asserted by `tests/test_v5_rule_verbatim_preservation.py` per spec §1.4.
`conversation_v6.yaml` REMEMBER	`config/prompts/base/conversation_v4.yaml:191-195` (4 numbered rules: ACKNOWLEDGE BEFORE ASKING, NEVER DIAGNOSE, HONOR YOUR PROMISES, NEVER PROJECT EMOTIONS)	Verbatim. These are the "4 most important rules" — the explicit final reminder block.
`conversation_v6.yaml` ROLE / COLLECT BEFORE MATCHING / VOICE / NAME / FORMAT / FACTS sections	`config/prompts/base/conversation_v4.yaml` (NOT v4.1 — they're identical for these sections, but v4 is canonical) per the line-level map in `docs/specs/v6-rule-location-map.md §2.1-2.18`	Mixed verbatim / semantic. The `verbatim:` column in `v6-rule-location-map.md` is the per-rule authority.
`stages.yaml > discovery.guidance` + `stages.yaml > procedure_identification.guidance` — rule 2.4 records re-offer	`docs/specs/conversation-v5-feature.md:156-175` + `config/prompts/phase_contexts/v2/records_first.yaml` + `identify_procedure.yaml`	Semantic only. `re_offer_on_turn: [2, 3]` field per spec §4. Cadence-enforced — fixture must show 5-turn `discovery` stagnation triggers re-offer on turns 2 AND 3.
`stages.yaml > <stage>.do_not`	`config/prompts/phase_contexts/v2/*.yaml` per-phase `DO NOT` lists + `recovery_offer.yaml` + `recovery_checkin.yaml` patronizing-filler ban list	Verbatim for ban lists. Patronizing-filler list: `I hear you`, `I understand`, `I'm here for you`, `completely natural to feel`. Source: `v6-rule-location-map.md §3.5`.
`knowledge/mso_second_opinion.yaml`	`config/prompts/base/conversation_v4.yaml:204` `mso_offer_addendum:` block	Verbatim. Same gating flag (`mso_patient_offer_enabled`) — Phase 2b regression test asserts flag honored across v4↔v6 toggle.

The line-level absorption map for every other rule (ROLE, VOICE, NAME, NEVER, CONT, EMO, PROJ, NONSENSE, ABROAD, FIRST, APPROACH, THINK, FORMAT, SAFETY, FACTS, EXAMPLES, JSON, REMEMBER + 6 phase_contexts/v2/*.yaml) lives in docs/specs/v6-rule-location-map.md §2.1-2.26. That doc is the per-rule authority. This §7 is the summary contract for Phase 2a kickoff.
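
A verbatim-preservation check of the kind the table calls for can be sketched as a plain substring test. This is a minimal sketch under assumptions: `missing_verbatim` is a hypothetical helper, the prompt text is passed in as a string rather than loaded from the YAML files, and the real fixtures (`tests/test_hotfix_837_absorption.py`, `tests/test_v5_rule_verbatim_preservation.py`) may be structured differently. The word list is the 7-word list from the rule 2.5 row.

```python
from typing import List

# The 7-word list from the VOICE RULES row above.
EMOTION_WORDS = [
    "exhausted", "scared", "desperate", "overwhelmed",
    "frustrated", "worried", "tired",
]


def missing_verbatim(prompt_text: str, required: List[str]) -> List[str]:
    """Return the required phrases that do not appear exactly in prompt_text.

    Python's `in` on str compares code points exactly, so a smart quote,
    changed case, or altered whitespace counts as a missing phrase.
    """
    return [phrase for phrase in required if phrase not in prompt_text]
```

Because the match is exact, a straight apostrophe replaced by a curly one during editing is reported as drift. Plain substring matching can also pass on a longer word that happens to contain a short one, so short entries may need a stricter match in the real fixture.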

8. Phase-by-phase next-steps

Concrete ordered list of what happens next, starting with the Phase 1 merge. Each step lists its dependencies, owner, and rough estimate.

#	Step	Dependencies	Who	Estimate
1	Merge Phase 1 stack (#924 + #925 + #926 + #927 + #928 + #929 in dependency order)	LLM-grader CI auth (G-2) fixed OR explicit admin-merge approval	SD + Claude	1 day calendar (CI thrash)
2	Plumb `ANTHROPIC_API_KEY` into v6 CI workflow (close G-2)	None	SD (single secret add)	5 minutes
3	Create absorption fixtures — `tests/test_hotfix_837_absorption.py` + `tests/test_v5_rule_verbatim_preservation.py` + `tests/test_lockstep_consistency.py` (close G-3, G-4, G-5)	`v6-rule-location-map.md` published (DONE)	Sonnet author (tests are deterministic — no LLM)	1-2 days
4	Lock Dr. Naidu calendar windows for 2a, 2b, 7 (close G-13)	None	SD	calendar dependent
5	Squash G-12 MAJOR findings in `v6-trio-consistency-findings.md` (F-01 through F-05)	None	Opus or Sonnet — one PR per finding	1-2 days
6	Phase 2a kickoff — port v5 rules 2.1, 2.2 (#837 verbatim), 2.3, 2.5, 2.6 into `conversation_v6.yaml` AND backport #837 into `conversation_v4.1.yaml` in the same PR (close G-1 + Phase 2a content port in one Naidu touchpoint)	Steps 2-5 complete	Opus author + single Naidu reviewer pass	1-2 days + Naidu calendar
7	Patient context builder dataclass-producing services (close G-7)	None (parallel to step 6)	Sonnet — mechanical wrap of existing repos	1-2 days
8	Phase 2b kickoff — port `stages.yaml > <stage>.{guidance, cards_to_use, advance_when, do_not, extractors_active}` from `phase_contexts/v2/*.yaml` + `v6-stages-extractors-matrix.md`	Phase 2a merged + Naidu signoff	Opus author + Naidu reviewer	2-3 days + Naidu calendar
9	Phase 3 — admin UI extensions	Phase 2a + 2b merged	Sonnet	1 day
10	Phase 4 — frontend deep-link cards + placeholder pages	Phase 2b merged (stages declare `cards_to_use`)	Sonnet	1 day
11	Phase 5 — extractor prompt language sweep (5 extractors, semantic-equivalent)	Phase 2b merged	Opus for content judgment, Sonnet for tests	1-2 days
12	Phase 6 — dual-shadow ramp 10%	Phases 2a-5 merged + cost dashboards green	SD + observation calendar	1 week observation
13	Phase 7 — manual validation (3 baselines + 3 after × 3 personas, 9-axis scoring)	Phase 6 observation complete + Naidu calendar	SD + Naidu	1 day live + Naidu calendar
14	Phase 8 — ramp to 50% then 100%	Phase 7 sign-off	SD	3 days
15	Phase 9 — 2-week observation	Phase 8 ramp complete	SD + Naidu (sampled audits)	2 weeks calendar
16	Phase 10 — decommission v4 paths	Phase 9 complete + Langfuse confirms zero v4 traffic	Sonnet, full shadow-import audit	1-2 days

9. Open questions for SD

Q1 — Should Phase 2a start before or after the LLM grader CI auth issue is fixed?

Options:
- (a) Fix CI auth FIRST. Phase 2a then ships with the LLM-graded gate live → maximum confidence, zero rework.
- (b) Start Phase 2a NOW using deterministic gates only (axes 1, 2, 6, 7, 8 + flag YAML). LLM grader retrofit when auth lands.

Recommendation: (a). The CI fix is 5 minutes; deferring it leaves the spec-mandated 9-axis gate non-functional for the highest-risk PRs.

Q2 — Should `stages_version` be a separate flag from `prompt_arch`?

Context: Spec §3.5.1 row 7 calls for a stages_version flag for minor stages.yaml versioning (e.g., v1.0, v1.1), paired with prompt_arch=v6 via VALID_VERSION_PAIRS enforcement in apps/admin-app/src/pages/Triage.tsx.

Options:
- (a) Add stages_version now (Phase 1 stack extension). Risk: scope creep on a stack that's already 6 PRs deep.
- (b) Defer to Phase 2b when stages content actually evolves. Risk: first stages.yaml content port has no versioning surface, so re-rolling requires a prompt_arch flip.
- (c) Bake the version into stages.yaml (version: "v1.0" field, already present in PR #925 line 6) and skip the flag. Risk: no Flagsmith rollback granularity for stages content.

Recommendation: (b) — defer. The version: field in stages.yaml is enough until content actually moves.

Q3 — Should the G-1 drift (#837 missing from v4.1) be backported NOW or absorbed by Phase 2a? RESOLVED 2026-05-16

Decision (SD, 2026-05-16): Fold into Phase 2a kickoff (option b). The first Phase 2a PR will bundle the conversation_v4.1.yaml backport with the conversation_v6.yaml HARD BANS port and the test_hotfix_837_absorption.py fixture — single Naidu review touchpoint covers both files since the paragraphs are identical to what he already approved for v4. Accepts ~1 week of un-patched v4.1 traffic to consolidate Naidu's attention.

Q4 — Naidu calendar — are all 4 windows locked?

Spec §6 footnote: "All 4 windows MUST be locked on his calendar before Phase 0 starts." Phase 0 has shipped. Confirm whether 2a / 2b / 7 windows are locked, or whether SD intends to operate without them.

Recommendation: Lock them in writing this week or document the deviation.

Q5 — Should the Phase 0 LLM grader run on EVERY Phase 2a/2b PR or only on the merge-to-main commit?

Context: Spec §3.9 implies per-PR. Cost concern: each grader run is ~$0.30 + 30s. If Phase 2a iterates 5 times, that's $1.50 + 2-3 min of CI in total.

Options:
- (a) Every PR push (highest catch rate).
- (b) Only on PR open + on each commit author-tagged @grader (manual trigger via PR comment).
- (c) Only on merge-to-main (lowest cost, slowest feedback).

Recommendation: (a). $1.50 across 5 iterations is irrelevant; clinical-safety regressions are not.

Q6 — Should `support` stage be the default for new cases (per spec §10 Q1)?

Spec note: support is currently spec'd as a fallback safety net; it could also be the entry stage. SD has not resolved this.

Recommendation: Surface to Naidu in the Phase 2a base-prompt-rules review. He should decide; spec defers.

Q7 — Phase 6 cache-hit acceptance criteria — what if Seg 2 < 60% during ramp?

Spec note: §2.5 acceptance criterion blocks ramp if Seg 2 <60% or Seg 3 <50% sustained 24h.

Options on miss:
- (a) Pause ramp, investigate cache invalidation patterns (likely culprit: too-aggressive invalidate_case_cache() calls).
- (b) Ramp anyway with cost mitigation (smaller stage profiles).
- (c) Lower the threshold (acknowledge cache hit rate is fundamentally constrained by Anthropic's invalidation behavior).

Recommendation: Document the SOP for (a) in docs/runbook/prompt-rollback.md (new — per spec Appendix B Docs section). Don't pre-decide between (a/b/c) — depends on what the dashboard shows.
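
For the runbook, the §2.5 blocking criterion could be expressed as a small check like the one below. This is a sketch under assumptions: the source gives only the two floors (Seg 2 at 60%, Seg 3 at 50%) and the 24h window; reading "sustained 24h" as "every one of the last 24 hourly samples is below the floor" is an interpretation, and `ramp_blocked` plus the `seg2` / `seg3` keys are hypothetical names.

```python
from typing import Dict, Sequence

# Floors from the spec note above, as fractions.
THRESHOLDS = {"seg2": 0.60, "seg3": 0.50}
WINDOW_HOURS = 24  # assumption: one cache-hit sample per hour


def ramp_blocked(hourly_hit_rates: Dict[str, Sequence[float]]) -> bool:
    """Return True if any segment stayed below its floor for the whole window.

    Segments with fewer than WINDOW_HOURS samples never block, because a
    sustained miss cannot be established yet.
    """
    for segment, floor in THRESHOLDS.items():
        window = list(hourly_hit_rates.get(segment, ()))[-WINDOW_HOURS:]
        if len(window) == WINDOW_HOURS and all(rate < floor for rate in window):
            return True
    return False
```

A single hour at or above the floor resets the condition under this reading; a rolling-average interpretation would behave differently and should be settled when the SOP is written.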

10. Appendix — file / issue / memory index

Specs (read in this order for new readers)

Doc	Status	Purpose
`docs/specs/conversation-v6-feature.md`	rev 6, final for Phase 0 kickoff	Canonical v6 spec, 962 lines
`docs/specs/conversation-v5-feature.md`	legacy / superseded	Original v5 rule definitions (rules 2.1-2.7) — still the canonical wording source for absorption
`docs/specs/v6-rubric-locked.md`	locked rev 2, 257 lines	9-axis grader rubric (Phase 0 + Phase 7 consumer)
`docs/specs/v6-stage-resolver-truth-table.md`	DRAFT 2026-05-12, 354 lines	Companion to v6 spec §2.6 (NOT YET BLOCKING — Phase 1 is shipping)
`docs/specs/v6-stages-extractors-matrix.md`	DRAFT 2026-05-12, 288 lines	Companion to v6 spec §3.4 — blocks Phase 2b
`docs/specs/v6-rule-location-map.md`	DRAFT, 737 lines	Lockstep registry for §8.5 CI gate — blocks Phase 2a
`docs/specs/v6-trio-consistency-findings.md`	268 lines	18-finding cross-doc audit (5 MAJOR + 13 MINOR)

Tracking issues

Issue	State	Purpose
#836	OPEN	v6 epic — tracks the full Phase 0-10 sequence
#837	MERGED PR	Production hot-fix — two new SAFETY bullets in v4.yaml (treatment recommendation + scope rejection bans)
#832	MERGED PR	Recovery prompts + extractor + orchestrator wiring (ADR-0018 §K) — downstream dependency for Phase 5
#491	OPEN	Multi-question discipline → v5 rule 2.6
#547	CLOSED	Demographic fabrication → v5 rule 2.3
#550	OPEN	Laterality re-ask → v5 rule 2.6
#560	OPEN	Document trust framing → v5 rule 2.1
#642	CLOSED	Treatment recommendation → v5 rule 2.2 + #837 hot-fix
#743	CLOSED	Scope rejection → v5 rule 2.2 + #837 hot-fix
#535	CLOSED	Flagsmith identity bug — Phase 1 prereq (RESOLVED)
#359	CLOSED	Prompt versioning + audit trail

PRs

PR	State	Title
#921	MERGED	Phase 0 Steps 1+2 — locked rubric + LLM-grader scorer
#922	MERGED	Phase 0 Step 3 — fixture corpus
#923	MERGED	Phase 0 Steps 4+5+6 — grader cache + 6 gates + CI
#924	OPEN	Phase 1 Step 1 — `prompt_arch` + tenant allowlist flags
#925	OPEN	Phase 1 scaffolding — stages.yaml + knowledge + stubs
#926	OPEN	Phase 1 Layer 2 — YAML artifact validator
#927	OPEN	Phase 1 Layer 3 — dispatcher + Langfuse tags
#928	OPEN	Phase 1 Layer 4 — fallback observability + cost scaffold
#929	OPEN	Phase 1 Layer 5 — end-to-end smoke tests

Memory files relevant to this plan

File	Purpose
`feedback_agent_chat_sacrosanct.md`	Discipline for every prompt change — 3 baselines + 3 after on 3 personas
`reference_v4_parser_strict_false.md`	`json.loads(strict=False)` requirement — preserved in spec §1.4
`feedback_flagsmith_dual_env.md`	Every flag flip applies to BOTH Production and Development envs
`reference_flagsmith_v2_env_patch.md`	V2 versioning + env-scoped PATCH endpoint
`feedback_check_railway_after_migration_merge.md`	Migration Roundtrip CI is `continue-on-error: true` — confirm prod deploy after merge
`project_execution_order_transport_v6.md`	Transport admin → 3-reviewer subagents → v6 implementation (per SD 2026-05-15)
`project_work_queue.md`	Cross-session items (Clerk webhook, etc.)
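
As a reminder of what the `json.loads(strict=False)` requirement in the table changes: with the default strict=True, Python's json module rejects unescaped control characters (such as a raw newline) inside a string, while strict=False accepts them. The payload below is invented for illustration; why the v4 parser needs the flag is recorded in the memory file and spec §1.4, not here.

```python
import json

# A raw newline sits inside the "message" string; strict JSON forbids
# unescaped control characters within strings.
raw = '{"message": "line one\nline two", "phase_complete": false}'

try:
    json.loads(raw)  # default strict=True
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# strict=False tolerates control characters inside strings.
lenient = json.loads(raw, strict=False)
```

Only control characters inside strings are affected; other malformed JSON still raises under strict=False.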

Code paths most relevant to v6

Path	Role
`config/prompts/base/conversation_v4.yaml`	Production base (with #837 hot-fix at lines 111-112)
`config/prompts/base/conversation_v4.1.yaml`	Production base with MSO addendum baked in (MISSING #837 paragraphs — see G-1)
`config/prompts/base/conversation_v6.yaml`	v6 scaffold (TODO markers)
`config/prompts/stages.yaml`	v6 stages scaffold
`config/prompts/knowledge/*.yaml`	v6 knowledge addendums (4 files scaffolded)
`config/prompts/layer_contexts/intent_capture.yaml`	v1 layer context
`config/prompts/layer_contexts/intent_capture_v1.1.yaml`	v1.1 layer context (paired with v4.1)
`config/prompts/phase_contexts/v2/*.yaml`	Production v2 phase contexts (intake, records_first, identify_procedure, document_review, general, recovery_offer, recovery_checkin)
`config/feature_flags.yaml`	Flag defaults
`app/agents/conversation_prompt.py`	`get_system_prompt` — has v6 dispatcher branch (PR #927)
`app/services/prompt_loader.py`	`_resolve_prompt_version`, `resolve_versions`, `_resolve_prompt_arch`, `_resolve_v6_tenant_allowlist`
`app/agents/v6_dispatcher.py`	v6 arch dispatch decision (PR #927)
`app/services/v6_artifact_validator.py`	Boot-time YAML validator (PR #926)
`app/services/v6_fallback_monitor.py`	Telegram alert + cost scaffold (PR #928)
`app/services/stage_resolver.py`	§2.6 truth-table resolver (PR #925)
`app/agents/patient_context_builder.py`	§2.4 context block builder (PR #925)
`app/services/prompt_loader_v6.py`	`compose_v6` stub (PR #925)

End of unified plan.