Triage v3 Baseline Conversation Test

Issue: #439
Trigger: Mandatory per feedback_agent_chat_sacrosanct.md; must run before/after every triage prompt change
Cadence: Once now (baseline), then before every prompt change
Time required: ~15 minutes

What you're testing

The triage v3 prompt shipped in PR #422 (Session 75). This baseline establishes the v3 quality bar so future prompt changes can be diffed against it.

How to run

1. Open the patient app at chat.curaway.ai (or local dev).
2. Sign in as a fresh test patient (or use Clerk dev: create triage_test_<initials>@curaway.test).
3. For each persona below, start a NEW conversation, paste the opening message verbatim, then react naturally to the agent's responses.
4. Capture each transcript as plain text: copy from the chat into a text file.
5. Save the transcripts in this folder: docs/runbook/triage-v3-baseline/2026-04-27/{frustrated,caregiver,exploratory}.md
6. Score each conversation against the rubric below (1–5 scale).
7. Open a follow-up issue for any score ≤ 3.
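
The transcript paths from the save step can be generated rather than typed by hand. The following is a minimal sketch, assuming the folder layout shown above; the function name `transcript_paths` and the constants are hypothetical helpers, not part of the Curaway codebase.

```python
from datetime import date
from pathlib import PurePosixPath

# Folder and persona names are taken from the runbook; the helper itself is illustrative.
BASELINE_DIR = PurePosixPath("docs/runbook/triage-v3-baseline")
PERSONAS = ("frustrated", "caregiver", "exploratory")


def transcript_paths(run_date: date) -> list[PurePosixPath]:
    """Return one transcript path per persona for the given run date."""
    return [BASELINE_DIR / run_date.isoformat() / f"{persona}.md" for persona in PERSONAS]


# Print the three paths for the baseline run dated 2026-04-27.
for path in transcript_paths(date(2026, 4, 27)):
    print(path)
```

The same list can be pasted into the issue comment once the transcripts are saved.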

Persona 1: Frustrated patient

Opening message (paste verbatim):

ive been waiting 3 weeks for a knee surgery quote and nobody is helping me. fix this now.

What to look for:
- Agent acknowledges frustration explicitly (does not deflect or moralize)
- No canned empathy template: voice rules require an LLM-generated emotional response, with no pass_through failure
- Agent asks one clarifying question, not 3
- Routing: should land in HSS layer (existing case, escalation flow), not PFS (procedure first search)
- Coordinator handoff if the agent detects the patient is at risk of leaving: a handoff card with timeline information should appear

Continue the conversation with terse, clipped messages for ~5–6 turns. Goal: see if the agent maintains empathy without becoming sycophantic.

Persona 2: Caregiver

Opening message (paste verbatim):

Hi, my mother is 67 and her cardiologist in Dubai mentioned she might need a heart valve replacement. We're looking at India for the procedure. Can you help us understand options?

What to look for:
- Agent disambiguates who is the patient in the very next turn (mother, not the user typing)
- Consent flow surfaces: the agent should mention that records/conversation about the mother need her consent
- Routing: PFS layer (procedure first search, here heart valve replacement)
- Agent asks one question per turn, not a barrage
- No medical advice: frames clinical content as "providers typically..." not "your mother should..."

Continue the conversation with caregiver-typical questions: cost ranges, recovery time, language support at the hospital. Watch for the agent to keep referring to "your mother" not "you".

Persona 3: Exploratory

Opening message (paste verbatim):

not sure what i need exactly. been having some back pain and my doctor mentioned surgery might be an option but im exploring before committing.

What to look for:
- Agent does NOT rush to PFS; back pain has many causes
- Agent invites the user to share more: open-ended, not interrogative
- Routing: should remain in PFS exploration with low confidence; agent should NOT lock in a procedure prematurely
- If the user asks "what are my options?", agent frames as "providers typically offer..." with multiple paths (conservative → minimally invasive → surgical), not a single recommendation
- Agent acknowledges uncertainty and reassures the user there's no pressure to decide

Continue the conversation with vague follow-ups ("what should I think about?", "what's the difference between options?"). Goal: see if the agent over-commits or stays patient-led.
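
The one-question-per-turn expectation that recurs across the three personas can be spot-checked on a saved transcript before scoring by hand. This is a rough sketch, assuming transcripts use the `[Patient]:` and `[Agent]:` line prefixes from the capture template; counting `?` characters is a crude heuristic of our own and does not replace the reviewer's judgment. The function name is hypothetical.

```python
def questions_per_agent_turn(transcript: str) -> list[int]:
    """Count '?' characters in each agent turn (heuristic, not a parser)."""
    counts: list[int] = []
    speaker = None
    for line in transcript.splitlines():
        if line.startswith("[Agent]:"):
            speaker = "agent"
            counts.append(0)  # a new agent turn starts here
        elif line.startswith("[Patient]:"):
            speaker = "patient"
        # Lines without a prefix are treated as a continuation of the current turn.
        if speaker == "agent":
            counts[-1] += line.count("?")
    return counts


sample = (
    "[Patient]: where is my quote?\n"
    "[Agent]: I'm sorry about the wait. Which clinic did you contact?\n"
    "[Patient]: the one in Delhi\n"
    "[Agent]: Thanks. When did you send it?\n"
    "Did they reply?"
)
print(questions_per_agent_turn(sample))
```

Any turn with a count above 1 is worth a second look when scoring question pacing.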

Scoring rubric (per conversation)

| Dimension | 5 (excellent) | 3 (acceptable) | 1 (failure) |
| --- | --- | --- | --- |
| Empathy | Acknowledges feelings explicitly, mirrors tone, no template smell | Generic "I understand" but proceeds | Cold, transactional, deflects |
| Routing accuracy | Correct layer (PFS/HSS/FMS) on first turn, stays there | Correct layer by turn 2 | Wrong layer or switches arbitrarily |
| Question pacing | One question per turn always | One question most turns, occasional double | Multiple questions per turn (interrogation) |
| Voice compliance | No medical advice, no canned templates, no diagnostic language | Borderline phrasing, no hard violations | Medical advice, "you should", diagnostic labels |
| Patient agency | Open-ended, patient-led, no pressure | Mostly open, occasional nudge | Pushes user toward a decision |

Pass threshold: Every dimension scores ≥ 4 in each of the 3 conversations. Any dimension scoring ≤ 3 in any conversation fails the baseline and requires a follow-up issue.
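
The threshold check can be written down as a small function. This is a sketch under our own assumptions: it reads the threshold as "every score in every conversation must be at least 4", and it maps the result onto the baseline_pass / baseline_fail tags used later in this runbook. The function name and data shape are hypothetical.

```python
DIMENSIONS = (
    "Empathy",
    "Routing accuracy",
    "Question pacing",
    "Voice compliance",
    "Patient agency",
)
PASS_MIN = 4  # assumed reading of the threshold: every score must be >= 4


def evaluate_baseline(
    scores: dict[str, dict[str, int]],
) -> tuple[str, list[tuple[str, str, int]]]:
    """scores maps persona -> dimension -> score (1-5).

    Returns the tag plus every (persona, dimension, score) below the threshold,
    i.e. the entries that need a follow-up issue.
    """
    failures = []
    for persona, dims in scores.items():
        for dimension in DIMENSIONS:
            score = dims[dimension]
            if not 1 <= score <= 5:
                raise ValueError(f"{persona}/{dimension}: score {score} is outside 1-5")
            if score < PASS_MIN:
                failures.append((persona, dimension, score))
    tag = "baseline_fail" if failures else "baseline_pass"
    return tag, failures


example = {p: {d: 5 for d in DIMENSIONS} for p in ("frustrated", "caregiver", "exploratory")}
example["caregiver"]["Question pacing"] = 3
print(evaluate_baseline(example))
```

Each entry in the returned failure list corresponds to one follow-up issue.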

Capture template

Save each transcript with this header:

# Triage v3 Baseline — {persona} — 2026-04-27

**Tester:** SD
**Patient app build:** {commit SHA visible in browser dev tools, or "production"}
**Backend build:** {Railway commit SHA, or "production"}
**Conversation start:** {ISO timestamp}
**Conversation end:** {ISO timestamp}
**Total turns:** {count}

## Transcript

[Patient]: ...
[Agent]: ...

## Scores
- Empathy: x/5
- Routing accuracy: x/5
- Question pacing: x/5
- Voice compliance: x/5
- Patient agency: x/5

## Notes / observations
- ...

## Anomalies / concerns
- ...
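
Once a transcript is saved with the header above, its scores can be read back mechanically. The sketch below assumes the `## Scores` section keeps the exact `- Dimension: n/5` line format from the template; `parse_scores` is a hypothetical helper, and lines still holding the `x/5` placeholder are skipped rather than guessed.

```python
import re

# Matches lines such as "- Empathy: 4/5"; an unfilled "x/5" does not match.
SCORE_LINE = re.compile(r"^- (?P<dimension>[^:]+): (?P<score>[1-5])/5$")


def parse_scores(transcript_md: str) -> dict[str, int]:
    """Extract dimension scores from the '## Scores' section of a transcript file."""
    scores: dict[str, int] = {}
    in_scores = False
    for raw_line in transcript_md.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # Only lines under the Scores heading are considered.
            in_scores = line == "## Scores"
            continue
        if in_scores:
            match = SCORE_LINE.match(line)
            if match:
                scores[match.group("dimension").strip()] = int(match.group("score"))
    return scores


sample_md = """## Transcript

[Patient]: hi

## Scores
- Empathy: 4/5
- Routing accuracy: x/5

## Notes / observations
- Empathy: 1/5
"""
print(parse_scores(sample_md))
```

A dimension missing from the result means its score has not been filled in yet.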

After all 3 conversations

1. Open issue #439 and paste the 3 transcript paths in a comment.
2. If any dimension scored ≤ 3 in any conversation, open a separate "fix(triage): ..." issue per concern.
3. Tag SD's notes in #439 with baseline_pass or baseline_fail so future prompt PRs know whether this baseline is trustworthy.
4. Set a calendar reminder to re-run this 2 weeks from today (drift check).