ADR-0028: Prompts stay YAML source-of-truth; graph is build-time projection only

Date: 2026-05-19
Status: Accepted
Authors: SD, Claude
Related: ADR-0026 (matching framework — Postgres source, Neo4j projection)

Context

Curaway's conversation_v6 architecture composes runtime prompts from many sources: the base prompt, voice rules, 12 mutually exclusive stages, 5+ knowledge addendums, soon ~9 SOPs (Standard Operating Procedures, per Dr. Naidu's 2026-05-19 ortho demo), and 50+ FAQs per procedure. These sources have dependency relationships. The TKR SOP extends the ortho-family SOP, which extends the base intake rules. Addendums fire only when stage_active is in {records_collection}. SOP layers require must_collect fields from medical_history.
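The following minimal Python sketch shows the metadata each artifact might carry and how an EXTENDS chain resolves. The Segment dataclass, the resolve_chain and required_fields helpers, and every segment ID and field name below are hypothetical illustrations, not Curaway's actual schema. They only mirror the parent:, triggers_when: and must_collect: fields that this ADR names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical in-memory shape of one prompt artifact's YAML metadata."""

    segment_id: str
    kind: str                                    # e.g. "base", "stage", "sop", "knowledge"
    parent: str | None = None                    # EXTENDS edge (YAML `parent:`)
    triggers_when: frozenset[str] = frozenset()  # stages that activate it (YAML `triggers_when:`)
    must_collect: tuple[str, ...] = ()           # medical_history fields (YAML `must_collect:`)


def resolve_chain(segment_id: str, registry: dict[str, Segment]) -> list[str]:
    """Follow parent links from a segment up to its root, most specific first."""
    chain: list[str] = []
    current: str | None = segment_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in parent chain: {chain + [current]}")
        if current not in registry:
            raise KeyError(f"missing parent segment: {current}")
        chain.append(current)
        current = registry[current].parent
    return chain


def required_fields(segment_id: str, registry: dict[str, Segment]) -> set[str]:
    """Union of must_collect fields across the whole EXTENDS chain."""
    fields: set[str] = set()
    for sid in resolve_chain(segment_id, registry):
        fields.update(registry[sid].must_collect)
    return fields


# Hypothetical registry mirroring "TKR SOP extends ortho-family SOP extends base intake rules".
REGISTRY = {
    s.segment_id: s
    for s in [
        Segment("base/intake", "base", must_collect=("age",)),
        Segment("sop/ortho_family", "sop", parent="base/intake",
                must_collect=("affected_joint",)),
        Segment("sop/tkr", "sop", parent="sop/ortho_family",
                must_collect=("prior_knee_surgery",)),
        Segment("knowledge/records_tips", "knowledge",
                triggers_when=frozenset({"records_collection"})),
    ]
}
```

A cycle or a dangling parent raises immediately. That is the same class of error a build-time check can catch before it ever reaches a patient turn.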

A question surfaced during the SOP framework brainstorm: should these prompt artifacts be managed in Neo4j (the same graph DB that holds clinical knowledge) so the dependency structure is queryable at runtime?

Options considered

Option A — Runtime graph (rejected)

On every patient turn, the composer runs a Cypher query against Neo4j to assemble the prompt for that case. Nodes = prompt segments (base, voice, stage, SOP, addendum, FAQ, forbidden phrases). Edges = EXTENDS, OVERRIDES, TRIGGERS_WHEN(case_predicate), REQUIRES, CONFLICTS_WITH.

Pros:
- Dependency structure is explicit and queryable
- A single composition mechanism instead of three loaders (stage_resolver, knowledge_addendum_selector, sop_loader)
- Authoring UX via a graph DB browser

Cons (recurring operational costs):
- Latency: Cypher traversal adds +10-50ms per turn, compared with ~0.01ms for cached YAML reads. The YAML figure applies after process warm-up. The first request after a Railway redeploy pays a one-time ~50-100ms cold-start tax while the functools.lru_cache is empty. At ~4K active cases/month × ~20 turns/case, the graph path adds ~80K queries/month, and that number scales linearly with patient volume.
- Anthropic prompt cache invalidation: the cache key is the exact prefix bytes. Dynamic graph composition therefore risks cache misses in places where YAML composition hits ~90% reliably. Cost math at 10K MAU: ~4K active cases/month × ~20 turns/case × ~5K input tokens × $3/MTok ≈ $1.2K/month of input tokens at the base (uncached) rate. Losing ~30-50% of cache hits adds up to ~$360-600/month. This figure treats cached tokens as free. Cache reads still bill at a fraction of the base rate, so the true figure is somewhat lower. The cost recurs and scales with volume.
- Availability: Neo4j Aura Free auto-pauses after 3 days of inactivity, which is why the keep-alive cron exists. For matching, ADR-0026 makes PG the source of truth, so Neo4j can be rebuilt from PG if it goes down (a nightly bulk-rebuild cron exists). Prompt assembly for chat would have no equivalent rebuild path. A Neo4j hiccup would break every patient turn immediately, with no graceful degradation.
- Test cost: every prompt unit test would need a populated Neo4j fixture or an accurate mock. The CI pipeline would gain a Neo4j service container, and local dev would need Docker + Neo4j running. The net effect is ~10-20% slower CI, plus drift risk.
- Incident response: today, "agent said the wrong thing" → grep the YAML (~30s). With a runtime graph, the same incident means tracing the Cypher traversal in Langfuse (~5min per incident).
- Authoring UX: Naidu writes in Google Docs. We translate either to YAML (reviewable PRs) or to the graph. The graph path means building an authoring UI or syncing YAML → graph. Realistically we would choose the sync, which adds a layer without removing YAML.
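A few lines of arithmetic reproduce the cache-cost figure in Option A's cons. This sketch uses only the volumes and the $3/MTok price quoted in the ADR. The cache_read_multiplier parameter is an added assumption: Anthropic's published pricing at the time of writing bills cache reads at 0.1× the base input rate, and this should be checked against current pricing. The sketch does not model cache-write premiums.

```python
def monthly_input_cost(cases: int, turns_per_case: int, tokens_per_turn: int,
                       usd_per_mtok: float) -> float:
    """Monthly input-token spend at the base (uncached) rate."""
    total_tokens = cases * turns_per_case * tokens_per_turn
    return total_tokens / 1_000_000 * usd_per_mtok


def extra_cost_from_cache_loss(base_cost: float, hit_share_lost: float,
                               cache_read_multiplier: float) -> float:
    """Added monthly spend when `hit_share_lost` of input tokens fall back to the base rate.

    cache_read_multiplier is a cached token's price relative to the base rate;
    0.0 reproduces the "cached tokens are free" simplification.
    """
    return base_cost * hit_share_lost * (1.0 - cache_read_multiplier)


# Volumes and price from the ADR: ~4K cases x ~20 turns x ~5K tokens at $3/MTok.
BASE = monthly_input_cost(4_000, 20, 5_000, 3.0)

if __name__ == "__main__":
    print(f"baseline input cost: ${BASE:,.0f}/month")
    for lost in (0.3, 0.5):
        free = extra_cost_from_cache_loss(BASE, lost, 0.0)
        billed = extra_cost_from_cache_loss(BASE, lost, 0.1)  # assumed 0.1x cache-read price
        print(f"lose {lost:.0%} of hits: +${free:,.0f} (reads free), +${billed:,.0f} (reads at 0.1x)")
```

With the multiplier at 0.0, the helper reproduces the ADR's ~$360-600/month range. At 0.1, the same loss comes to $324-540/month, so the ADR's figure is best read as an upper bound. Either way the conclusion holds: the cost recurs and grows with volume.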

Option B — Build-time graph projection from YAML (accepted)

YAML files in config/prompts/{base,stages,sops,knowledge}/ remain the source of truth. A CI build step scans them, emits typed edges, and produces:
- docs/prompts/prompt-graph.html: an interactive visualization for Naidu and the team, regenerated on every prompt change
- docs/prompts/prompt-graph.json: a machine-readable graph for the linter

The linter detects orphan addendums, conflicting overrides, missing parent files, and dead must_collect fields. An example of a conflicting override is an SOP that says "3 questions" while the base says "one per turn".
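The following minimal sketch covers the projection and three of the four lint checks. It assumes that each YAML file has already been parsed into a plain dict keyed by segment ID. Several details are assumptions for illustration:
- the project and lint function names
- the stage/ and field/ ID prefixes
- the orphan rule (a knowledge addendum whose triggers_when names no existing stage)
- the known-field set, which stands in for the medical_history schema

Conflicting-override detection needs domain rules, so the sketch does not attempt it. It is also not a description of how the graphify skill works.

```python
from __future__ import annotations

import json


def project(segments: dict[str, dict]) -> dict:
    """Project parsed segment metadata into a JSON-serialisable node/edge graph."""
    nodes = [{"id": sid, "kind": meta["kind"]} for sid, meta in sorted(segments.items())]
    edges = []
    for sid, meta in sorted(segments.items()):
        if meta.get("parent"):
            edges.append({"src": sid, "type": "EXTENDS", "dst": meta["parent"]})
        for stage in sorted(meta.get("triggers_when", [])):
            edges.append({"src": sid, "type": "TRIGGERS_WHEN", "dst": f"stage/{stage}"})
        for field_name in meta.get("must_collect", []):
            edges.append({"src": sid, "type": "REQUIRES", "dst": f"field/{field_name}"})
    return {"nodes": nodes, "edges": edges}


def lint(segments: dict[str, dict], known_fields: set[str]) -> list[str]:
    """Report missing parents, orphan addendums, and dead must_collect fields."""
    findings = []
    for sid, meta in sorted(segments.items()):
        parent = meta.get("parent")
        if parent and parent not in segments:
            findings.append(f"{sid}: missing parent {parent}")
        if meta["kind"] == "knowledge":
            live = [s for s in meta.get("triggers_when", []) if f"stage/{s}" in segments]
            if not live:
                findings.append(f"{sid}: orphan addendum (no existing trigger stage)")
        for field_name in meta.get("must_collect", []):
            if field_name not in known_fields:
                findings.append(f"{sid}: dead must_collect field {field_name}")
    return findings


# Hypothetical parsed input; in CI these dicts would come from the YAML files.
SEGMENTS = {
    "base/intake": {"kind": "base"},
    "stage/records_collection": {"kind": "stage", "parent": "base/intake"},
    "sop/tkr": {"kind": "sop", "parent": "sop/ortho_family",
                "must_collect": ["prior_knee_surgery", "legacy_bmi"]},
    "knowledge/imaging_tips": {"kind": "knowledge", "triggers_when": ["records_collection"]},
    "knowledge/legacy_faq": {"kind": "knowledge", "triggers_when": ["triage"]},
}
KNOWN_FIELDS = {"prior_knee_surgery", "bmi"}

if __name__ == "__main__":
    # What a CI step might write to docs/prompts/prompt-graph.json.
    print(json.dumps(project(SEGMENTS), indent=2))
    for finding in lint(SEGMENTS, KNOWN_FIELDS):
        print(finding)
```

Sorting the inputs keeps the JSON output stable between runs, so the generated artifact only changes in a PR when the YAML actually changes.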

At runtime, the composer reads YAML files and caches them with functools.lru_cache. This is the same pattern that today's knowledge_addendum_selector.py uses.
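The following minimal, stdlib-only sketch shows that runtime path. It caches each file's raw text instead of parsing YAML; a real loader would parse the file, for example with PyYAML's safe_load. The names load_segment_text and compose_prompt are hypothetical and are not the API of knowledge_addendum_selector.py. The point is that a cached read plus a fixed join order produces byte-identical prefixes for identical case state, and that is what keeps the Anthropic prompt-cache hit rate high.

```python
from __future__ import annotations

from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_segment_text(path: str) -> str:
    """Read one prompt segment from disk; repeat calls for the same path hit the cache."""
    # A real loader would parse YAML here; raw text keeps this sketch stdlib-only.
    return Path(path).read_text(encoding="utf-8")


def compose_prompt(paths: tuple[str, ...]) -> str:
    """Concatenate segments in the caller's fixed order (e.g. base, stage, SOP, addendums).

    Same paths in the same order -> byte-identical output, so the prompt prefix
    stays stable across turns.
    """
    return "\n\n".join(load_segment_text(p) for p in paths)
```

The cache lives for the whole process lifetime. Edited files are therefore picked up only after a restart or redeploy, or after an explicit load_segment_text.cache_clear(). This is also why the one-time cold-start noted in Option A's latency bullet occurs.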

Pros:
- Zero recurring operational cost over today's baseline
- Preserves the Anthropic prompt cache hit rate (the ~90% discount stays intact)
- Filesystem reliability for the chat surface
- Naidu still gets a visualization; the team gets conflict detection in CI
- Git diffs remain the review surface (every reviewer can read YAML PRs)
- The graphify skill (already in the toolbelt) can serve as the projection engine

Cons:
- Build-time engineering effort (~2-3 days) to write the projection script and linter
- Some loss of elegance compared with a runtime graph: dynamic resolution (which nodes fire for THIS case) lives in code, not in the graph

Option C — Status quo (no projection)

Keep YAML and accept that conflicts are caught only at PR review time. This is the cheapest option. However, it gives up the two strongest arguments for graph thinking: mechanical conflict detection and a visualization for Naidu.

Decision

Option B. YAML files are the source of truth. A build-time projection step emits the graph artifact for visualization and CI linting. Runtime composition stays in code, reading cached YAML.

This mirrors ADR-0026's split, where Postgres is the source of truth for matching data and Neo4j is a projection. Here, YAML is the source of truth for prompts and a graph artifact is the projection. The pattern, the operational benefits, and the migration path if the math flips later are all the same.

When this decision would flip

We would revisit this decision (and likely move toward Option A) if any of these conditions hold:
- Prompt artifact count reaches ~1000+ nodes (today: ~30; with SOPs at full coverage: ~100)
- Cross-tenant prompt customization at fine granularity becomes a product requirement
- Runtime A/B testing of prompt fragments at sub-stage granularity is desired
- Authoring tooling becomes a build-it-yourself project (which would need a graph backend anyway)
- A clinical-audit or compliance regime requires graph-traceable provenance for every prompt fragment that influenced a patient-facing response. Examples include HIPAA-adjacent attestations, ISO 13485 quality management for software as a medical device, and EU MDR. The build-time projection plus message-level prompt-version tagging partially addresses this, but a regulator demanding edge-typed provenance would tip the math.

None of these hold today or in the alpha-release window.

Consequences

- Phase 1 of the SOP framework ships with the YAML-based loader pattern (app/agents/sop_loader.py), matching how knowledge_addendum_selector.py already works.
- A separate CI workflow or script will generate the graph projection. Phase 1.5 of the SOP framework adds it (see the SOP design spec).
- The graph projection becomes the canonical "where do I see how prompts compose" reference for Naidu. It replaces ad-hoc YAML walks during clinical review.
- If a runtime graph becomes attractive later, the YAML schema is already graph-shaped, because it has explicit parent:, triggers_when: and must_collect: fields. Migration would need a sync layer, not a rewrite.
- Composition-logic enhancements are fully compatible. Initiatives like #934 (v6.1 intent-aware stage resolver + multi-addendum injection) extend WHICH segments fire per turn. They live entirely in code (stage_resolver.py, knowledge_addendum_selector.py, a future intent_classifier.py) and do not require runtime graph composition. Storage stays YAML; selection stays code. This ADR is orthogonal to and compatible with such follow-ups.
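The following minimal sketch illustrates "storage stays YAML; selection stays code". It is a stage-driven addendum selector written as a pure function over already-loaded metadata. Several pieces are hypothetical: the Addendum type, the priority tie-breaker, the limit parameter (a stand-in for multi-addendum injection), and every addendum ID and stage name apart from records_collection. This is not the actual knowledge_addendum_selector.py or the #934 design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Addendum:
    """Hypothetical metadata for one knowledge addendum, as loaded from YAML."""

    addendum_id: str
    triggers_when: frozenset[str]  # stages in which this addendum may fire
    priority: int = 100            # lower values are injected first (hypothetical)


def select_addendums(active_stage: str, addendums: list[Addendum],
                     limit: int = 3) -> list[str]:
    """Return up to `limit` addendum IDs whose triggers include the active stage."""
    matching = [a for a in addendums if active_stage in a.triggers_when]
    # Deterministic order: identical case state -> identical prompt prefix bytes.
    matching.sort(key=lambda a: (a.priority, a.addendum_id))
    return [a.addendum_id for a in matching[:limit]]


ADDENDUMS = [
    Addendum("knowledge/insurance_docs", frozenset({"records_collection", "quote_review"}), 20),
    Addendum("knowledge/imaging_upload", frozenset({"records_collection"}), 10),
    Addendum("knowledge/travel_visa", frozenset({"logistics"}), 10),
]
```

Swapping in an intent-aware resolver would only change how active_stage (or a richer case state) is computed. The YAML storage and the loader stay untouched.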