Configuration Reference¶
This page documents all configuration files and environment variables used by the Curaway backend. Configuration is split between YAML files (checked into the repo) and environment variables (managed per deployment).
config/procedures.yaml¶
Single source of truth for the Procedure catalog and all clinical payload. (ADR-0027, PR #960)
Loaded by app/seeds/seed_procedures.py::_load_procedures_yaml(), validated via Pydantic
(ProcedureYamlModel), and written to procedure_requirements via the upsert seeder.
The base templates (ORTHO_BASE, CARDIAC_BASE, ONCO_BASE, OPHTHALMOLOGY_BASE) remain
hand-coded in seed_procedures.py because they are inheritance parents, not catalog procedures.
Fields per entry¶
| Field | Type | Required | Description |
|---|---|---|---|
| code | str | yes | CPT code or custom code (BMT-001, ONCO-CHEMO). Pattern: [A-Z0-9_-]{2,32} |
| name | str | yes | Procedure display name |
| specialty | str | yes | orthopedics \| cardiology \| oncology \| ophthalmology \| dental \| bariatric \| cosmetic \| fertility |
| neo4j_slug | str | no | Neo4j node key (auto-derived from name if absent) |
| category | str | no | Postgres category column (inferred from specialty if absent) |
| parent_procedure_code | str | no | Inheritance parent (e.g. ORTHO_BASE). See inheritance algorithm below |
| keywords | list[str] | no | Intake keyword mapper phrases |
| icd_links | list[str] | no | ICD-10 codes (stored as icd10_primary in DB) |
| snomed_primary | list[str] | no | SNOMED CT codes |
| required_documents | list[DocumentItem] | no | Patient-facing document checklist. See DocumentItem shape below |
| required_tests | list[dict] | no | Hospital-side pre-op tests |
| comorbidity_screening | list[dict] | no | Intake screening questions |
| contraindications | list[dict] | no | Absolute / relative contraindications |
| cost_range | CostRange | no | {min_usd, max_usd, us_average_usd} |
| recovery_timeline | dict | no | {hospital_days, local_recovery_days, full_recovery_weeks, ...} |
| travel_considerations | dict | no | {earliest_flight_post_op_days, companion_recommended, ...} |
| data_source | str | no | Provenance marker. Must match ^(fabricated\|naidu_approved)_<tag>_YYYY_MM_DD$ |
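The two string patterns in the table can be checked with plain regular expressions. The sketch below is illustrative only: the real validation lives in ProcedureYamlModel, the helper names are hypothetical, and the character set allowed for `<tag>` (lowercase words joined by underscores) is an assumption.

```python
import re

# Pattern for `code`, copied from the field table.
CODE_RE = re.compile(r"[A-Z0-9_-]{2,32}")

# Assumption: <tag> is one or more lowercase alphanumeric words joined by "_".
DATA_SOURCE_RE = re.compile(
    r"(fabricated|naidu_approved)_[a-z0-9]+(?:_[a-z0-9]+)*_\d{4}_\d{2}_\d{2}"
)


def is_valid_code(code: str) -> bool:
    """Return True when the whole string matches the documented code pattern."""
    return CODE_RE.fullmatch(code) is not None


def is_valid_data_source(value: str) -> bool:
    """Return True when the value looks like <prefix>_<tag>_YYYY_MM_DD."""
    return DATA_SOURCE_RE.fullmatch(value) is not None
```

Under these assumptions, `fabricated_pending_ops_2026_05_19` passes, while a value with no tag between the prefix and the date does not.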
DocumentItem shape¶
```yaml
required_documents:
  - type: imaging        # blood_work | cardiac | imaging | anesthesia | clinical |
                         # ophthalmology | dental | psych | pulmonary | neurological |
                         # fertility | pathology | records | lab_work | clearance | obstetric
    name: "MRI Knee"     # Artefact descriptor, NOT a directive or diagnosis
    mandatory: true
    max_age_days: 180
```
pathology is a new type added in PR-1 for oncology documents (tumor reports, biopsy results,
histology). It does not fit under imaging (no scan) or blood_work (not a lab value).
Inheritance algorithm (ADR-0027)¶
Setting parent_procedure_code triggers a shallow merge with list concatenation:
- List fields (required_documents, required_tests, comorbidity_screening, contraindications):
parent items are prepended; child items appended.
- Child [] (empty) explicitly overrides (parent items dropped).
- Child key absent → full parent list inherited.
- Scalar fields: child value wins; parent fills when child key is absent.
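The merge rules above can be sketched in a few lines of Python. This is a minimal illustration, not the seeder's code: `merge_with_parent` is a hypothetical name and procedures are modelled as plain dicts.

```python
from copy import deepcopy

# List fields that concatenate rather than overwrite (from the rules above).
LIST_FIELDS = (
    "required_documents",
    "required_tests",
    "comorbidity_screening",
    "contraindications",
)


def merge_with_parent(parent: dict, child: dict) -> dict:
    """Shallow-merge a child procedure onto its inheritance parent."""
    merged = deepcopy(parent)  # keys absent from the child are inherited
    for key, value in child.items():
        if key in LIST_FIELDS and value:
            # Non-empty child list: parent items first, child items after.
            merged[key] = deepcopy(parent.get(key, [])) + deepcopy(value)
        else:
            # Scalars and an explicit [] replace the parent value.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical data, for illustration only.
ortho_base = {
    "category": "orthopedics",
    "required_documents": [{"name": "CBC"}],
    "required_tests": [{"test_code": "ecg"}],
}
knee = {
    "name": "Total Knee Replacement",
    "required_documents": [{"name": "MRI Knee"}],
    "required_tests": [],
}
merged = merge_with_parent(ortho_base, knee)
```

Here `merged["required_documents"]` holds the parent's item followed by the child's, `merged["required_tests"]` is empty because the child passed `[]`, and `category` is inherited because the child omitted it.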
Seeder + backfill¶
```bash
# Seed all procedures (upsert)
python -m app.seeds.seed_procedures

# Backfill fabricated rows in prod
python -m scripts.backfill_procedure_clinical_payload --dry-run
python -m scripts.backfill_procedure_clinical_payload
```
config/diagnostic_tests/seed.yaml¶
Single source of truth for the DiagnosticTest catalog. Loaded by app/seeds/diagnostic_test_loader.py, validated via Pydantic (DiagnosticTestYamlModel), and projected to Neo4j by app/seeds/diagnostic_test_graph.py.
Fields per entry¶
| Field | Type | Required | Description |
|---|---|---|---|
| code | str | yes | Unique slug used as Neo4j node key |
| name | str | yes | Human-readable test name |
| category | str | yes | blood_work \| cardiac \| imaging \| pulmonary \| clinical_clearance \| infection_screen \| pathology |
| loinc | str | no | LOINC code |
| validity_days_default | int | no | Fallback validity if procedure override is absent |
| typical_cost_usd_cents | int | no | Indicative cost in USD cents |
| turnaround_hours | int | no | Typical result turnaround |
| data_source | str | no | Provenance marker, present only for fabricated entries. Must match ^(fabricated\|naidu_approved)_<tag>_YYYY_MM_DD$ |
Ported from app/seed_graph.py::DIAGNOSTIC_TESTS (16 entries) + app/seed_procedure_tests.py::ADDITIONAL_DIAGNOSTIC_TESTS (16 entries, deduped to 15 net new) + 3 authored entries (echocardiogram, liver_function, kidney_function) tagged fabricated_pending_ops_2026_05_19. See PR-4 of the seeder unification roadmap (issue #960).
config/procedure_tests/seed.yaml¶
Single source of truth for Procedure → DiagnosticTest requirements. Loaded by app/seeds/procedure_test_loader.py, validated via Pydantic (ProcedureTestYamlModel), and projected to Neo4j as REQUIRES_TEST edges by app/seeds/procedure_test_graph.py.
Top-level structure¶
Fields per test entry¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| test_code | str | yes | — | Must match a code in config/diagnostic_tests/seed.yaml |
| validity_days | int | yes | — | Result age limit in days |
| is_mandatory | bool | no | true | Required vs. conditional/recommended |
| priority | str | no | derived from is_mandatory | mandatory \| conditional \| recommended |
| timing_note | str | no | pre_travel | pre_travel \| on_arrival |
| source_acceptance | str | no | any_accredited | any_facility \| any_accredited \| jci_nabh_accredited \| hospital_only \| any_licensed_physician \| provider_lab_only |
| on_site_required | bool | no | false | true if test must be performed at the treating hospital |
| condition_note | str | no | null | Clinical context for conditional/recommended tests |
| notes | str | no | null | Additional notes (stored as source_note on the REQUIRES_TEST relationship) |
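As a rough illustration of the defaults and the test_code cross-reference, the sketch below fills in the documented defaults and reports codes missing from the DiagnosticTest catalog. The helper names are hypothetical, the procedure-to-entries mapping is an assumed shape (the top-level structure is not reproduced on this page), and priority is left out because its derivation rule is not spelled out here.

```python
# Defaults copied from the field table; `priority` is intentionally omitted.
TEST_ENTRY_DEFAULTS = {
    "is_mandatory": True,
    "timing_note": "pre_travel",
    "source_acceptance": "any_accredited",
    "on_site_required": False,
    "condition_note": None,
    "notes": None,
}


def with_defaults(entry: dict) -> dict:
    """Return a copy of the entry with documented defaults filled in."""
    return {**TEST_ENTRY_DEFAULTS, **entry}


def find_unknown_test_codes(
    procedure_tests: dict[str, list[dict]], known_codes: set[str]
) -> dict[str, list[str]]:
    """Map each procedure to the test codes that are not in the catalog.

    Assumed input shape: {procedure_slug: [test_entry, ...]}.
    """
    unknown: dict[str, list[str]] = {}
    for procedure, tests in procedure_tests.items():
        missing = [t["test_code"] for t in tests if t["test_code"] not in known_codes]
        if missing:
            unknown[procedure] = missing
    return unknown
```

An empty result from `find_unknown_test_codes` means every test_code resolves to a catalog entry.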
Ported from app/seed_graph.py::PROCEDURE_TESTS (2 procedures: knee_replacement, bypass_surgery) + app/seed_procedure_tests.py::PROCEDURE_TESTS (the remaining 15 procedures). See PR-4 of the seeder unification roadmap (issue #960).
config/providers/seed.yaml¶
Single source of truth for the Provider catalog. (ADR-0027, PR-5 / issue #960)
Loaded by app/seeds/provider_loader.py::load_providers(), validated via Pydantic
(ProviderYamlModel), and written to the providers table via app/seeds/provider_seeder.py.
Projected to Neo4j by app/seeds/provider_graph.py.
Ported from 4 legacy Python modules:
- app/seed_providers.py — 42 base provider records
- app/seed_storefront.py — tagline, cultural_support, travel_info, facility data (6 demo providers)
- app/seed_storefront_enrichment.py — hero images, facility enrichment, technology lists, cultural/quality data
- app/seed_full_enrichment.py — full enrichment for all 42 providers
All 4 source files are now deprecation shims that raise SystemExit.
Fields per entry¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| slug | str | yes | — | Unique provider slug, used as Postgres PK and Neo4j graph ID |
| name | str | yes | — | Provider display name |
| provider_type | str | no | hospital | hospital \| clinic \| surgical_provider \| recovery_provider \| transport_vendor |
| tenant_id | str | no | null | Null for catalog entries; set at runtime by Clerk org activation flow |
| country_code | str | yes | — | ISO 3166-1 alpha-3 (3 chars, all-uppercase: IND, TUR, THA, …) |
| city | str | yes | — | City name |
| description | str | no | null | Institution description; must describe the provider, never give patient guidance (CLAUDE.md §9) |
| logo_url | str | no | null | Logo image URL |
| website_url | str | no | null | Provider website |
| hero_image_url | str | no | null | Hero/banner image URL (storefront) |
| tagline | str | no | null | Short tagline (storefront) |
| address | str | no | null | Street address |
| latitude | float | no | null | Latitude in range [-90, 90] |
| longitude | float | no | null | Longitude in range [-180, 180] |
| timezone | str | no | UTC | IANA timezone (e.g. Asia/Kolkata) |
| specialties | list[str] | yes | [] | Medical specialty slugs |
| accreditations | list[str] | no | null | JCI, NABH, ISO_9001, KOIHA, etc. |
| languages_supported | list[str] | yes | [] | ISO 639-1 language codes |
| bed_count | int | no | null | Total bed count |
| annual_international_patients | int | no | null | Annual international patient volume |
| operating_theaters | int | no | null | Number of operating theaters (storefront) |
| icu_beds | int | no | null | ICU bed count (storefront) |
| countries_served | int | no | null | Number of countries served (storefront) |
| cost_index | float | no | 1.0 | Provider-level cost multiplier (must be > 0). 1.0 = average, 0.35 = 65% below average |
| procedure_costs | dict | no | null | JSONB: {<slug>: {min_usd_cents: int, max_usd_cents: int}} |
| outcome_score | float | no | null | Composite outcome score [0.0, 1.0] |
| patient_satisfaction | float | no | null | Patient satisfaction rating [0.0, 5.0] |
| total_reviews | int | no | 0 | Number of verified reviews |
| cultural_accommodations | list[str] | no | null | e.g. halal_food, prayer_room, female_staff_available, interpreter |
| dietary_options | list[str] | no | null | e.g. halal, kosher, vegetarian, vegan |
| cultural_support | dict | no | null | Rich cultural support JSONB (storefront enrichment) |
| travel_info | dict | no | null | Airport, transfer, hotel, visa information (storefront) |
| metadata | dict | no | {} | FlexibleJSON envelope for additional provider metadata |
| data_source | str | no | null | Provenance marker. Must match ^(fabricated\|naidu_approved)_<tag>_YYYY_MM_DD$ |
| is_active | bool | no | true | Whether the provider is active in the catalog |
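The numeric and format constraints in the table can be spot-checked before seeding. The function below is a hedged sketch using only the standard library; `provider_errors` is a hypothetical helper and does not reproduce ProviderYamlModel, which remains the authoritative validator.

```python
import re


def provider_errors(entry: dict) -> list[str]:
    """Collect violations of the range and format rules documented above."""
    errors = []

    # ISO 3166-1 alpha-3: exactly three uppercase letters.
    if not re.fullmatch(r"[A-Z]{3}", str(entry.get("country_code") or "")):
        errors.append("country_code must be 3 uppercase letters")

    # Optional bounded floats: (field, lower bound, upper bound).
    bounded = (
        ("latitude", -90.0, 90.0),
        ("longitude", -180.0, 180.0),
        ("outcome_score", 0.0, 1.0),
        ("patient_satisfaction", 0.0, 5.0),
    )
    for field, low, high in bounded:
        value = entry.get(field)
        if value is not None and not low <= value <= high:
            errors.append(f"{field} must be in [{low}, {high}]")

    # cost_index defaults to 1.0 and must be strictly positive.
    if entry.get("cost_index", 1.0) <= 0:
        errors.append("cost_index must be > 0")

    return errors
```

A well-formed entry yields an empty list; each violated rule adds one message.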
Seeder commands¶
```bash
# Seed all providers (upsert, idempotent)
python -m app.seeds._runner --only providers

# Or run the seeder directly
python -m app.seeds.provider_seeder

# Project to Neo4j
python -m app.seeds.provider_graph

# Dry-run (log what would change, no writes)
python -m app.seeds._runner --only providers --dry-run
```
Voice rules guardrail (CLAUDE.md §9)¶
Provider description fields must describe the institution, not give patient guidance.
- ✅ "Multi-specialty flagship hospital with JCI accreditation" (descriptive)
- ❌ "Best hospital for your knee replacement" (patient directive)
config/guardrails.yaml¶
The guardrails configuration controls the AI agent's input classification and output validation. It prevents the agent from soliciting or disclosing sensitive information and keeps conversations within the medical tourism intake scope.
Classifier Categories¶
The input classifier assigns each user message to one of these categories before the message reaches the LLM:
| Category | Action | Description |
|---|---|---|
| safe | Allow | Normal intake-related messages (symptoms, travel questions, logistics) |
| medical_advice | Block | User is asking for a diagnosis or treatment recommendation |
| pii_solicitation | Block | User is providing unsolicited PII (SSN, credit card, etc.) |
| emergency | Escalate | User describes an emergency medical situation |
| off_topic | Redirect | Message is unrelated to medical tourism (e.g., sports, weather) |
| profanity | Block | Message contains profanity or abusive language |
| legal_advice | Block | User is asking for legal counsel (malpractice, liability) |
| competitor_mention | Log | User mentions a competing platform (logged but allowed) |
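One way to picture the table is as a lookup from category to action. The sketch below is illustrative: the names are hypothetical, and two behaviours are assumptions rather than documented facts, namely that only allow and log let a message through to the LLM, and that an unrecognised category is treated as block.

```python
# Category -> action, transcribed from the table above.
CATEGORY_ACTIONS = {
    "safe": "allow",
    "medical_advice": "block",
    "pii_solicitation": "block",
    "emergency": "escalate",
    "off_topic": "redirect",
    "profanity": "block",
    "legal_advice": "block",
    "competitor_mention": "log",
}


def action_for(category: str) -> str:
    """Look up the action; unknown categories fail closed (assumption)."""
    return CATEGORY_ACTIONS.get(category, "block")


def reaches_llm(category: str) -> bool:
    """Assumption: only 'allow' and 'log' forward the message to the LLM."""
    return action_for(category) in {"allow", "log"}
```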
Forbidden Output Patterns¶
The output validator scans every LLM response against these 12 regex patterns. If any pattern matches, the response is blocked and a safe fallback is returned instead.
| Pattern ID | Description |
|---|---|
| diagnosis_statement | Statements like "you have [condition]" or "you are diagnosed with" |
| dosage_recommendation | Any mention of specific drug dosages |
| treatment_prescription | Direct treatment prescriptions ("you should take", "I recommend the drug") |
| prognosis_prediction | Survival rates, recovery time guarantees |
| lab_interpretation | Interpreting specific lab values as normal/abnormal |
| emergency_self_treat | Instructions for self-treating an emergency |
| insurance_guarantee | Guaranteeing insurance coverage or reimbursement |
| price_guarantee | Guaranteeing specific treatment costs |
| legal_opinion | Any legal advice or liability statements |
| credential_leak | API keys, passwords, or internal URLs in the response |
| internal_instruction | System prompt or internal instruction leakage |
| competitor_endorsement | Endorsing or recommending a competing platform |
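The scan-then-fallback behaviour can be sketched as follows. Only two pattern IDs are shown, and their regexes are illustrative stand-ins written for this example; the actual expressions are defined in config/guardrails.yaml. `SAFE_FALLBACK` and `validate_output` are hypothetical names.

```python
import re

# Illustrative regexes only; not the patterns shipped in guardrails.yaml.
FORBIDDEN_PATTERNS = {
    "diagnosis_statement": re.compile(
        r"\byou (?:are|were|have been) diagnosed with\b", re.IGNORECASE
    ),
    "treatment_prescription": re.compile(
        r"\b(?:you should take|I recommend the drug)\b", re.IGNORECASE
    ),
}

# Hypothetical fallback text.
SAFE_FALLBACK = "I'm not able to help with that, but I can connect you with a qualified professional."


def validate_output(text: str) -> tuple[str, list[str]]:
    """Return (text to send, matched pattern IDs); any match swaps in the fallback."""
    hits = [pid for pid, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(text)]
    return (SAFE_FALLBACK if hits else text), hits
```

Returning the matched IDs alongside the text keeps the block decision auditable without exposing the original response.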
Response Templates¶
When a message is blocked, escalated, or redirected, the system returns a pre-configured response template:
```yaml
response_templates:
  medical_advice: >
    I understand your concern, but I'm not able to provide medical diagnoses
    or treatment recommendations. I can help connect you with qualified
    medical professionals who can properly evaluate your condition.
  emergency: >
    If you are experiencing a medical emergency, please call your local
    emergency services immediately. I can help you find appropriate care
    after your immediate needs are addressed.
  pii_solicitation: >
    For your safety, please do not share sensitive personal information
    like social security numbers or credit card details in this chat.
    Any required information will be collected securely during the
    formal intake process.
  off_topic: >
    I'm here to help with your medical travel needs. Could you tell me
    more about the medical procedure you're interested in or the
    destination you're considering?
```
Medical Keywords (Multilingual)¶
The classifier uses medical keyword lists in four languages to improve classification accuracy:
| Language | Example Keywords | Count |
|---|---|---|
| English | surgery, diagnosis, prescription, symptoms, treatment | ~120 |
| Arabic | جراحة, تشخيص, وصفة طبية, أعراض, علاج | ~90 |
| German | Operation, Diagnose, Rezept, Symptome, Behandlung | ~95 |
| Turkish | ameliyat, teşhis, reçete, belirtiler, tedavi | ~85 |
File Validation Limits¶
When guardrail file validation is enabled, uploaded files are checked against these limits:
| Parameter | Value | Description |
|---|---|---|
| max_file_size_mb | 20 | Maximum file size in megabytes |
| allowed_extensions | .pdf, .jpg, .jpeg, .png, .dcm | Permitted file types |
| max_filename_length | 255 | Maximum filename character count |
| scan_for_malware | true | Whether to scan uploads with ClamAV (if available) |
| ocr_timeout_seconds | 30 | Maximum time for PyMuPDF OCR extraction |
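The size, extension, and filename limits translate into a small pre-upload check. This is a sketch under two assumptions not stated in the table: a megabyte is taken as 1024 × 1024 bytes, and extensions are compared case-insensitively. `upload_errors` is a hypothetical helper; malware scanning and OCR timeouts are out of scope here.

```python
from pathlib import PurePath

# Limits copied from the table above.
MAX_FILE_SIZE_MB = 20
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".dcm"}
MAX_FILENAME_LENGTH = 255


def upload_errors(filename: str, size_bytes: int) -> list[str]:
    """Return the list of limit violations for a candidate upload."""
    errors = []
    if len(filename) > MAX_FILENAME_LENGTH:
        errors.append("filename too long")
    # Assumption: extension matching ignores case (.DCM == .dcm).
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("extension not allowed")
    # Assumption: 1 MB = 1024 * 1024 bytes.
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        errors.append("file too large")
    return errors
```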
Workflow Gating Thresholds¶
Controls when the agent advances between workflow phases. Tune per corridor or procedure type.
Loaded by: app/services/gating_config.py → imported as GATING dict
EHR Gating¶
| Parameter | Default | Description |
|---|---|---|
| gating.ehr.completeness_for_matching | 0.5 | Minimum EHR completeness (0.0–1.0) to offer provider matching |
| gating.ehr.completeness_with_meds | 0.4 | Lower threshold when medications have been confirmed (high-signal data) |
| gating.ehr.matching_ready_threshold | 0.4 | Minimum for patient_state to report matching_ready: true |
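A possible reading of the two completeness thresholds is sketched below. The nested shape of the GATING dict and the function name are assumptions made for illustration; the real logic lives behind app/services/gating_config.py and may combine these values differently.

```python
# Assumed shape of the GATING dict, populated with the documented defaults.
GATING = {
    "ehr": {
        "completeness_for_matching": 0.5,
        "completeness_with_meds": 0.4,
        "matching_ready_threshold": 0.4,
    }
}


def can_offer_matching(completeness: float, medications_confirmed: bool) -> bool:
    """Apply the lower threshold once medications have been confirmed."""
    ehr = GATING["ehr"]
    threshold = (
        ehr["completeness_with_meds"]
        if medications_confirmed
        else ehr["completeness_for_matching"]
    )
    return completeness >= threshold
```

Under this reading, a patient at 0.45 completeness qualifies for matching only after medications are confirmed.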
Matching Gating¶
| Parameter | Default | Description |
|---|---|---|
| gating.matching.require_medications | true | Whether medications must be asked/confirmed before matching |
Document Matching¶
| Parameter | Default | Description |
|---|---|---|
| gating.documents.parameter_match_coverage | 0.5 | Minimum fraction of expected parameters to count as a match (e.g., 4/7 CBC params) |
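The coverage rule is a simple ratio test. In the sketch below, `is_parameter_match` is a hypothetical helper and the treatment of a zero expected count (no match) is an assumption.

```python
def is_parameter_match(found: int, expected: int, min_coverage: float = 0.5) -> bool:
    """True when found/expected meets the coverage threshold."""
    if expected <= 0:
        return False  # assumption: nothing expected means nothing to match
    return found / expected >= min_coverage
```

With the default of 0.5, finding 4 of 7 expected CBC parameters (about 0.57) counts as a match, while 3 of 7 (about 0.43) does not.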
Intake Gating¶
| Parameter | Default | Description |
|---|---|---|
| gating.intake.min_answers_for_completion | 3 | Minimum substantive patient answers before intake can advance |
Tuning for demo vs. production
For investor demos, lower completeness_for_matching to 0.3 for faster flow.
For production, keep at 0.5 or higher for better match quality.
config/model_registry.yaml¶
The model registry defines which LLM models are available, their capabilities, and how they are assigned to different tasks.
Model Definitions¶
```yaml
models:
  gpt-4o:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.005
    cost_per_1k_output: 0.015
  gpt-4o-mini:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
  claude-3-5-sonnet:
    provider: anthropic
    context_window: 200000
    max_output_tokens: 8192
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.003
    cost_per_1k_output: 0.015
```
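The per-1k rates make request cost a short calculation. The sketch below hard-codes the rates from the definitions above; `estimate_cost` is a hypothetical helper, and the currency unit is whatever the registry uses (it is not stated on this page).

```python
# Rates copied from the model definitions above.
MODEL_RATES = {
    "gpt-4o": {"cost_per_1k_input": 0.005, "cost_per_1k_output": 0.015},
    "gpt-4o-mini": {"cost_per_1k_input": 0.00015, "cost_per_1k_output": 0.0006},
    "claude-3-5-sonnet": {"cost_per_1k_input": 0.003, "cost_per_1k_output": 0.015},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from token counts and per-1k rates."""
    rates = MODEL_RATES[model]
    return (
        (input_tokens / 1000) * rates["cost_per_1k_input"]
        + (output_tokens / 1000) * rates["cost_per_1k_output"]
    )
```

For example, 2,000 input tokens and 500 output tokens on gpt-4o come to roughly 0.0175 (0.010 for input plus 0.0075 for output).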
Tier Assignments¶
Models are assigned to tiers that map to specific use cases:
| Tier | Model | Use Case |
|---|---|---|
| intake_chat | gpt-4o-mini | Patient intake conversations (cost-optimized) |
| medical_summary | gpt-4o | EHR summary generation (accuracy-critical) |
| document_analysis | gpt-4o | OCR result analysis and extraction |
| provider_matching | gpt-4o-mini | Generating match explanations |
| translation | gpt-4o-mini | Multilingual response translation |
| guardrail_classifier | gpt-4o-mini | Input classification for safety |
Fallback Chains¶
If the primary model is unavailable, the system falls back through one of these chains:
```yaml
fallback_chains:
  default:
    - gpt-4o
    - claude-3-5-sonnet
    - gpt-4o-mini
  cost_sensitive:
    - gpt-4o-mini
    - claude-3-5-sonnet
  accuracy_critical:
    - gpt-4o
    - claude-3-5-sonnet
```
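Walking a chain amounts to taking the first model that is currently usable. The sketch below assumes availability is known up front as a set of model names; `pick_model` is a hypothetical helper, and how the backend actually detects an unavailable model is not covered here.

```python
from typing import Optional

# Chains copied from the configuration above.
FALLBACK_CHAINS = {
    "default": ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"],
    "cost_sensitive": ["gpt-4o-mini", "claude-3-5-sonnet"],
    "accuracy_critical": ["gpt-4o", "claude-3-5-sonnet"],
}


def pick_model(chain_name: str, available: set[str]) -> Optional[str]:
    """Return the first model in the chain that is available, else None."""
    for model in FALLBACK_CHAINS[chain_name]:
        if model in available:
            return model
    return None
```

A `None` result means the whole chain is exhausted, which the caller has to handle explicitly.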
Environment Variables¶
All environment variables are required unless marked as optional. Never commit actual values -- use .env.example as a template.
Database¶
| Variable | Description | Example Format |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql+asyncpg://user:pass@host:5432/curaway |
| NEO4J_URI | Neo4j Bolt connection URI | neo4j+s://xxxx.databases.neo4j.io |
| NEO4J_USERNAME | Neo4j authentication username | neo4j |
| NEO4J_PASSWORD | Neo4j authentication password | (secret) |
| QDRANT_URL | Qdrant vector database URL | https://xxxx.cloud.qdrant.io:6333 |
| QDRANT_API_KEY | Qdrant API key | (secret) |
| UPSTASH_REDIS_URL | Upstash Redis connection URL | rediss://default:xxxx@xxxx.upstash.io:6379 |
Authentication¶
| Variable | Description | Example Format |
|---|---|---|
| CLERK_SECRET_KEY | Clerk backend API secret key | sk_live_xxxx |
| CLERK_PUBLISHABLE_KEY | Clerk frontend publishable key (optional, frontend only) | pk_live_xxxx |
| CLERK_WEBHOOK_SECRET | Svix signing secret for /api/v1/webhooks/clerk. Configured in Clerk dashboard → Webhooks. | whsec_xxxx |
Feature Management¶
| Variable | Description | Example Format |
|---|---|---|
| FLAGSMITH_API_KEY | Flagsmith environment API key (runtime SDK) | (secret) |
| FLAGSMITH_API_URL | Flagsmith runtime SDK URL (optional, defaults to cloud) | https://edge.api.flagsmith.com/api/v1/ |
| FLAGSMITH_ADMIN_TOKEN | Admin-scope API token used by /api/v1/admin/flags/* proxy. Distinct from the runtime SDK key; server-side only. | Token xxxx |
| FLAGSMITH_PROJECT_ID | Numeric project id, visible in Flagsmith dashboard URL | 12345 |
| FLAGSMITH_ENVIRONMENT_KEY | Server-side environment key | ser.xxxx |
| FLAGSMITH_ADMIN_API_URL | Admin API URL (optional, defaults to cloud) | https://api.flagsmith.com/api/v1/ |
Tenancy¶
| Variable | Description | Example Format |
|---|---|---|
| DEFAULT_TENANT_ID | Tenant used when callers don't pass an explicit tenant_id (background jobs, seed data, dev shells). Defaults to tenant-apollo-001. | tenant-apollo-001 |
Observability¶
| Variable | Description | Example Format |
|---|---|---|
| LANGFUSE_SECRET_KEY | Langfuse secret key for LLM tracing | sk-lf-xxxx |
| LANGFUSE_PUBLIC_KEY | Langfuse public key | pk-lf-xxxx |
| LANGFUSE_HOST | Langfuse host URL (optional) | https://cloud.langfuse.com |
Email & Notifications¶
| Variable | Description | Example Format |
|---|---|---|
| RESEND_API_KEY | Resend email API key | re_xxxx |
| NOTIFICATION_FROM_EMAIL | Sender email address (optional) | noreply@curaway.ai |
Storage (Cloudflare R2)¶
| Variable | Description | Example Format |
|---|---|---|
| CLOUDFLARE_R2_ACCESS_KEY_ID | R2 access key ID | (secret) |
| CLOUDFLARE_R2_SECRET_ACCESS_KEY | R2 secret access key | (secret) |
| CLOUDFLARE_R2_BUCKET | R2 bucket name | curaway-uploads |
| CLOUDFLARE_R2_ENDPOINT | R2 S3-compatible endpoint | https://xxxx.r2.cloudflarestorage.com |
| CLOUDFLARE_R2_PUBLIC_URL | Public URL prefix for uploaded files (optional) | https://uploads.curaway.ai |
AI / Embeddings¶
| Variable | Description | Example Format |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for LLM and embeddings | sk-xxxx |
| ANTHROPIC_API_KEY | Anthropic API key (optional, for fallback) | sk-ant-xxxx |
| EMBEDDING_PROVIDER | Embedding provider (openai or local) | openai |
| EMBEDDING_MODEL | Embedding model name (optional) | text-embedding-3-small |
Application¶
| Variable | Description | Example Format |
|---|---|---|
| APP_ENV | Environment name (development, staging, production) | production |
| AUTH_DISABLED | Disable JWT auth for local development (optional) | true |
| CORS_ORIGINS | Comma-separated allowed CORS origins (optional) | https://app.curaway.ai,http://localhost:3000 |
| LOG_LEVEL | Logging level (optional, defaults to INFO) | DEBUG |
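A startup check can surface missing variables early. The sketch below is illustrative: `REQUIRED_VARS` is only a subset of the variables not marked optional in the tables above, and both helper names are hypothetical rather than part of the backend.

```python
import os
from typing import Mapping

# A subset of the variables not marked optional; extend as needed.
REQUIRED_VARS = (
    "DATABASE_URL",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "UPSTASH_REDIS_URL",
    "CLERK_SECRET_KEY",
    "OPENAI_API_KEY",
    "APP_ENV",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def parse_cors_origins(raw: str) -> list[str]:
    """Split the comma-separated CORS_ORIGINS value, dropping blanks."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise in tests.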