
Configuration Reference

This page documents all configuration files and environment variables used by the Curaway backend. Configuration is split between YAML files (checked into the repo) and environment variables (managed per deployment).


config/procedures.yaml

Single source of truth for the Procedure catalog and its full clinical payload. (ADR-0027, PR #960)

Loaded by app/seeds/seed_procedures.py::_load_procedures_yaml(), validated via Pydantic (ProcedureYamlModel), and written to procedure_requirements via the upsert seeder.

The base templates (ORTHO_BASE, CARDIAC_BASE, ONCO_BASE, OPHTHALMOLOGY_BASE) remain hand-coded in seed_procedures.py because they are inheritance parents, not catalog procedures.

Fields per entry

Field Type Required Description
code str yes CPT code or custom code (BMT-001, ONCO-CHEMO). Pattern: [A-Z0-9_-]{2,32}
name str yes Procedure display name
specialty str yes orthopedics | cardiology | oncology | ophthalmology | dental | bariatric | cosmetic | fertility
neo4j_slug str no Neo4j node key (auto-derived from name if absent)
category str no Postgres category column (inferred from specialty if absent)
parent_procedure_code str no Inheritance parent (e.g. ORTHO_BASE). See inheritance algorithm below
keywords list[str] no Intake keyword mapper phrases
icd_links list[str] no ICD-10 codes (stored as icd10_primary in DB)
snomed_primary list[str] no SNOMED CT codes
required_documents list[DocumentItem] no Patient-facing document checklist. See DocumentItem shape below
required_tests list[dict] no Hospital-side pre-op tests
comorbidity_screening list[dict] no Intake screening questions
contraindications list[dict] no Absolute / relative contraindications
cost_range CostRange no {min_usd, max_usd, us_average_usd}
recovery_timeline dict no {hospital_days, local_recovery_days, full_recovery_weeks, ...}
travel_considerations dict no {earliest_flight_post_op_days, companion_recommended, ...}
data_source str no Provenance marker. Must match ^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$
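The code and data_source constraints in the table above can be checked with plain regular expressions. This is a stdlib-only sketch, not the ProcedureYamlModel validator itself. The character set allowed inside <tag> ([a-z0-9_]) is an assumption, chosen so that a multi-word tag such as pending_ops still matches.

```python
import re
from datetime import datetime

# Code pattern as documented: 2-32 chars drawn from A-Z, 0-9, "_" and "-".
CODE_RE = re.compile(r"[A-Z0-9_-]{2,32}")

# Provenance pattern: (fabricated|naidu_approved)_<tag>_YYYY_MM_DD.
# ASSUMPTION: <tag> is lowercase letters, digits and underscores (not specified in the doc).
DATA_SOURCE_RE = re.compile(
    r"(fabricated|naidu_approved)_(?P<tag>[a-z0-9_]+)_(?P<date>\d{4}_\d{2}_\d{2})"
)


def is_valid_code(code: str) -> bool:
    return CODE_RE.fullmatch(code) is not None


def is_valid_data_source(value: str) -> bool:
    match = DATA_SOURCE_RE.fullmatch(value)
    if match is None:
        return False
    # The regex only checks digit counts; also reject impossible dates like 2026_02_30.
    try:
        datetime.strptime(match.group("date"), "%Y_%m_%d")
    except ValueError:
        return False
    return True
```

Because the tag group is greedy but must be followed by _YYYY_MM_DD at the end of the string, the regex backtracks so that "fabricated_pending_ops_2026_05_19" yields the tag pending_ops.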

DocumentItem shape

required_documents:
  - type: imaging          # blood_work | cardiac | imaging | anesthesia | clinical |
                           # ophthalmology | dental | psych | pulmonary | neurological |
                           # fertility | pathology | records | lab_work | clearance | obstetric
    name: "MRI Knee"       # Artefact descriptor — NOT a directive or diagnosis
    mandatory: true
    max_age_days: 180

pathology is a new type added in PR-1 for oncology documents (tumor reports, biopsy results, histology). It does not fit under imaging (no scan) or blood_work (not a lab value).
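A DocumentItem can be modelled as a small immutable record that rejects unknown types. The sketch below uses a stdlib dataclass as a stand-in for the real Pydantic model; the default for mandatory, the meaning of a missing max_age_days, and the hypothetical is_fresh helper (which assumes age is counted from the document's issue date) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# The 16 document types listed in the DocumentItem shape above.
DOCUMENT_TYPES = frozenset({
    "blood_work", "cardiac", "imaging", "anesthesia", "clinical",
    "ophthalmology", "dental", "psych", "pulmonary", "neurological",
    "fertility", "pathology", "records", "lab_work", "clearance", "obstetric",
})


@dataclass(frozen=True)
class DocumentItem:
    type: str
    name: str                           # artefact descriptor, e.g. "MRI Knee"
    mandatory: bool = False             # ASSUMPTION: default not documented
    max_age_days: Optional[int] = None  # ASSUMPTION: None means no age limit

    def __post_init__(self) -> None:
        if self.type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {self.type!r}")

    def is_fresh(self, issued_on: date, today: date) -> bool:
        """Hypothetical helper: is the document young enough to satisfy max_age_days?"""
        if self.max_age_days is None:
            return True
        return (today - issued_on).days <= self.max_age_days
```

With max_age_days: 180, an MRI issued exactly 180 days ago is still accepted and one issued 181 days ago is not.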

Inheritance algorithm (ADR-0027)

parent_procedure_code triggers a shallow merge with list concatenation:

  • List fields (required_documents, required_tests, comorbidity_screening, contraindications): parent items are prepended; child items are appended.
  • A child [] (empty list) explicitly overrides the parent (parent items are dropped).
  • Child key absent → the full parent list is inherited.
  • Scalar fields: the child value wins; the parent fills in when the child key is absent.
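The merge rules can be expressed as one small function. This is a sketch of the rules as written, not the seeder's actual code; merge_with_parent is a hypothetical name. Nested dicts such as recovery_timeline are replaced whole, which is what "shallow" implies.

```python
from typing import Any

# List fields that concatenate instead of being replaced.
LIST_FIELDS = frozenset({
    "required_documents", "required_tests", "comorbidity_screening", "contraindications",
})


def merge_with_parent(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Shallow merge of a child procedure entry onto its parent template."""
    # Start from the parent so any key absent on the child is inherited.
    # Top-level lists are copied so edits to the result never mutate the parent.
    merged = {k: list(v) if isinstance(v, list) else v for k, v in parent.items()}
    for key, value in child.items():
        if key in LIST_FIELDS and value and key in parent:
            # Non-empty child list: parent items first, child items appended.
            merged[key] = list(parent[key]) + list(value)
        else:
            # Scalars and nested dicts: child wins. An explicit [] on a
            # LIST_FIELDS key also lands here, which drops the parent items.
            merged[key] = value
    return merged
```

For example, a knee procedure inheriting from ORTHO_BASE with required_tests: [xray] ends up with the base tests followed by xray, while contraindications: [] clears the inherited contraindications.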

Seeder + backfill

# Seed all procedures (upsert)
python -m app.seeds.seed_procedures

# Backfill fabricated rows in prod
python -m scripts.backfill_procedure_clinical_payload --dry-run
python -m scripts.backfill_procedure_clinical_payload

config/diagnostic_tests/seed.yaml

Single source of truth for the DiagnosticTest catalog. Loaded by app/seeds/diagnostic_test_loader.py, validated via Pydantic (DiagnosticTestYamlModel), and projected to Neo4j by app/seeds/diagnostic_test_graph.py.

Fields per entry

Field Type Required Description
code str yes Unique slug used as Neo4j node key
name str yes Human-readable test name
category str yes blood_work | cardiac | imaging | pulmonary | clinical_clearance | infection_screen | pathology
loinc str no LOINC code
validity_days_default int no Fallback validity if procedure override is absent
typical_cost_usd_cents int no Indicative cost in USD cents
turnaround_hours int no Typical result turnaround
data_source str no Provenance marker, present only for fabricated entries. Must match ^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$
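A catalog entry maps naturally onto a typed record. This is a stdlib sketch standing in for DiagnosticTestYamlModel; the cost_display helper and effective_validity_days function are hypothetical, and the latter only illustrates the documented rule that validity_days_default is a fallback when no procedure override exists. Storing cost as integer cents keeps arithmetic exact.

```python
from dataclasses import dataclass
from typing import Optional

CATEGORIES = frozenset({
    "blood_work", "cardiac", "imaging", "pulmonary",
    "clinical_clearance", "infection_screen", "pathology",
})


@dataclass(frozen=True)
class DiagnosticTest:
    code: str
    name: str
    category: str
    loinc: Optional[str] = None
    validity_days_default: Optional[int] = None
    typical_cost_usd_cents: Optional[int] = None
    turnaround_hours: Optional[int] = None

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

    def cost_display(self) -> Optional[str]:
        """Format integer cents as dollars without any float rounding."""
        if self.typical_cost_usd_cents is None:
            return None
        dollars, cents = divmod(self.typical_cost_usd_cents, 100)
        return f"${dollars:,}.{cents:02d}"


def effective_validity_days(test: DiagnosticTest, procedure_override: Optional[int]) -> Optional[int]:
    # The procedure-specific value wins; otherwise fall back to the catalog default.
    return procedure_override if procedure_override is not None else test.validity_days_default
```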

Ported from app/seed_graph.py::DIAGNOSTIC_TESTS (16 entries) + app/seed_procedure_tests.py::ADDITIONAL_DIAGNOSTIC_TESTS (16 entries, deduped to 15 net new) + 3 authored entries (echocardiogram, liver_function, kidney_function) tagged fabricated_pending_ops_2026_05_19. See PR-4 of the seeder unification roadmap (issue #960).


config/procedure_tests/seed.yaml

Single source of truth for Procedure → DiagnosticTest requirements. Loaded by app/seeds/procedure_test_loader.py, validated via Pydantic (ProcedureTestYamlModel), and projected to Neo4j as REQUIRES_TEST edges by app/seeds/procedure_test_graph.py.

Top-level structure

procedure_tests:
  <neo4j_slug>:
    - test_code: <code>
      validity_days: <int>
      is_mandatory: <bool>
      ...

Fields per test entry

Field Type Required Default Description
test_code str yes Must match a code in config/diagnostic_tests/seed.yaml
validity_days int yes Result age limit in days
is_mandatory bool no true Required vs. conditional/recommended
priority str no (derived from is_mandatory) mandatory | conditional | recommended
timing_note str no pre_travel pre_travel | on_arrival
source_acceptance str no any_accredited any_facility | any_accredited | jci_nabh_accredited | hospital_only | any_licensed_physician | provider_lab_only
on_site_required bool no false true if test must be performed at the treating hospital
condition_note str no null Clinical context for conditional/recommended tests
notes str no null Additional notes (stored as source_note on the REQUIRES_TEST relationship)
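The defaults in the table above translate directly into a record type, plus a cross-file check that every test_code exists in the DiagnosticTest catalog. This is a stdlib sketch, not ProcedureTestYamlModel. The doc says only that priority is "derived from is_mandatory"; mapping False to conditional (rather than recommended) is an assumption, flagged in the code.

```python
from dataclasses import dataclass
from typing import Optional

PRIORITIES = frozenset({"mandatory", "conditional", "recommended"})
TIMING_NOTES = frozenset({"pre_travel", "on_arrival"})
SOURCE_ACCEPTANCE = frozenset({
    "any_facility", "any_accredited", "jci_nabh_accredited",
    "hospital_only", "any_licensed_physician", "provider_lab_only",
})


@dataclass
class ProcedureTestEntry:
    test_code: str
    validity_days: int
    is_mandatory: bool = True
    priority: Optional[str] = None
    timing_note: str = "pre_travel"
    source_acceptance: str = "any_accredited"
    on_site_required: bool = False
    condition_note: Optional[str] = None
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        if self.priority is None:
            # ASSUMPTION: non-mandatory tests default to "conditional".
            self.priority = "mandatory" if self.is_mandatory else "conditional"
        if self.priority not in PRIORITIES:
            raise ValueError(f"bad priority: {self.priority!r}")
        if self.timing_note not in TIMING_NOTES:
            raise ValueError(f"bad timing_note: {self.timing_note!r}")
        if self.source_acceptance not in SOURCE_ACCEPTANCE:
            raise ValueError(f"bad source_acceptance: {self.source_acceptance!r}")


def unknown_test_codes(procedure_tests: dict[str, list[ProcedureTestEntry]],
                       catalog_codes: set[str]) -> list[tuple[str, str]]:
    """Return (procedure slug, test_code) pairs missing from the catalog."""
    return [
        (slug, entry.test_code)
        for slug, entries in procedure_tests.items()
        for entry in entries
        if entry.test_code not in catalog_codes
    ]
```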

Ported from app/seed_graph.py::PROCEDURE_TESTS (2 procedures: knee_replacement, bypass_surgery) + app/seed_procedure_tests.py::PROCEDURE_TESTS (the remaining 15 procedures). See PR-4 of the seeder unification roadmap (issue #960).


config/providers/seed.yaml

Single source of truth for the Provider catalog. (ADR-0027, PR-5 / issue #960)

Loaded by app/seeds/provider_loader.py::load_providers(), validated via Pydantic (ProviderYamlModel), and written to the providers table via app/seeds/provider_seeder.py. Projected to Neo4j by app/seeds/provider_graph.py.

Ported from 4 legacy Python modules:

  • app/seed_providers.py: 42 base provider records
  • app/seed_storefront.py: tagline, cultural_support, travel_info, facility data (6 demo providers)
  • app/seed_storefront_enrichment.py: hero images, facility enrichment, technology lists, cultural/quality data
  • app/seed_full_enrichment.py: full enrichment for all 42 providers

All 4 source files are now deprecation shims that raise SystemExit.

Fields per entry

Field Type Required Default Description
slug str yes Unique provider slug, used as Postgres PK and Neo4j graph ID
name str yes Provider display name
provider_type str no hospital hospital | clinic | surgical_provider | recovery_provider | transport_vendor
tenant_id str no null Null for catalog entries; set at runtime by Clerk org activation flow
country_code str yes ISO 3166-1 alpha-3 (3 chars, all-uppercase: IND, TUR, THA, …)
city str yes City name
description str no null Institution description — must describe the provider, never give patient guidance (CLAUDE.md §9)
logo_url str no null Logo image URL
website_url str no null Provider website
hero_image_url str no null Hero/banner image URL (storefront)
tagline str no null Short tagline (storefront)
address str no null Street address
latitude float no null Latitude in range [-90, 90]
longitude float no null Longitude in range [-180, 180]
timezone str no UTC IANA timezone (e.g. Asia/Kolkata)
specialties list[str] yes [] Medical specialty slugs
accreditations list[str] no null JCI, NABH, ISO_9001, KOIHA, etc.
languages_supported list[str] yes [] ISO 639-1 language codes
bed_count int no null Total bed count
annual_international_patients int no null Annual international patient volume
operating_theaters int no null Number of operating theaters (storefront)
icu_beds int no null ICU bed count (storefront)
countries_served int no null Number of countries served (storefront)
cost_index float no 1.0 Provider-level cost multiplier (must be > 0). 1.0 = average, 0.35 = 65% below average
procedure_costs dict no null JSONB: {<slug>: {min_usd_cents: int, max_usd_cents: int}}
outcome_score float no null Composite outcome score [0.0, 1.0]
patient_satisfaction float no null Patient satisfaction rating [0.0, 5.0]
total_reviews int no 0 Number of verified reviews
cultural_accommodations list[str] no null e.g. halal_food, prayer_room, female_staff_available, interpreter
dietary_options list[str] no null e.g. halal, kosher, vegetarian, vegan
cultural_support dict no null Rich cultural support JSONB (storefront enrichment)
travel_info dict no null Airport, transfer, hotel, visa information (storefront)
metadata dict no {} FlexibleJSON envelope for additional provider metadata
data_source str no null Provenance marker. Must match ^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$
is_active bool no true Whether the provider is active in the catalog
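Several provider fields carry range or format constraints. The sketch below checks those documented constraints on a raw YAML entry (as a dict) and collects every problem instead of stopping at the first one. validate_provider is a hypothetical helper, not the ProviderYamlModel; optional numeric fields are only checked when present, since null is allowed.

```python
import re
from typing import Any

PROVIDER_TYPES = frozenset({
    "hospital", "clinic", "surgical_provider", "recovery_provider", "transport_vendor",
})
REQUIRED_FIELDS = ("slug", "name", "country_code", "city")
# field -> inclusive (low, high) bounds from the table above
BOUNDS = {
    "latitude": (-90, 90),
    "longitude": (-180, 180),
    "outcome_score": (0.0, 1.0),
    "patient_satisfaction": (0.0, 5.0),
}


def validate_provider(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the entry passed."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            errors.append(f"missing required field: {field}")

    country = entry.get("country_code")
    if country is not None and not (isinstance(country, str) and re.fullmatch(r"[A-Z]{3}", country)):
        errors.append(f"country_code must be ISO 3166-1 alpha-3 (3 uppercase letters): {country!r}")

    if entry.get("provider_type", "hospital") not in PROVIDER_TYPES:
        errors.append(f"unknown provider_type: {entry.get('provider_type')!r}")

    for field, (low, high) in BOUNDS.items():
        value = entry.get(field)
        if value is not None and not (isinstance(value, (int, float)) and low <= value <= high):
            errors.append(f"{field} out of range [{low}, {high}]: {value!r}")

    cost_index = entry.get("cost_index", 1.0)
    if not isinstance(cost_index, (int, float)) or cost_index <= 0:
        errors.append(f"cost_index must be > 0: {cost_index!r}")
    return errors
```

A cost_index of 0.35 passes (65% below average), while 0 is rejected because the multiplier must be strictly positive.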

Seeder commands

# Seed all providers (upsert, idempotent)
python -m app.seeds._runner --only providers

# Or run the seeder directly
python -m app.seeds.provider_seeder

# Project to Neo4j
python -m app.seeds.provider_graph

# Dry-run (log what would change, no writes)
python -m app.seeds._runner --only providers --dry-run

Voice rules guardrail (CLAUDE.md §9)

Provider description fields must describe the institution, not give patient guidance.

  • Acceptable (descriptive): "Multi-specialty flagship hospital with JCI accreditation"
  • Not acceptable (patient directive): "Best hospital for your knee replacement"

config/guardrails.yaml

The guardrails configuration controls the AI agent's input classification and output validation. It prevents the agent from soliciting or disclosing sensitive information and keeps conversations within the medical tourism intake scope.

Classifier Categories

The input classifier assigns each user message to one of these categories before the message reaches the LLM:

Category Action Description
safe Allow Normal intake-related messages (symptoms, travel questions, logistics)
medical_advice Block User is asking for a diagnosis or treatment recommendation
pii_solicitation Block User is providing unsolicited PII (SSN, credit card, etc.)
emergency Escalate User describes an emergency medical situation
off_topic Redirect Message is unrelated to medical tourism (e.g., sports, weather)
profanity Block Message contains profanity or abusive language
legal_advice Block User is asking for legal counsel (malpractice, liability)
competitor_mention Log User mentions a competing platform (logged but allowed)
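The category table above is essentially a lookup from category to action. This sketch transcribes it; action_for and reaches_llm are hypothetical helpers. The doc does not say what happens to an unrecognized category (this sketch fails closed to Block) or whether Escalate and Redirect messages reach the LLM (this sketch assumes they are answered from a template instead).

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"
    REDIRECT = "redirect"
    LOG = "log"


# Category -> action, transcribed from the table above.
CATEGORY_ACTIONS = {
    "safe": Action.ALLOW,
    "medical_advice": Action.BLOCK,
    "pii_solicitation": Action.BLOCK,
    "emergency": Action.ESCALATE,
    "off_topic": Action.REDIRECT,
    "profanity": Action.BLOCK,
    "legal_advice": Action.BLOCK,
    "competitor_mention": Action.LOG,
}


def action_for(category: str) -> Action:
    # ASSUMPTION: unrecognized categories fail closed.
    return CATEGORY_ACTIONS.get(category, Action.BLOCK)


def reaches_llm(category: str) -> bool:
    # Allow passes through; competitor_mention is "logged but allowed".
    # ASSUMPTION: Escalate and Redirect reply with a template, not the LLM.
    return action_for(category) in (Action.ALLOW, Action.LOG)
```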

Forbidden Output Patterns

The output validator scans every LLM response against these 12 regex patterns. If any pattern matches, the response is blocked and a safe fallback is returned instead.

Pattern ID Description
diagnosis_statement Statements like "you have [condition]" or "you are diagnosed with"
dosage_recommendation Any mention of specific drug dosages
treatment_prescription Direct treatment prescriptions ("you should take", "I recommend the drug")
prognosis_prediction Survival rates, recovery time guarantees
lab_interpretation Interpreting specific lab values as normal/abnormal
emergency_self_treat Instructions for self-treating an emergency
insurance_guarantee Guaranteeing insurance coverage or reimbursement
price_guarantee Guaranteeing specific treatment costs
legal_opinion Any legal advice or liability statements
credential_leak API keys, passwords, or internal URLs in the response
internal_instruction System prompt or internal instruction leakage
competitor_endorsement Endorsing or recommending a competing platform
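The block-on-any-match behaviour can be sketched with compiled regexes. The three patterns below are illustrative stand-ins written for this example; the real 12 patterns live in config/guardrails.yaml and are not reproduced here, and SAFE_FALLBACK is a hypothetical placeholder text.

```python
import re

# Illustrative patterns only; NOT the production regexes from guardrails.yaml.
FORBIDDEN_PATTERNS = {
    "diagnosis_statement": re.compile(r"\byou (?:are|have been) diagnosed with\b", re.IGNORECASE),
    "dosage_recommendation": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml)\b", re.IGNORECASE),
    "treatment_prescription": re.compile(r"\b(?:you should take|i recommend the drug)\b", re.IGNORECASE),
}

# Hypothetical fallback text returned when any pattern matches.
SAFE_FALLBACK = "I'm not able to help with that, but I can connect you with a qualified medical professional."


def validate_output(text: str) -> tuple[str, list[str]]:
    """Return (text to send, ids of matched patterns). Any match blocks the response."""
    hits = [pattern_id for pattern_id, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(text)]
    return (SAFE_FALLBACK if hits else text), hits
```

Returning the matched pattern ids alongside the text keeps the blocking decision simple while still giving the caller something to log.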

Response Templates

When a message is blocked, the system returns a pre-configured response template:

response_templates:
  medical_advice: >
    I understand your concern, but I'm not able to provide medical diagnoses
    or treatment recommendations. I can help connect you with qualified
    medical professionals who can properly evaluate your condition.
  emergency: >
    If you are experiencing a medical emergency, please call your local
    emergency services immediately. I can help you find appropriate care
    after your immediate needs are addressed.
  pii_solicitation: >
    For your safety, please do not share sensitive personal information
    like social security numbers or credit card details in this chat.
    Any required information will be collected securely during the
    formal intake process.
  off_topic: >
    I'm here to help with your medical travel needs. Could you tell me
    more about the medical procedure you're interested in or the
    destination you're considering?

Medical Keywords (Multilingual)

The classifier uses medical keyword lists in four languages to improve classification accuracy:

Language Example Keywords Count
English surgery, diagnosis, prescription, symptoms, treatment ~120
Arabic جراحة, تشخيص, وصفة طبية, أعراض, علاج ~90
German Operation, Diagnose, Rezept, Symptome, Behandlung ~95
Turkish ameliyat, teşhis, reçete, belirtiler, tedavi ~85
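A keyword signal of this kind can be sketched as a per-language hit count, using only the example keywords from the table (the full lists are much longer). Matching is case-insensitive via str.casefold and uses plain substring tests, so that inflected forms such as Turkish "ameliyatı" still match "ameliyat"; this is a simplification for illustration, not the classifier's actual logic.

```python
# Example keywords from the table above; the real lists hold ~85-120 entries each.
MEDICAL_KEYWORDS = {
    "en": ["surgery", "diagnosis", "prescription", "symptoms", "treatment"],
    "ar": ["جراحة", "تشخيص", "وصفة طبية", "أعراض", "علاج"],
    "de": ["Operation", "Diagnose", "Rezept", "Symptome", "Behandlung"],
    "tr": ["ameliyat", "teşhis", "reçete", "belirtiler", "tedavi"],
}


def keyword_hits(message: str) -> dict[str, int]:
    """Count matched keywords per language; languages with no hits are omitted."""
    text = message.casefold()
    hits = {}
    for lang, words in MEDICAL_KEYWORDS.items():
        count = sum(1 for word in words if word.casefold() in text)
        if count:
            hits[lang] = count
    return hits
```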

File Validation Limits

When guardrail file validation is enabled, uploaded files are checked against these limits:

Parameter Value Description
max_file_size_mb 20 Maximum file size in megabytes
allowed_extensions .pdf, .jpg, .jpeg, .png, .dcm Permitted file types
max_filename_length 255 Maximum filename character count
scan_for_malware true Whether to scan uploads with ClamAV (if available)
ocr_timeout_seconds 30 Maximum time for PyMuPDF OCR extraction
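The static limits (size, extension, filename length) can be checked before any scanning or OCR runs. This sketch covers only those three; ClamAV scanning and the OCR timeout are out of scope. Treating "MB" as 1024 × 1024 bytes is an assumption, and validate_upload is a hypothetical helper.

```python
from pathlib import PurePath

MAX_FILE_SIZE_MB = 20
ALLOWED_EXTENSIONS = frozenset({".pdf", ".jpg", ".jpeg", ".png", ".dcm"})
MAX_FILENAME_LENGTH = 255


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of limit violations; empty means the file passes."""
    errors = []
    if len(filename) > MAX_FILENAME_LENGTH:
        errors.append("filename too long")
    # Compare extensions case-insensitively so "scan.DCM" is accepted.
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"extension not allowed: {ext or '(none)'}")
    # ASSUMPTION: 1 MB = 1024 * 1024 bytes.
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        errors.append("file too large")
    return errors
```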

Workflow Gating Thresholds

Controls when the agent advances between workflow phases. Tune per corridor or procedure type.

Loaded by: app/services/gating_config.py → imported as GATING dict

EHR Gating

Parameter Default Description
gating.ehr.completeness_for_matching 0.5 Minimum EHR completeness (0.0–1.0) to offer provider matching
gating.ehr.completeness_with_meds 0.4 Lower threshold when medications have been confirmed (high-signal data)
gating.ehr.matching_ready_threshold 0.4 Minimum for patient_state to report matching_ready: true

Matching Gating

Parameter Default Description
gating.matching.require_medications true Whether medications must be asked/confirmed before matching
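One plausible reading of how the EHR and matching thresholds combine is sketched below. The nested-dict shape of GATING mirrors the dotted parameter names but is an assumption (app/services/gating_config.py may differ), as is the inclusive >= comparison; can_offer_matching is a hypothetical helper.

```python
from typing import Any, Mapping

# ASSUMPTION: nested shape mirroring the dotted names gating.ehr.* and gating.matching.*.
GATING: dict[str, dict[str, Any]] = {
    "ehr": {
        "completeness_for_matching": 0.5,
        "completeness_with_meds": 0.4,
        "matching_ready_threshold": 0.4,
    },
    "matching": {"require_medications": True},
}


def can_offer_matching(ehr_completeness: float, medications_confirmed: bool,
                       gating: Mapping[str, Mapping[str, Any]] = GATING) -> bool:
    """Decide whether provider matching may be offered."""
    if gating["matching"]["require_medications"] and not medications_confirmed:
        return False
    ehr = gating["ehr"]
    # Confirmed medications are high-signal data, so the lower threshold applies.
    threshold = ehr["completeness_with_meds"] if medications_confirmed else ehr["completeness_for_matching"]
    # ASSUMPTION: "minimum" is an inclusive comparison.
    return ehr_completeness >= threshold
```

With the defaults, a 0.45-complete EHR qualifies once medications are confirmed, but no EHR qualifies while medications are still unconfirmed, because require_medications is true.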

Document Matching

Parameter Default Description
gating.documents.parameter_match_coverage 0.5 Minimum fraction of expected parameters to count as a match (e.g., 4/7 CBC params)

Intake Gating

Parameter Default Description
gating.intake.min_answers_for_completion 3 Minimum substantive patient answers before intake can advance
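The document and intake thresholds are simple ratio and count checks. Worked through for the CBC example: 4/7 ≈ 0.57 clears the 0.5 coverage bar, while 3/7 ≈ 0.43 does not. The helpers below are hypothetical, and treating a document with zero expected parameters as a non-match is an assumption.

```python
PARAMETER_MATCH_COVERAGE = 0.5   # gating.documents.parameter_match_coverage
MIN_ANSWERS_FOR_COMPLETION = 3   # gating.intake.min_answers_for_completion


def is_parameter_match(found: int, expected: int,
                       min_coverage: float = PARAMETER_MATCH_COVERAGE) -> bool:
    """True when enough of a document's expected parameters were extracted."""
    if expected <= 0:
        return False  # ASSUMPTION: nothing expected means nothing to match
    return found / expected >= min_coverage


def intake_can_advance(substantive_answers: int,
                       minimum: int = MIN_ANSWERS_FOR_COMPLETION) -> bool:
    return substantive_answers >= minimum
```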

Tuning for demo vs. production

For investor demos, lower completeness_for_matching to 0.3 for faster flow. For production, keep at 0.5 or higher for better match quality.


config/model_registry.yaml

The model registry defines which LLM models are available, their capabilities, and how they are assigned to different tasks.

Model Definitions

models:
  gpt-4o:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.005
    cost_per_1k_output: 0.015

  gpt-4o-mini:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006

  claude-3-5-sonnet:
    provider: anthropic
    context_window: 200000
    max_output_tokens: 8192
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.003
    cost_per_1k_output: 0.015
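The cost fields make per-request cost a short calculation: tokens / 1000 × price, summed over input and output. Using the prices above, a request with 10,000 input and 1,000 output tokens costs 10 × 0.00015 + 1 × 0.0006 = $0.0021 on gpt-4o-mini versus 10 × 0.005 + 1 × 0.015 = $0.065 on gpt-4o, roughly 30 times more. estimate_cost_usd is a hypothetical helper.

```python
# (cost_per_1k_input, cost_per_1k_output) in USD, copied from the definitions above.
PRICES = {
    "gpt-4o": (0.005, 0.015),
    "gpt-4o-mini": (0.00015, 0.0006),
    "claude-3-5-sonnet": (0.003, 0.015),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    per_1k_in, per_1k_out = PRICES[model]
    return input_tokens / 1000 * per_1k_in + output_tokens / 1000 * per_1k_out
```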

Tier Assignments

Models are assigned to tiers that map to specific use cases:

Tier Model Use Case
intake_chat gpt-4o-mini Patient intake conversations (cost-optimized)
medical_summary gpt-4o EHR summary generation (accuracy-critical)
document_analysis gpt-4o OCR result analysis and extraction
provider_matching gpt-4o-mini Generating match explanations
translation gpt-4o-mini Multilingual response translation
guardrail_classifier gpt-4o-mini Input classification for safety

Fallback Chains

If the primary model is unavailable, the system falls back through the remaining models of the applicable chain, in order:

fallback_chains:
  default:
    - gpt-4o
    - claude-3-5-sonnet
    - gpt-4o-mini
  cost_sensitive:
    - gpt-4o-mini
    - claude-3-5-sonnet
  accuracy_critical:
    - gpt-4o
    - claude-3-5-sonnet
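Falling back through a chain amounts to returning the first model that is currently available. In this sketch the availability check is injected as a callable because the doc does not say how availability is determined; pick_model is a hypothetical helper, and raising when the whole chain is down is an assumption.

```python
from typing import Callable

FALLBACK_CHAINS = {
    "default": ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"],
    "cost_sensitive": ["gpt-4o-mini", "claude-3-5-sonnet"],
    "accuracy_critical": ["gpt-4o", "claude-3-5-sonnet"],
}


def pick_model(chain_name: str, is_available: Callable[[str], bool]) -> str:
    """Return the first available model in the named chain."""
    for model in FALLBACK_CHAINS[chain_name]:
        if is_available(model):
            return model
    raise RuntimeError(f"no model available in chain {chain_name!r}")
```

For example, with gpt-4o down, pick_model("default", lambda m: m != "gpt-4o") returns "claude-3-5-sonnet".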

Environment Variables

All environment variables are required unless marked as optional. Never commit actual values; use .env.example as a template.

Database

Variable Description Example Format
DATABASE_URL PostgreSQL connection string postgresql+asyncpg://user:pass@host:5432/curaway
NEO4J_URI Neo4j Bolt connection URI neo4j+s://xxxx.databases.neo4j.io
NEO4J_USERNAME Neo4j authentication username neo4j
NEO4J_PASSWORD Neo4j authentication password (secret)
QDRANT_URL Qdrant vector database URL https://xxxx.cloud.qdrant.io:6333
QDRANT_API_KEY Qdrant API key (secret)
UPSTASH_REDIS_URL Upstash Redis connection URL rediss://default:xxxx@xxxx.upstash.io:6379

Authentication

Variable Description Example Format
CLERK_SECRET_KEY Clerk backend API secret key sk_live_xxxx
CLERK_PUBLISHABLE_KEY Clerk frontend publishable key (optional, frontend only) pk_live_xxxx
CLERK_WEBHOOK_SECRET Svix signing secret for /api/v1/webhooks/clerk. Configured in Clerk dashboard → Webhooks. whsec_xxxx

Feature Management

Variable Description Example Format
FLAGSMITH_API_KEY Flagsmith environment API key (runtime SDK) (secret)
FLAGSMITH_API_URL Flagsmith runtime SDK URL (optional, defaults to cloud) https://edge.api.flagsmith.com/api/v1/
FLAGSMITH_ADMIN_TOKEN Admin-scope API token used by /api/v1/admin/flags/* proxy. Distinct from the runtime SDK key — server-side only. Token xxxx
FLAGSMITH_PROJECT_ID Numeric project id, visible in Flagsmith dashboard URL 12345
FLAGSMITH_ENVIRONMENT_KEY Server-side environment key ser.xxxx
FLAGSMITH_ADMIN_API_URL Admin API URL (optional, defaults to cloud) https://api.flagsmith.com/api/v1/

Tenancy

Variable Description Example Format
DEFAULT_TENANT_ID Tenant used when callers don't pass an explicit tenant_id, e.g. background jobs, seed data, dev shells (optional, defaults to tenant-apollo-001) tenant-apollo-001

Observability

Variable Description Example Format
LANGFUSE_SECRET_KEY Langfuse secret key for LLM tracing sk-lf-xxxx
LANGFUSE_PUBLIC_KEY Langfuse public key pk-lf-xxxx
LANGFUSE_HOST Langfuse host URL (optional) https://cloud.langfuse.com

Email & Notifications

Variable Description Example Format
RESEND_API_KEY Resend email API key re_xxxx
NOTIFICATION_FROM_EMAIL Sender email address (optional) noreply@curaway.ai

Storage (Cloudflare R2)

Variable Description Example Format
CLOUDFLARE_R2_ACCESS_KEY_ID R2 access key ID (secret)
CLOUDFLARE_R2_SECRET_ACCESS_KEY R2 secret access key (secret)
CLOUDFLARE_R2_BUCKET R2 bucket name curaway-uploads
CLOUDFLARE_R2_ENDPOINT R2 S3-compatible endpoint https://xxxx.r2.cloudflarestorage.com
CLOUDFLARE_R2_PUBLIC_URL Public URL prefix for uploaded files (optional) https://uploads.curaway.ai

AI / Embeddings

Variable Description Example Format
OPENAI_API_KEY OpenAI API key for LLM and embeddings sk-xxxx
ANTHROPIC_API_KEY Anthropic API key (optional, for fallback) sk-ant-xxxx
EMBEDDING_PROVIDER Embedding provider (openai or local) openai
EMBEDDING_MODEL Embedding model name (optional) text-embedding-3-small

Application

Variable Description Example Format
APP_ENV Environment name (development, staging, production) production
AUTH_DISABLED Disable JWT auth for local development (optional) true
CORS_ORIGINS Comma-separated allowed CORS origins (optional) https://app.curaway.ai,http://localhost:3000
LOG_LEVEL Logging level (optional, defaults to INFO) DEBUG
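The "required unless marked optional" rule can be enforced with a fail-fast startup check. This is a minimal sketch, not the backend's actual settings loader: REQUIRED_VARS lists only a subset of the required variables from the tables above, and the accepted truthy spellings for AUTH_DISABLED are an assumption.

```python
import os
from collections.abc import Mapping
from typing import Any

# Abbreviated: a subset of the required variables from the tables above.
REQUIRED_VARS = (
    "DATABASE_URL", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD",
    "CLERK_SECRET_KEY", "OPENAI_API_KEY",
)
# Optional variables whose defaults are documented above.
OPTIONAL_DEFAULTS = {"LOG_LEVEL": "INFO", "DEFAULT_TENANT_ID": "tenant-apollo-001"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Fail fast on missing required variables, then apply documented defaults."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings: dict[str, Any] = {name: env[name] for name in REQUIRED_VARS}
    for name, default in OPTIONAL_DEFAULTS.items():
        settings[name] = env.get(name, default)
    # CORS_ORIGINS is comma-separated; surrounding spaces and blank entries are dropped.
    settings["CORS_ORIGINS"] = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    # ASSUMPTION: these spellings count as "true" for AUTH_DISABLED.
    settings["AUTH_DISABLED"] = env.get("AUTH_DISABLED", "").strip().lower() in {"1", "true", "yes"}
    return settings
```

Reporting every missing variable in one error, rather than the first one found, saves a redeploy cycle per omission.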