Configuration Reference¶

This page documents all configuration files and environment variables used by the Curaway backend. Configuration is split between YAML files (checked into the repo) and environment variables (managed per deployment).

config/procedures.yaml¶

Single source of truth for the Procedure catalog and its clinical payload. (ADR-0027, PR #960)

Loaded by app/seeds/seed_procedures.py::_load_procedures_yaml(), validated via Pydantic (ProcedureYamlModel), and written to procedure_requirements via the upsert seeder.

The base templates (ORTHO_BASE, CARDIAC_BASE, ONCO_BASE, OPHTHALMOLOGY_BASE) remain hand-coded in seed_procedures.py because they are inheritance parents, not catalog procedures.

Fields per entry¶

Field	Type	Required	Description
`code`	str	yes	CPT code or custom code (`BMT-001`, `ONCO-CHEMO`). Pattern: `[A-Z0-9_-]{2,32}`
`name`	str	yes	Procedure display name
`specialty`	str	yes	`orthopedics` | `cardiology` | `oncology` | `ophthalmology` | `dental` | `bariatric` | `cosmetic` | `fertility`
`neo4j_slug`	str	no	Neo4j node key (auto-derived from name if absent)
`category`	str	no	Postgres category column (inferred from specialty if absent)
`parent_procedure_code`	str	no	Inheritance parent (e.g. `ORTHO_BASE`). See inheritance algorithm below
`keywords`	list[str]	no	Intake keyword mapper phrases
`icd_links`	list[str]	no	ICD-10 codes (stored as `icd10_primary` in DB)
`snomed_primary`	list[str]	no	SNOMED CT codes
`required_documents`	list[DocumentItem]	no	Patient-facing document checklist. See DocumentItem shape below
`required_tests`	list[dict]	no	Hospital-side pre-op tests
`comorbidity_screening`	list[dict]	no	Intake screening questions
`contraindications`	list[dict]	no	Absolute / relative contraindications
`cost_range`	CostRange	no	`{min_usd, max_usd, us_average_usd}`
`recovery_timeline`	dict	no	`{hospital_days, local_recovery_days, full_recovery_weeks, ...}`
`travel_considerations`	dict	no	`{earliest_flight_post_op_days, companion_recommended, ...}`
`data_source`	str	no	Provenance marker. Must match `^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$`
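The two patterns in the table can be checked with a few lines of Python. This is a minimal sketch, not the ProcedureYamlModel validator: the function names are hypothetical, and the expansion of `<tag>` to lowercase letters, digits, and underscores is an assumption.

```python
import re
from datetime import datetime

# `code` pattern copied from the table above.
CODE_RE = re.compile(r"[A-Z0-9_-]{2,32}")

# Assumption: <tag> is lowercase alphanumerics/underscores and
# YYYY_MM_DD must be a real calendar date.
DATA_SOURCE_RE = re.compile(
    r"^(fabricated|naidu_approved)_(?P<tag>[a-z0-9_]+)_(?P<date>\d{4}_\d{2}_\d{2})$"
)


def is_valid_code(code: str) -> bool:
    """True when the whole string matches the documented code pattern."""
    return CODE_RE.fullmatch(code) is not None


def is_valid_data_source(value: str) -> bool:
    """True when the marker has a known prefix, a tag, and a valid date."""
    match = DATA_SOURCE_RE.match(value)
    if match is None:
        return False
    try:
        datetime.strptime(match.group("date"), "%Y_%m_%d")
    except ValueError:
        return False
    return True
```

Under these assumptions, `BMT-001` and `fabricated_pending_ops_2026_05_19` pass, while a lowercase code or a marker without a tag is rejected.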

DocumentItem shape¶

required_documents:
  - type: imaging          # blood_work | cardiac | imaging | anesthesia | clinical |
                           # ophthalmology | dental | psych | pulmonary | neurological |
                           # fertility | pathology | records | lab_work | clearance | obstetric
    name: "MRI Knee"       # Artefact descriptor — NOT a directive or diagnosis
    mandatory: true
    max_age_days: 180

pathology is a new type added in PR-1 for oncology documents (tumor reports, biopsy results, histology). It does not fit under imaging (no scan) or blood_work (not a lab value).

Inheritance algorithm (ADR-0027)¶

parent_procedure_code triggers a shallow merge with list concatenation:

- List fields (required_documents, required_tests, comorbidity_screening, contraindications): parent items are prepended; child items appended.
- Child [] (empty) explicitly overrides (parent items dropped).
- Child key absent → full parent list inherited.
- Scalar fields: child value wins; parent fills when child key is absent.
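The merge rules above can be sketched as a small function. This is an illustration of the described behaviour, not the code in seed_procedures.py; `merge_with_parent` and `LIST_FIELDS` are hypothetical names, and both entries are assumed to be plain dicts.

```python
from copy import deepcopy

# The four list fields named in the inheritance rules.
LIST_FIELDS = (
    "required_documents",
    "required_tests",
    "comorbidity_screening",
    "contraindications",
)


def merge_with_parent(parent: dict, child: dict) -> dict:
    """Shallow-merge child over parent, concatenating list fields."""
    merged = deepcopy(parent)  # absent child keys inherit the parent value
    for key, value in child.items():
        if key in LIST_FIELDS and key in parent:
            if value == []:
                merged[key] = []  # explicit empty list drops parent items
            else:
                # parent items first, child items appended
                merged[key] = deepcopy(parent[key]) + deepcopy(value)
        else:
            merged[key] = deepcopy(value)  # scalar: child value wins
    return merged
```

For example, a child that sets `required_tests: []` ends up with no tests even when its parent defines some, while a child that omits `contraindications` keeps the parent's full list.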

Seeder + backfill¶

# Seed all procedures (upsert)
python -m app.seeds.seed_procedures

# Backfill fabricated rows in prod
python -m scripts.backfill_procedure_clinical_payload --dry-run
python -m scripts.backfill_procedure_clinical_payload

config/diagnostic_tests/seed.yaml¶

Single source of truth for the DiagnosticTest catalog. Loaded by app/seeds/diagnostic_test_loader.py, validated via Pydantic (DiagnosticTestYamlModel), and projected to Neo4j by app/seeds/diagnostic_test_graph.py.

Fields per entry¶

Field	Type	Required	Description
`code`	str	yes	Unique slug used as Neo4j node key
`name`	str	yes	Human-readable test name
`category`	str	yes	`blood_work` | `cardiac` | `imaging` | `pulmonary` | `clinical_clearance` | `infection_screen` | `pathology`
`loinc`	str	no	LOINC code
`validity_days_default`	int	no	Fallback validity if procedure override is absent
`typical_cost_usd_cents`	int	no	Indicative cost in USD cents
`turnaround_hours`	int	no	Typical result turnaround
`data_source`	str	no	Provenance marker, present only for fabricated entries. Must match `^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$`

Ported from app/seed_graph.py::DIAGNOSTIC_TESTS (16 entries) + app/seed_procedure_tests.py::ADDITIONAL_DIAGNOSTIC_TESTS (16 entries, deduped to 15 net new) + 3 authored entries (echocardiogram, liver_function, kidney_function) tagged fabricated_pending_ops_2026_05_19. See PR-4 of the seeder unification roadmap (issue #960).

config/procedure_tests/seed.yaml¶

Single source of truth for Procedure → DiagnosticTest requirements. Loaded by app/seeds/procedure_test_loader.py, validated via Pydantic (ProcedureTestYamlModel), and projected to Neo4j as REQUIRES_TEST edges by app/seeds/procedure_test_graph.py.

Top-level structure¶

procedure_tests:
  <neo4j_slug>:
    - test_code: <code>
      validity_days: <int>
      is_mandatory: <bool>
      ...

Fields per test entry¶

Field	Type	Required	Default	Description
`test_code`	str	yes	—	Must match a `code` in `config/diagnostic_tests/seed.yaml`
`validity_days`	int	yes	—	Result age limit in days
`is_mandatory`	bool	no	`true`	Required vs. conditional/recommended
`priority`	str	no	derived from `is_mandatory`	`mandatory` | `conditional` | `recommended`
`timing_note`	str	no	`pre_travel`	`pre_travel` | `on_arrival`
`source_acceptance`	str	no	`any_accredited`	`any_facility` | `any_accredited` | `jci_nabh_accredited` | `hospital_only` | `any_licensed_physician` | `provider_lab_only`
`on_site_required`	bool	no	`false`	`true` if test must be performed at the treating hospital
`condition_note`	str	no	`null`	Clinical context for conditional/recommended tests
`notes`	str	no	`null`	Additional notes (stored as `source_note` on the REQUIRES_TEST relationship)
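To make the defaults and the `test_code` cross-reference concrete, here is a minimal sketch. It is not the ProcedureTestYamlModel: the function names are hypothetical, and the rule that a non-mandatory entry defaults to `conditional` is an assumption, since the table only says the priority is derived from `is_mandatory`.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "is_mandatory": True,
    "timing_note": "pre_travel",
    "source_acceptance": "any_accredited",
    "on_site_required": False,
    "condition_note": None,
    "notes": None,
}


def normalize_entry(entry: dict) -> dict:
    """Fill documented defaults for one test entry."""
    for required in ("test_code", "validity_days"):
        if required not in entry:
            raise ValueError(f"missing required field: {required}")
    out = {**DEFAULTS, **entry}
    if "priority" not in out:
        # Assumption: non-mandatory entries default to "conditional".
        out["priority"] = "mandatory" if out["is_mandatory"] else "conditional"
    return out


def unknown_test_codes(procedure_tests: dict, known_codes: set) -> list:
    """Return (neo4j_slug, test_code) pairs missing from the test catalog."""
    return [
        (slug, entry["test_code"])
        for slug, entries in procedure_tests.items()
        for entry in entries
        if entry["test_code"] not in known_codes
    ]
```

Here `known_codes` stands for the set of `code` values loaded from config/diagnostic_tests/seed.yaml; an empty result means every requirement points at a catalog entry.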

Ported from app/seed_graph.py::PROCEDURE_TESTS (2 procedures: knee_replacement, bypass_surgery) + app/seed_procedure_tests.py::PROCEDURE_TESTS (the remaining 15 procedures). See PR-4 of the seeder unification roadmap (issue #960).

config/providers/seed.yaml¶

Single source of truth for the Provider catalog. (ADR-0027, PR-5 / issue #960)

Loaded by app/seeds/provider_loader.py::load_providers(), validated via Pydantic (ProviderYamlModel), and written to the providers table via app/seeds/provider_seeder.py. Projected to Neo4j by app/seeds/provider_graph.py.

Ported from 4 legacy Python modules:

- app/seed_providers.py: 42 base provider records
- app/seed_storefront.py: tagline, cultural_support, travel_info, facility data (6 demo providers)
- app/seed_storefront_enrichment.py: hero images, facility enrichment, technology lists, cultural/quality data
- app/seed_full_enrichment.py: full enrichment for all 42 providers

All 4 source files are now deprecation shims that raise SystemExit.

Fields per entry¶

Field	Type	Required	Default	Description
`slug`	str	yes	—	Unique provider slug, used as Postgres PK and Neo4j graph ID
`name`	str	yes	—	Provider display name
`provider_type`	str	no	`hospital`	`hospital` | `clinic` | `surgical_provider` | `recovery_provider` | `transport_vendor`
`tenant_id`	str	no	`null`	Null for catalog entries; set at runtime by Clerk org activation flow
`country_code`	str	yes	—	ISO 3166-1 alpha-3 (3 chars, all-uppercase: `IND`, `TUR`, `THA`, …)
`city`	str	yes	—	City name
`description`	str	no	`null`	Institution description — must describe the provider, never give patient guidance (CLAUDE.md §9)
`logo_url`	str	no	`null`	Logo image URL
`website_url`	str	no	`null`	Provider website
`hero_image_url`	str	no	`null`	Hero/banner image URL (storefront)
`tagline`	str	no	`null`	Short tagline (storefront)
`address`	str	no	`null`	Street address
`latitude`	float	no	`null`	Latitude in range `[-90, 90]`
`longitude`	float	no	`null`	Longitude in range `[-180, 180]`
`timezone`	str	no	`UTC`	IANA timezone (e.g. `Asia/Kolkata`)
`specialties`	list[str]	yes	`[]`	Medical specialty slugs
`accreditations`	list[str]	no	`null`	`JCI`, `NABH`, `ISO_9001`, `KOIHA`, etc.
`languages_supported`	list[str]	yes	`[]`	ISO 639-1 language codes
`bed_count`	int	no	`null`	Total bed count
`annual_international_patients`	int	no	`null`	Annual international patient volume
`operating_theaters`	int	no	`null`	Number of operating theaters (storefront)
`icu_beds`	int	no	`null`	ICU bed count (storefront)
`countries_served`	int	no	`null`	Number of countries served (storefront)
`cost_index`	float	no	`1.0`	Provider-level cost multiplier (must be > 0). `1.0` = average, `0.35` = 65% below average
`procedure_costs`	dict	no	`null`	JSONB: `{<slug>: {min_usd_cents: int, max_usd_cents: int}}`
`outcome_score`	float	no	`null`	Composite outcome score `[0.0, 1.0]`
`patient_satisfaction`	float	no	`null`	Patient satisfaction rating `[0.0, 5.0]`
`total_reviews`	int	no	`0`	Number of verified reviews
`cultural_accommodations`	list[str]	no	`null`	e.g. `halal_food`, `prayer_room`, `female_staff_available`, `interpreter`
`dietary_options`	list[str]	no	`null`	e.g. `halal`, `kosher`, `vegetarian`, `vegan`
`cultural_support`	dict	no	`null`	Rich cultural support JSONB (storefront enrichment)
`travel_info`	dict	no	`null`	Airport, transfer, hotel, visa information (storefront)
`metadata`	dict	no	`{}`	FlexibleJSON envelope for additional provider metadata
`data_source`	str	no	`null`	Provenance marker. Must match `^(fabricated|naidu_approved)_<tag>_YYYY_MM_DD$`
`is_active`	bool	no	`true`	Whether the provider is active in the catalog
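The range constraints in the table translate directly into checks. The sketch below is illustrative only and does not reproduce ProviderYamlModel; `provider_errors` and `percent_vs_average` are hypothetical helpers operating on a plain dict.

```python
import re


def provider_errors(provider: dict) -> list:
    """Return the names of fields that violate the documented constraints."""
    errors = []
    # ISO 3166-1 alpha-3: exactly three uppercase letters.
    if not re.fullmatch(r"[A-Z]{3}", provider.get("country_code", "")):
        errors.append("country_code")
    lat = provider.get("latitude")
    if lat is not None and not -90 <= lat <= 90:
        errors.append("latitude")
    lon = provider.get("longitude")
    if lon is not None and not -180 <= lon <= 180:
        errors.append("longitude")
    # cost_index defaults to 1.0 and must be strictly positive.
    if provider.get("cost_index", 1.0) <= 0:
        errors.append("cost_index")
    return errors


def percent_vs_average(cost_index: float) -> float:
    """Express cost_index as a percentage difference from the average (1.0)."""
    return round((cost_index - 1.0) * 100, 1)
```

With this helper, a `cost_index` of `0.35` reads as -65.0, matching the "65% below average" example in the table.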

Seeder commands¶

# Seed all providers (upsert, idempotent)
python -m app.seeds._runner --only providers

# Or run the seeder directly
python -m app.seeds.provider_seeder

# Project to Neo4j
python -m app.seeds.provider_graph

# Dry-run (log what would change, no writes)
python -m app.seeds._runner --only providers --dry-run

Voice rules guardrail (CLAUDE.md §9)¶

Provider description fields must describe the institution, not give patient guidance.

✅ "Multi-specialty flagship hospital with JCI accreditation" — descriptive
❌ "Best hospital for your knee replacement" — patient directive

config/guardrails.yaml¶

The guardrails configuration controls the AI agent's input classification and output validation. It prevents the agent from soliciting or disclosing sensitive information and keeps conversations within the medical tourism intake scope.

Classifier Categories¶

The input classifier assigns each user message to one of these categories before the message reaches the LLM:

Category	Action	Description
`safe`	Allow	Normal intake-related messages (symptoms, travel questions, logistics)
`medical_advice`	Block	User is asking for a diagnosis or treatment recommendation
`pii_solicitation`	Block	User is providing unsolicited PII (SSN, credit card, etc.)
`emergency`	Escalate	User describes an emergency medical situation
`off_topic`	Redirect	Message is unrelated to medical tourism (e.g., sports, weather)
`profanity`	Block	Message contains profanity or abusive language
`legal_advice`	Block	User is asking for legal counsel (malpractice, liability)
`competitor_mention`	Log	User mentions a competing platform (logged but allowed)
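The category table maps naturally onto a lookup. The following is a sketch under assumptions, not the classifier's actual code: the `Action` enum and function names are hypothetical, and treating an unknown category as blocked is a fail-closed choice made for this example.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"
    REDIRECT = "redirect"
    LOG = "log"


# Category-to-action mapping copied from the table above.
CATEGORY_ACTIONS = {
    "safe": Action.ALLOW,
    "medical_advice": Action.BLOCK,
    "pii_solicitation": Action.BLOCK,
    "emergency": Action.ESCALATE,
    "off_topic": Action.REDIRECT,
    "profanity": Action.BLOCK,
    "legal_advice": Action.BLOCK,
    "competitor_mention": Action.LOG,
}


def action_for(category: str) -> Action:
    """Look up the action; unknown categories are blocked (assumption)."""
    return CATEGORY_ACTIONS.get(category, Action.BLOCK)


def is_allowed(category: str) -> bool:
    """Allow and Log both let the message through, per the table."""
    return action_for(category) in (Action.ALLOW, Action.LOG)
```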

Forbidden Output Patterns¶

The output validator scans every LLM response against these 12 regex patterns. If any pattern matches, the response is blocked and a safe fallback is returned instead.

Pattern ID	Description
`diagnosis_statement`	Statements like "you have [condition]" or "you are diagnosed with"
`dosage_recommendation`	Any mention of specific drug dosages
`treatment_prescription`	Direct treatment prescriptions ("you should take", "I recommend the drug")
`prognosis_prediction`	Survival rates, recovery time guarantees
`lab_interpretation`	Interpreting specific lab values as normal/abnormal
`emergency_self_treat`	Instructions for self-treating an emergency
`insurance_guarantee`	Guaranteeing insurance coverage or reimbursement
`price_guarantee`	Guaranteeing specific treatment costs
`legal_opinion`	Any legal advice or liability statements
`credential_leak`	API keys, passwords, or internal URLs in the response
`internal_instruction`	System prompt or internal instruction leakage
`competitor_endorsement`	Endorsing or recommending a competing platform
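As an illustration of the scan-and-fallback behaviour, the sketch below checks a response against two of the pattern IDs. The regexes and the fallback text are hypothetical stand-ins written for this example; the real patterns live in config/guardrails.yaml and are not reproduced on this page.

```python
import re

# Hypothetical regexes for two of the twelve pattern IDs.
FORBIDDEN_PATTERNS = {
    "diagnosis_statement": re.compile(r"\byou are diagnosed with\b", re.IGNORECASE),
    "dosage_recommendation": re.compile(
        r"\b\d+(\.\d+)?\s?(mg|mcg|ml)\b", re.IGNORECASE
    ),
}

# Hypothetical fallback text.
SAFE_FALLBACK = "I can't help with that, but I can connect you with a medical professional."


def validate_output(text: str) -> tuple:
    """Return (text_to_send, matched_pattern_id or None)."""
    for pattern_id, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(text):
            # Any match blocks the whole response.
            return SAFE_FALLBACK, pattern_id
    return text, None
```

Returning the matched pattern ID alongside the fallback makes it straightforward to log which rule fired.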

Response Templates¶

When a message is blocked, escalated, or redirected, the system returns the pre-configured response template for its category:

response_templates:
  medical_advice: >
    I understand your concern, but I'm not able to provide medical diagnoses
    or treatment recommendations. I can help connect you with qualified
    medical professionals who can properly evaluate your condition.
  emergency: >
    If you are experiencing a medical emergency, please call your local
    emergency services immediately. I can help you find appropriate care
    after your immediate needs are addressed.
  pii_solicitation: >
    For your safety, please do not share sensitive personal information
    like social security numbers or credit card details in this chat.
    Any required information will be collected securely during the
    formal intake process.
  off_topic: >
    I'm here to help with your medical travel needs. Could you tell me
    more about the medical procedure you're interested in or the
    destination you're considering?

Medical Keywords (Multilingual)¶

The classifier uses medical keyword lists in four languages to improve classification accuracy:

Language	Example Keywords	Count
English	surgery, diagnosis, prescription, symptoms, treatment	~120
Arabic	جراحة, تشخيص, وصفة طبية, أعراض, علاج	~90
German	Operation, Diagnose, Rezept, Symptome, Behandlung	~95
Turkish	ameliyat, teşhis, reçete, belirtiler, tedavi	~85

File Validation Limits¶

When guardrail file validation is enabled, uploaded files are checked against these limits:

Parameter	Value	Description
`max_file_size_mb`	20	Maximum file size in megabytes
`allowed_extensions`	`.pdf`, `.jpg`, `.jpeg`, `.png`, `.dcm`	Permitted file types
`max_filename_length`	255	Maximum filename character count
`scan_for_malware`	true	Whether to scan uploads with ClamAV (if available)
`ocr_timeout_seconds`	30	Maximum time for PyMuPDF OCR extraction
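The size, extension, and filename limits can be sketched as a single check. This is a minimal example with a hypothetical `upload_errors` helper; it assumes 1 MB means 1024 × 1024 bytes and leaves malware scanning and OCR timeouts out of scope.

```python
from pathlib import PurePath

# Limits copied from the table above.
MAX_FILE_SIZE_MB = 20
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".dcm"}
MAX_FILENAME_LENGTH = 255


def upload_errors(filename: str, size_bytes: int) -> list:
    """Return the names of the limits an upload violates."""
    errors = []
    # Assumption: 1 MB = 1024 * 1024 bytes.
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        errors.append("max_file_size_mb")
    # Compare extensions case-insensitively so "SCAN.DCM" is accepted.
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("allowed_extensions")
    if len(filename) > MAX_FILENAME_LENGTH:
        errors.append("max_filename_length")
    return errors
```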

Workflow Gating Thresholds¶

Controls when the agent advances between workflow phases. Tune per corridor or procedure type.

Loaded by: app/services/gating_config.py → imported as GATING dict

EHR Gating¶

Parameter	Default	Description
`gating.ehr.completeness_for_matching`	`0.5`	Minimum EHR completeness (0.0–1.0) to offer provider matching
`gating.ehr.completeness_with_meds`	`0.4`	Lower threshold when medications have been confirmed (high-signal data)
`gating.ehr.matching_ready_threshold`	`0.4`	Minimum for `patient_state` to report `matching_ready: true`
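The interaction between the two completeness thresholds can be sketched as follows. The nested layout of the `GATING` dict and the `can_offer_matching` helper are assumptions made for illustration; the actual structure is defined in app/services/gating_config.py.

```python
# Assumed layout mirroring the dotted parameter names above.
GATING = {
    "ehr": {
        "completeness_for_matching": 0.5,
        "completeness_with_meds": 0.4,
        "matching_ready_threshold": 0.4,
    }
}


def can_offer_matching(completeness: float, meds_confirmed: bool, gating: dict = GATING) -> bool:
    """Apply the lower threshold once medications have been confirmed."""
    ehr = gating["ehr"]
    if meds_confirmed:
        threshold = ehr["completeness_with_meds"]
    else:
        threshold = ehr["completeness_for_matching"]
    # "Minimum" is read here as an inclusive bound.
    return completeness >= threshold
```

Under these defaults, a record that is 45% complete qualifies for matching only after medications are confirmed.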

Matching Gating¶

Parameter	Default	Description
`gating.matching.require_medications`	`true`	Whether medications must be asked/confirmed before matching

Document Matching¶

Parameter	Default	Description
`gating.documents.parameter_match_coverage`	`0.5`	Minimum fraction of expected parameters to count as a match (e.g., 4/7 CBC params)
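The 4/7 example works out to roughly 0.57, which clears the 0.5 default. A small sketch of that arithmetic, with hypothetical function names and illustrative CBC parameter names:

```python
PARAMETER_MATCH_COVERAGE = 0.5  # default from the table above


def parameter_coverage(found: set, expected: set) -> float:
    """Fraction of expected parameters present in the document."""
    if not expected:
        return 0.0
    return len(found & expected) / len(expected)


def is_document_match(found: set, expected: set) -> bool:
    """A document matches when coverage reaches the configured minimum."""
    return parameter_coverage(found, expected) >= PARAMETER_MATCH_COVERAGE


# Illustrative parameter names, not taken from the codebase.
CBC_EXPECTED = {"hemoglobin", "hematocrit", "wbc", "rbc", "platelets", "mcv", "mch"}
```

With four of the seven parameters present the document counts as a CBC; with three (about 0.43) it does not.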

Intake Gating¶

Parameter	Default	Description
`gating.intake.min_answers_for_completion`	`3`	Minimum substantive patient answers before intake can advance

Tuning for demo vs. production

For investor demos, lower completeness_for_matching to 0.3 for faster flow. For production, keep at 0.5 or higher for better match quality.

config/model_registry.yaml¶

The model registry defines which LLM models are available, their capabilities, and how they are assigned to different tasks.

Model Definitions¶

models:
  gpt-4o:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.005
    cost_per_1k_output: 0.015

  gpt-4o-mini:
    provider: openai
    context_window: 128000
    max_output_tokens: 4096
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006

  claude-3-5-sonnet:
    provider: anthropic
    context_window: 200000
    max_output_tokens: 8192
    supports_vision: true
    supports_function_calling: true
    cost_per_1k_input: 0.003
    cost_per_1k_output: 0.015
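The per-1k cost fields lend themselves to a quick estimate. This sketch assumes the values are in USD (the registry does not state a currency) and uses a hypothetical `estimate_cost` helper over a hand-copied subset of the entries above.

```python
# Cost fields copied from the model definitions above.
MODEL_COSTS = {
    "gpt-4o": {"cost_per_1k_input": 0.005, "cost_per_1k_output": 0.015},
    "gpt-4o-mini": {"cost_per_1k_input": 0.00015, "cost_per_1k_output": 0.0006},
    "claude-3-5-sonnet": {"cost_per_1k_input": 0.003, "cost_per_1k_output": 0.015},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one call, in the registry's (assumed USD) units."""
    costs = MODEL_COSTS[model]
    return (
        input_tokens / 1000 * costs["cost_per_1k_input"]
        + output_tokens / 1000 * costs["cost_per_1k_output"]
    )
```

For a call with 2,000 input tokens and 500 output tokens, this gives about 0.0175 on gpt-4o versus about 0.0006 on gpt-4o-mini, which is the gap the cost-optimized tiers rely on.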

Tier Assignments¶

Models are assigned to tiers that map to specific use cases:

Tier	Model	Use Case
`intake_chat`	`gpt-4o-mini`	Patient intake conversations (cost-optimized)
`medical_summary`	`gpt-4o`	EHR summary generation (accuracy-critical)
`document_analysis`	`gpt-4o`	OCR result analysis and extraction
`provider_matching`	`gpt-4o-mini`	Generating match explanations
`translation`	`gpt-4o-mini`	Multilingual response translation
`guardrail_classifier`	`gpt-4o-mini`	Input classification for safety

Fallback Chains¶

If a model is unavailable, the system tries the next model in the applicable chain, in order:

fallback_chains:
  default:
    - gpt-4o
    - claude-3-5-sonnet
    - gpt-4o-mini
  cost_sensitive:
    - gpt-4o-mini
    - claude-3-5-sonnet
  accuracy_critical:
    - gpt-4o
    - claude-3-5-sonnet
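Walking a chain in order can be sketched as a loop that moves on when a model is unavailable. The `ModelUnavailable` exception and `call_with_fallback` helper are hypothetical names for this example, and `call` stands in for whatever function actually invokes the model.

```python
# Chains copied from the configuration above.
FALLBACK_CHAINS = {
    "default": ["gpt-4o", "claude-3-5-sonnet", "gpt-4o-mini"],
    "cost_sensitive": ["gpt-4o-mini", "claude-3-5-sonnet"],
    "accuracy_critical": ["gpt-4o", "claude-3-5-sonnet"],
}


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot be reached."""


def call_with_fallback(chain_name: str, call):
    """Try each model in the chain; return (model, result) on first success."""
    last_error = None
    for model in FALLBACK_CHAINS[chain_name]:
        try:
            return model, call(model)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError(f"all models in chain {chain_name!r} failed") from last_error
```

Only `ModelUnavailable` triggers a fallback in this sketch; other exceptions propagate immediately so that genuine bugs are not masked by retries.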

Environment Variables¶

All environment variables are required unless marked as optional. Never commit actual values -- use .env.example as a template.

Database¶

Variable	Description	Example Format
`DATABASE_URL`	PostgreSQL connection string	`postgresql+asyncpg://user:pass@host:5432/curaway`
`NEO4J_URI`	Neo4j Bolt connection URI	`neo4j+s://xxxx.databases.neo4j.io`
`NEO4J_USERNAME`	Neo4j authentication username	`neo4j`
`NEO4J_PASSWORD`	Neo4j authentication password	`(secret)`
`QDRANT_URL`	Qdrant vector database URL	`https://xxxx.cloud.qdrant.io:6333`
`QDRANT_API_KEY`	Qdrant API key	`(secret)`
`UPSTASH_REDIS_URL`	Upstash Redis connection URL	`rediss://default:xxxx@xxxx.upstash.io:6379`

Authentication¶

Variable	Description	Example Format
`CLERK_SECRET_KEY`	Clerk backend API secret key	`sk_live_xxxx`
`CLERK_PUBLISHABLE_KEY`	Clerk frontend publishable key (optional, frontend only)	`pk_live_xxxx`
`CLERK_WEBHOOK_SECRET`	Svix signing secret for `/api/v1/webhooks/clerk`. Configured in Clerk dashboard → Webhooks.	`whsec_xxxx`

Feature Management¶

Variable	Description	Example Format
`FLAGSMITH_API_KEY`	Flagsmith environment API key (runtime SDK)	`(secret)`
`FLAGSMITH_API_URL`	Flagsmith runtime SDK URL (optional, defaults to cloud)	`https://edge.api.flagsmith.com/api/v1/`
`FLAGSMITH_ADMIN_TOKEN`	Admin-scope API token used by `/api/v1/admin/flags/*` proxy. Distinct from the runtime SDK key — server-side only.	`Token xxxx`
`FLAGSMITH_PROJECT_ID`	Numeric project id, visible in Flagsmith dashboard URL	`12345`
`FLAGSMITH_ENVIRONMENT_KEY`	Server-side environment key	`ser.xxxx`
`FLAGSMITH_ADMIN_API_URL`	Admin API URL (optional, defaults to cloud)	`https://api.flagsmith.com/api/v1/`

Tenancy¶

Variable	Description	Example Format
`DEFAULT_TENANT_ID`	Tenant used when callers don't pass an explicit tenant_id (background jobs, seed data, dev shells). Defaults to `tenant-apollo-001`.	`tenant-apollo-001`

Observability¶

Variable	Description	Example Format
`LANGFUSE_SECRET_KEY`	Langfuse secret key for LLM tracing	`sk-lf-xxxx`
`LANGFUSE_PUBLIC_KEY`	Langfuse public key	`pk-lf-xxxx`
`LANGFUSE_HOST`	Langfuse host URL (optional)	`https://cloud.langfuse.com`

Email & Notifications¶

Variable	Description	Example Format
`RESEND_API_KEY`	Resend email API key	`re_xxxx`
`NOTIFICATION_FROM_EMAIL`	Sender email address (optional)	`noreply@curaway.ai`

Storage (Cloudflare R2)¶

Variable	Description	Example Format
`CLOUDFLARE_R2_ACCESS_KEY_ID`	R2 access key ID	`(secret)`
`CLOUDFLARE_R2_SECRET_ACCESS_KEY`	R2 secret access key	`(secret)`
`CLOUDFLARE_R2_BUCKET`	R2 bucket name	`curaway-uploads`
`CLOUDFLARE_R2_ENDPOINT`	R2 S3-compatible endpoint	`https://xxxx.r2.cloudflarestorage.com`
`CLOUDFLARE_R2_PUBLIC_URL`	Public URL prefix for uploaded files (optional)	`https://uploads.curaway.ai`

AI / Embeddings¶

Variable	Description	Example Format
`OPENAI_API_KEY`	OpenAI API key for LLM and embeddings	`sk-xxxx`
`ANTHROPIC_API_KEY`	Anthropic API key (optional, for fallback)	`sk-ant-xxxx`
`EMBEDDING_PROVIDER`	Embedding provider (`openai` or `local`)	`openai`
`EMBEDDING_MODEL`	Embedding model name (optional)	`text-embedding-3-small`

Application¶

Variable	Description	Example Format
`APP_ENV`	Environment name (`development`, `staging`, `production`)	`production`
`AUTH_DISABLED`	Disable JWT auth for local development (optional)	`true`
`CORS_ORIGINS`	Comma-separated allowed CORS origins (optional)	`https://app.curaway.ai,http://localhost:3000`
`LOG_LEVEL`	Logging level (optional, defaults to `INFO`)	`DEBUG`
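Because most variables are required, a startup check that reports every missing name at once can save a deploy cycle. The sketch below is an assumption about how such a check might look, not existing Curaway code; `REQUIRED_VARS` lists only a subset of the required variables from the tables above for brevity.

```python
import os

# Illustrative subset of the required variables documented above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "CLERK_SECRET_KEY",
    "OPENAI_API_KEY",
)


def missing_env(names=REQUIRED_VARS, env=None) -> list:
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def check_env(env=None) -> None:
    """Raise once, listing every missing variable."""
    missing = missing_env(env=env)
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
```

Passing `env` explicitly keeps the check testable without touching the real process environment.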