
Seed Data

This runbook documents the full seed pipeline for populating Curaway's databases with initial data. Seeds must be run in a specific order because later seeds depend on records created by earlier ones.


Seed Order

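Run the core seeds in this exact sequence. Seeds 8 to 11 under Seed Details are not part of this sequence; run them afterwards, once the dependencies listed for each are in place: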

# 1. Base seed: tenant + consent records
python -m app.seed

# 2. Providers: 42 providers across 8 countries
python -m app.seed_providers

# 3. Demo data: demo patient Aisha + sample cases (legacy; quarantined for test runs, see Seed Details)
python -m app.seed_demo

# 4. Knowledge graph: Neo4j provider-procedure-country graph
python -m app.seed_graph

# 5. Embeddings: Qdrant vector embeddings for semantic search
python -m app.seed_embeddings

# 6. Doctors: 8 doctor profiles + language services
python -m app.seed_doctors

# 7. Storefront: treatment categories, provider content, completeness scores
python -m app.seed_storefront

Or as a single command:

python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
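
The `&&` chain stops at the first seed that exits with a non-zero status. When the sequence is driven from a script instead of a shell, the same behaviour can be sketched in Python. The helper names `run_seeds` and `_run_module` and the `runner` parameter are illustrative assumptions, not part of the codebase; only the module names come from this runbook.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Module names as listed in this runbook, in the documented order.
SEED_MODULES = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]


def _run_module(module: str) -> int:
    """Run `python -m <module>` and return its exit code."""
    return subprocess.run([sys.executable, "-m", module]).returncode


def run_seeds(
    modules: Sequence[str] = SEED_MODULES,
    runner: Callable[[str], int] = _run_module,  # hypothetical injection point
) -> list:
    """Run seeds in order and stop at the first non-zero exit code.

    Returns the modules that completed successfully.
    """
    completed = []
    for module in modules:
        if runner(module) != 0:
            break  # same effect as `&&`: later seeds are not attempted
        completed.append(module)
    return completed
```

Calling `run_seeds()` with no arguments runs the real modules; passing a stub `runner` makes the ordering logic testable without touching any database.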

Seed Details

1. python -m app.seed -- Base Seed

Creates the foundational records that all other seeds depend on.

Creates:

| Record | Details |
|---|---|
| Tenant | tenant-apollo-001 -- the default development/demo tenant |
| Consent types | GDPR consent categories: treatment, data_sharing, marketing, research |
| Consent templates | Default consent text templates in English, Arabic, German, and Turkish |

Idempotent: Yes -- skips records that already exist.

Dependencies: None. This must run first.

2. python -m app.seed_providers -- Provider Seed

Populates the providers table with 42 medical providers across 8 countries.

Creates:

| Country | Code | Provider Count | Specialties |
|---|---|---|---|
| Germany | DE | 8 | Orthopedics, cardiology, oncology, neurosurgery |
| Turkey | TR | 7 | Dental, ophthalmology, cosmetic surgery, cardiac |
| India | IN | 6 | Cardiac surgery, orthopedics, oncology, fertility |
| Thailand | TH | 5 | Cosmetic surgery, dental, orthopedics |
| South Korea | KR | 4 | Cosmetic surgery, ophthalmology, dermatology |
| Spain | ES | 4 | Oncology, orthopedics, fertility |
| Mexico | MX | 4 | Dental, bariatric surgery, cosmetic surgery |
| UAE | AE | 4 | Cardiac, orthopedics, fertility, oncology |

Each provider record includes:

  • Name, country, city
  • Specialties (array)
  • Accreditations (JCI, ISO 9001, etc.)
  • Description (used for embedding generation)
  • Accepted insurance networks
  • Languages spoken at the facility
  • Average procedure costs (range)

Idempotent: Yes -- uses upsert by provider ID.

Dependencies: Requires app.seed (needs tenant record).
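
"Upsert by provider ID" means a re-run overwrites the existing row instead of inserting a duplicate. A minimal sketch of that pattern follows, using an in-memory SQLite database as a stand-in for PostgreSQL (the `ON CONFLICT ... DO UPDATE` clause has the same shape in both). The helper `upsert_provider`, the three-column schema, and the ID `provider-001` are assumptions for illustration; the real table carries more columns.

```python
import sqlite3


def upsert_provider(conn, provider_id, name, country):
    """Insert a provider, or update it in place if the ID already exists."""
    conn.execute(
        """
        INSERT INTO providers (id, name, country)
        VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            country = excluded.country
        """,
        (provider_id, name, country),
    )


# Simplified, hypothetical schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE providers (id TEXT PRIMARY KEY, name TEXT, country TEXT)"
)

# Running the same upsert twice leaves exactly one row.
upsert_provider(conn, "provider-001", "Berlin Orthopedic Center", "DE")
upsert_provider(conn, "provider-001", "Berlin Orthopedic Center", "DE")
row_count = conn.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
```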

3. python -m app.seed_demo -- Demo Data Seed (legacy — prefer scripts/seed_e2e.py)

Deprecated for test runs (#1194 E2): the patient this script creates (demo-patient-aisha-001) is quarantined. Five unrelated personas and 64 cases were stacked on this single demographic record, which poisons the matcher through cross-case FHIR contamination. For test journeys, use the clean per-persona accounts seeded by scripts/seed_e2e.py [--persona maria|abdul|meskerem] instead. See docs/runbook/test-data-hygiene.md.

Creates a demo patient and sample cases for testing and demonstrations.

Creates:

| Record | Details |
|---|---|
| Patient | Aisha Patel -- female, DOB 1985-06-15, UAE resident, email aisha.patel@example.com (quarantined; see note above) |
| Case 1 | Hip replacement, status in_review, assigned to Berlin Orthopedic Center |
| Case 2 | Dental implants, status draft, no provider assigned yet |
| Case 3 | Cardiac consultation, status completed, assigned to Munich Heart Clinic |
| Documents | Sample medical report and MRI document references for Case 1 |
| FHIR resources | Patient, Condition (osteoarthritis M16.1), Procedure (hip replacement), Observation (hemoglobin) |
| Chat history | 5 sample chat messages for Case 1 showing intake conversation flow |

Idempotent: Yes -- checks for existing demo patient by email before creating.

Dependencies: Requires app.seed and app.seed_providers.

3a. python scripts/seed_e2e.py -- Clean Per-Persona E2E Seed (preferred)

Replaces the polluted app.seed_demo for journey testing. Each persona gets its own demographic record and its own Case, so the matcher (when passed case_id) sees a single-case FHIR view.

Personas:

| ID | Details |
|---|---|
| demo-patient-maria-001 | Maria, 45F, Dubai, stage 2 IDC left breast |
| demo-patient-abdul-001 | Abdul Moeed, male, Jeddah, leukaemia |
| demo-patient-meskerem-001 | Meskerem, 38F, Ethiopia, absent iliac vein |

Idempotent: Yes (dedup on external_auth_id).

See app/seeds/seed_persona_accounts.py for the seed functions and tests/test_persona_accounts_seed.py for guarantees.

4. python -m app.seed_graph -- Knowledge Graph Seed

Populates the Neo4j knowledge graph with provider-procedure-country relationships.

Creates:

| Node Type | Count | Properties |
|---|---|---|
| Provider | 42 | id, name, country, specialties, accreditations |
| Procedure | ~30 | name, category, typical_duration, complexity |
| Country | 8 | code, name, timezone, currency |

| Relationship | Description | Count |
|---|---|---|
| PERFORMS | Provider performs a procedure | ~120 |
| LOCATED_IN | Provider is located in a country | 42 |
| AVAILABLE_IN | Procedure is available in a country | ~60 |
| SPECIALIZES_IN | Provider specializes in a medical field | ~80 |

Idempotent: Yes -- uses MERGE in Cypher to avoid duplicates.

Dependencies: Requires app.seed_providers (reads provider data from PostgreSQL).
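
`MERGE` matches an existing pattern and creates it only when it is absent, which is what makes re-runs safe. The sketch below shows how a single `LOCATED_IN` relationship might be merged. The function `build_located_in_merge` is hypothetical and the actual seed may structure its queries differently; the labels, relationship type, and property names are taken from the tables in this section.

```python
def build_located_in_merge(provider_id, provider_name, country_code):
    """Return a Cypher query and parameters that idempotently link
    a Provider node to a Country node."""
    query = (
        "MERGE (p:Provider {id: $provider_id}) "
        "SET p.name = $provider_name "
        "MERGE (c:Country {code: $country_code}) "
        "MERGE (p)-[:LOCATED_IN]->(c)"
    )
    params = {
        "provider_id": provider_id,
        "provider_name": provider_name,
        "country_code": country_code,
    }
    return query, params


# Hypothetical provider ID; with a live driver session the pair would be
# executed as: session.run(query, params)
query, params = build_located_in_merge(
    "provider-001", "Berlin Orthopedic Center", "DE"
)
```

Merging on the identifying property only (`id`, `code`) and setting the remaining properties afterwards keeps a changed name from producing a second node.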

5. python -m app.seed_embeddings -- Vector Embedding Seed

Generates text embeddings for all providers and upserts them into Qdrant.

Creates:

| Collection | Points | Dimension | Distance |
|---|---|---|---|
| providers | 42 | 1536 | Cosine |

Each point includes:

  • Vector: Embedding of the provider's description + specialties + procedures
  • Payload: provider_id, name, country, specialties, accreditations

Cost: ~42 API calls to the OpenAI embedding endpoint (~$0.01 total).

Idempotent: Yes -- upserts by provider ID.

Dependencies: Requires app.seed_providers and a valid OPENAI_API_KEY.
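
The shape of each point can be sketched without calling OpenAI or Qdrant. The helpers below (`embedding_text`, `point_id_for`, `build_point`) are hypothetical names, and the UUID-based ID scheme is an assumption: Qdrant point IDs must be unsigned integers or UUIDs, so a string provider ID needs some deterministic mapping, but this runbook does not say which one the seed uses. The payload keys and the 1536 dimension come from this section.

```python
import uuid

EMBEDDING_DIM = 1536  # dimension of the `providers` collection per this runbook


def embedding_text(description, specialties, procedures):
    """Join the fields that feed the embedding into one input string."""
    parts = [description, ", ".join(specialties), ", ".join(procedures)]
    return "\n".join(part for part in parts if part)


def point_id_for(provider_id):
    """Derive a stable UUID from the provider ID (assumed scheme) so that
    re-running the seed overwrites the same point."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, provider_id))


def build_point(provider, vector):
    """Build a point dict with the payload fields listed above."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {
        "id": point_id_for(provider["id"]),
        "vector": vector,
        "payload": {
            "provider_id": provider["id"],
            "name": provider["name"],
            "country": provider["country"],
            "specialties": provider["specialties"],
            "accreditations": provider["accreditations"],
        },
    }
```

Because the point ID is a pure function of the provider ID, an upsert on a second run replaces the existing point instead of adding a 43rd.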

6. python -m app.seed_doctors -- Doctor Seed

Creates doctor profiles and their procedure associations.

Creates:

| Doctor | Specialty | Country | Languages | Provider |
|---|---|---|---|---|
| Dr. Hans Mueller | Orthopedic Surgery | DE | German, English | Berlin Orthopedic Center |
| Dr. Fatma Yilmaz | Cardiac Surgery | TR | Turkish, English, Arabic | Istanbul Heart Center |
| Dr. Raj Sharma | Oncology | IN | Hindi, English | Mumbai Cancer Institute |
| Dr. Suki Tanaka | Cosmetic Surgery | TH | Thai, English, Japanese | Bangkok Aesthetic Clinic |
| Dr. Min-Jun Park | Ophthalmology | KR | Korean, English | Seoul Eye Center |
| Dr. Elena Rodriguez | Fertility | ES | Spanish, English, French | Barcelona Fertility Clinic |
| Dr. Carlos Mendez | Bariatric Surgery | MX | Spanish, English | Monterrey Weight Loss Center |
| Dr. Amira Hassan | Cardiology | AE | Arabic, English, French | Dubai Heart Hospital |

Each doctor record includes:

  • Full name, specialty, license number
  • Languages spoken
  • Years of experience
  • Associated provider (foreign key)
  • Procedures performed (many-to-many)
  • Education and certifications

Additionally seeds language services for each provider based on the languages spoken by their doctors.

Idempotent: Yes -- checks for existing doctors by license number.

Dependencies: Requires app.seed_providers.
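
The language-service step described above amounts to taking, for each provider, the union of the languages its doctors speak. A minimal sketch, assuming a hypothetical helper `language_services` and using three rows copied from the table in this section:

```python
from collections import defaultdict

# (doctor, provider, languages) rows taken from the table in this section.
DOCTORS = [
    ("Dr. Hans Mueller", "Berlin Orthopedic Center", ["German", "English"]),
    ("Dr. Fatma Yilmaz", "Istanbul Heart Center", ["Turkish", "English", "Arabic"]),
    ("Dr. Amira Hassan", "Dubai Heart Hospital", ["Arabic", "English", "French"]),
]


def language_services(doctors):
    """Map each provider to the sorted set of languages its doctors speak."""
    by_provider = defaultdict(set)
    for _doctor, provider, languages in doctors:
        by_provider[provider].update(languages)
    return {provider: sorted(langs) for provider, langs in by_provider.items()}


services = language_services(DOCTORS)
```

With several doctors per provider, the set union removes repeated languages, so each provider ends up with one entry per distinct language.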

7. python -m app.seed_storefront -- Storefront Seed

Seeds 12 treatment categories, updates 6 demo providers with storefront content (taglines, facilities, cultural support, travel info), and computes completeness scores for all 42 providers.

Creates:

| Record | Details |
|---|---|
| Treatment categories | 12 categories (knee replacement, CABG, IVF, dental implants, etc.) |
| Provider storefront content | Taglines, facility lists, cultural support, travel info for 6 demo providers |
| Completeness scores | Computed for all 42 providers |

Idempotent: Yes -- upserts by slug/ID.

Dependencies: Requires app.seed_providers and app.seed_doctors.

8. python -m app.seed_full_enrichment -- Full Provider Enrichment

Enriches ALL 42 providers with storefront data: taglines, hero images, technology lists, cultural support, quality enrichment. Updates all 12 treatments with descriptions, what_to_expect, recovery_info. Adds bios to all 8 original doctors. Recomputes completeness scores (all 42 → premium tier).

Dependencies: Requires app.seed_storefront.

9. python -m app.seed_doctors_full -- Full Doctor Seeding

Seeds 83 additional doctors across all 42 providers (2-3 per provider). Country-appropriate names, qualifications, specialties, photos, bios. Creates DoctorProcedure entries and syncs to Neo4j.

Dependencies: Requires app.seed_providers.

10. python -m app.seed_doctor_photos -- Unique Doctor Photos

Assigns unique Unsplash portrait URLs to all 91 doctors (47 male, 44 female). Uses sig parameter for URL uniqueness.

Dependencies: Requires app.seed_doctors_full.

11. python -m app.seed_procedure_tests -- Procedure Test Requirements

Seeds REQUIRES_TEST relationships in Neo4j for 15 additional procedures (137 test mappings). Each with validity_days, priority, source_acceptance, on_site_required, timing_note, condition_note.

Dependencies: Requires app.seed_graph.
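
The "Dependencies" lines in this section can be collected into a small table and used to check that a proposed run order never places a seed before something it requires. The dictionary below restates those lines; the function `order_violations` is a hypothetical helper, not part of the codebase.

```python
# Dependencies as stated under each seed in this section.
DEPENDENCIES = {
    "app.seed": [],
    "app.seed_providers": ["app.seed"],
    "app.seed_demo": ["app.seed", "app.seed_providers"],
    "app.seed_graph": ["app.seed_providers"],
    "app.seed_embeddings": ["app.seed_providers"],
    "app.seed_doctors": ["app.seed_providers"],
    "app.seed_storefront": ["app.seed_providers", "app.seed_doctors"],
    "app.seed_full_enrichment": ["app.seed_storefront"],
    "app.seed_doctors_full": ["app.seed_providers"],
    "app.seed_doctor_photos": ["app.seed_doctors_full"],
    "app.seed_procedure_tests": ["app.seed_graph"],
}

# The numbered order used in this section (dicts preserve insertion order).
DOCUMENTED_ORDER = list(DEPENDENCIES)


def order_violations(order, deps):
    """Return (seed, dependency) pairs where the dependency is missing
    from the order or scheduled after the seed that needs it."""
    position = {name: index for index, name in enumerate(order)}
    problems = []
    for seed in order:
        for dep in deps.get(seed, []):
            if dep not in position or position[dep] > position[seed]:
                problems.append((seed, dep))
    return problems


violations = order_violations(DOCUMENTED_ORDER, DEPENDENCIES)
```

An empty result means the order is safe. The same check applies to a shortened sequence, for example one that omits the legacy `app.seed_demo`.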


Default Tenant

All seeded data belongs to the tenant tenant-apollo-001. This is the default development and demo tenant used across all environments.

| Property | Value |
|---|---|
| Tenant ID | tenant-apollo-001 |
| Tenant Name | Apollo Medical Tourism |
| Plan | Pro |
| Created By | Base seed |

Resetting All Data

To wipe and re-seed everything from scratch:

# 1. Reset PostgreSQL
alembic downgrade base
alembic upgrade head

# 2. Clear Neo4j
python -c "
from neo4j import GraphDatabase
import os
driver = GraphDatabase.driver(os.environ['NEO4J_URI'],
    auth=(os.environ['NEO4J_USERNAME'], os.environ['NEO4J_PASSWORD']))
with driver.session() as s:
    s.run('MATCH (n) DETACH DELETE n')
driver.close()
"

# 3. Clear Qdrant
python -c "
from qdrant_client import QdrantClient
import os
client = QdrantClient(url=os.environ['QDRANT_URL'], api_key=os.environ['QDRANT_API_KEY'])
client.delete_collection('providers')
"

# 4. Run full seed sequence
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront

Verifying Seeds

After running the full seed sequence, verify each database:

# PostgreSQL: check record counts
python -c "
from app.db import get_session
# Expected: 1 tenant, 42 providers, 1 patient, 3 cases, 8 doctors
"

# Neo4j: check node counts (see database-ops.md for Cypher queries)

# Qdrant: check collection (see database-ops.md for Python snippet)

See the database operations runbook for detailed verification queries.
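
However the counts are fetched, the comparison itself can be sketched as a small helper. The expected values below come from the comment in the PostgreSQL snippet above; the dictionary keys are descriptive labels rather than confirmed table names, and `count_mismatches` is a hypothetical function.

```python
# Expected counts after the core seven-seed sequence, per this runbook.
# After app.seed_doctors_full the doctor count would be 91 instead of 8.
EXPECTED_COUNTS = {
    "tenants": 1,
    "providers": 42,
    "patients": 1,
    "cases": 3,
    "doctors": 8,
}


def count_mismatches(expected, actual):
    """Return {label: (expected, actual)} for every count that differs.

    A label missing from `actual` is reported with an actual value of None.
    """
    return {
        label: (want, actual.get(label))
        for label, want in expected.items()
        if actual.get(label) != want
    }
```

An empty dictionary means every count matched; anything else names the record type to investigate first.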


Recovery Provider Seed (#958)

Seeds 6 recovery facilities (3 India + 3 Turkey) into Postgres for the patient_transport_offer_enabled / enable_recovery_offer flag flips.

DATABASE_URL_SERVICE=... python3 scripts/seed_recovery_providers.py

# On Railway (recommended for prod):
railway run --service curaway -- python3 scripts/seed_recovery_providers.py

Idempotency: keyed by providers.slug. Re-runs are safe: existing facility rows are skipped, but missing partnership rows are still healed. A facility seeded before its referenced surgical hospital exists gets its partnership row added on the next run.
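
The skip-but-heal behaviour can be sketched with plain sets standing in for the database tables. This is an illustration of the logic only; the function `seed_recovery_facility` and the hospital slug `example-hospital` are hypothetical, and the real script works against Postgres.

```python
def seed_recovery_facility(facilities, partnerships, hospitals, slug, hospital_slug):
    """Seed one facility idempotently.

    Returns (created, healed): whether the facility row was new, and
    whether a missing partnership row was added on this run.
    """
    created = slug not in facilities
    if created:
        facilities.add(slug)

    # The partnership can only exist once the surgical hospital exists.
    pair = (hospital_slug, slug)
    healed = hospital_slug in hospitals and pair not in partnerships
    if healed:
        partnerships.add(pair)
    return created, healed


facilities, partnerships, hospitals = set(), set(), set()
args = ("asha-recovery-jubilee-hills", "example-hospital")

first = seed_recovery_facility(facilities, partnerships, hospitals, *args)
hospitals.add("example-hospital")  # the hospital is seeded later
second = seed_recovery_facility(facilities, partnerships, hospitals, *args)
third = seed_recovery_facility(facilities, partnerships, hospitals, *args)
```

The first run creates the facility without a partnership, the second adds only the partnership, and the third changes nothing.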

Rollback: delete the seeded rows by slug:

-- Delete partnerships first (FK constraint)
DELETE FROM hospital_recovery_partnerships
WHERE recovery_provider_id IN (
    SELECT id FROM providers
    WHERE slug IN (
        'asha-recovery-jubilee-hills',
        'medanta-step-down-gurgaon',
        'banjara-wellness-hyderabad',
        'bosphorus-recovery-istanbul',
        'acibadem-recovery-istanbul',
        'kadikoy-step-down-istanbul'
    )
);

-- Then profiles
DELETE FROM recovery_provider_profiles
WHERE provider_id IN (
    SELECT id FROM providers WHERE slug IN (...)  -- slug list elided here; see the other two statements
);

-- Then provider catalog rows
DELETE FROM providers
WHERE slug IN (
    'asha-recovery-jubilee-hills',
    'medanta-step-down-gurgaon',
    'banjara-wellness-hyderabad',
    'bosphorus-recovery-istanbul',
    'acibadem-recovery-istanbul',
    'kadikoy-step-down-istanbul'
);
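
The three deletes share one slug list and must run in foreign-key order, so it can help to wrap them in a single transaction with the list defined once. The sketch below does this against an in-memory SQLite database with a deliberately minimal schema; the table and column names match the SQL above, while the function `rollback_recovery_seed` and the stand-in schema are assumptions.

```python
import sqlite3

RECOVERY_SLUGS = [
    "asha-recovery-jubilee-hills",
    "medanta-step-down-gurgaon",
    "banjara-wellness-hyderabad",
    "bosphorus-recovery-istanbul",
    "acibadem-recovery-istanbul",
    "kadikoy-step-down-istanbul",
]


def rollback_recovery_seed(conn, slugs):
    """Delete partnerships, then profiles, then providers, atomically.

    Returns the number of provider rows removed.
    """
    marks = ", ".join("?" for _ in slugs)
    ids = f"SELECT id FROM providers WHERE slug IN ({marks})"
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "DELETE FROM hospital_recovery_partnerships "
            f"WHERE recovery_provider_id IN ({ids})",
            slugs,
        )
        conn.execute(
            f"DELETE FROM recovery_provider_profiles WHERE provider_id IN ({ids})",
            slugs,
        )
        cursor = conn.execute(f"DELETE FROM providers WHERE slug IN ({marks})", slugs)
    return cursor.rowcount


# Minimal stand-in schema with enforced foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE providers (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
    CREATE TABLE recovery_provider_profiles (
        provider_id INTEGER REFERENCES providers(id));
    CREATE TABLE hospital_recovery_partnerships (
        recovery_provider_id INTEGER REFERENCES providers(id));
    """
)
for i, slug in enumerate(RECOVERY_SLUGS, start=1):
    conn.execute("INSERT INTO providers (id, slug) VALUES (?, ?)", (i, slug))
    conn.execute("INSERT INTO recovery_provider_profiles VALUES (?)", (i,))
    conn.execute("INSERT INTO hospital_recovery_partnerships VALUES (?)", (i,))
conn.commit()

deleted = rollback_recovery_seed(conn, RECOVERY_SLUGS)
```

Only the placeholders are interpolated into the SQL text; the slugs themselves are passed as bound parameters.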

Verify after seed:

SELECT facility_type, accommodation_tier, COUNT(*)
FROM recovery_provider_profiles
WHERE tenant_id = 'tenant-curaway-default'
GROUP BY facility_type, accommodation_tier
ORDER BY facility_type, accommodation_tier;

Transport Vendor Seed (#959 flag prerequisite)

Required before flipping patient_transport_offer_enabled ON for a tenant — without seeded vendors the matcher returns empty and patients see the text fallback.

DATABASE_URL_SERVICE=... python3 scripts/seed_transport_vendors.py

8 vendors across India (3), Turkey (2), Thailand (2), Spain (1). Same idempotency + rollback pattern as the recovery seed above (keyed on providers.slug).