Seed Data
This runbook documents the full seed pipeline for populating Curaway's databases with initial data. Seeds must be run in a specific order because later seeds depend on records created by earlier ones.
Seed Order
Run the seeds in this exact sequence:
# 1. Base seed: tenant + consent records
python -m app.seed
# 2. Providers: 42 providers across 8 countries
python -m app.seed_providers
# 3. Demo data: demo patient Aisha + sample cases
python -m app.seed_demo
# 4. Knowledge graph: Neo4j provider-procedure-country graph
python -m app.seed_graph
# 5. Embeddings: Qdrant vector embeddings for semantic search
python -m app.seed_embeddings
# 6. Doctors: 8 doctor profiles + language services
python -m app.seed_doctors
# 7. Storefront: treatment categories, provider content, completeness scores
python -m app.seed_storefront
Or as a single command:
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
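The same fail-fast behaviour as the `&&` chain can be sketched in Python, which is handy when the sequence is driven from a script rather than a shell. This is a minimal sketch, not part of the codebase: `run_seeds` and `_run_module` are hypothetical helper names, and only the module names come from the seed order above.

```python
import subprocess
import sys

# Module names copied from the seed order above.
SEED_MODULES = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]


def _run_module(module):
    """Run `python -m <module>` and return its exit code."""
    return subprocess.run([sys.executable, "-m", module]).returncode


def run_seeds(modules=SEED_MODULES, run=_run_module):
    """Run seeds in order and stop at the first non-zero exit code.

    Returns the list of modules that completed successfully.
    `run` is injectable so the ordering logic can be exercised without
    touching any database.
    """
    completed = []
    for module in modules:
        code = run(module)
        if code != 0:
            raise RuntimeError(
                f"{module} failed with exit code {code}; completed: {completed}"
            )
        completed.append(module)
    return completed
```

Calling `run_seeds()` with no arguments runs the seven core seeds; the error message lists which seeds already completed, so a re-run can start from a known state.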
Seed Details
1. python -m app.seed -- Base Seed
Creates the foundational records that all other seeds depend on.
Creates:
| Record | Details |
|---|---|
| Tenant | tenant-apollo-001 -- the default development/demo tenant |
| Consent types | GDPR consent categories: treatment, data_sharing, marketing, research |
| Consent templates | Default consent text templates in English, Arabic, German, and Turkish |
Idempotent: Yes -- skips records that already exist.
Dependencies: None. This must run first.
2. python -m app.seed_providers -- Provider Seed
Populates the providers table with 42 medical providers across 8 countries.
Creates:
| Country | Code | Provider Count | Specialties |
|---|---|---|---|
| Germany | DE | 8 | Orthopedics, cardiology, oncology, neurosurgery |
| Turkey | TR | 7 | Dental, ophthalmology, cosmetic surgery, cardiac |
| India | IN | 6 | Cardiac surgery, orthopedics, oncology, fertility |
| Thailand | TH | 5 | Cosmetic surgery, dental, orthopedics |
| South Korea | KR | 4 | Cosmetic surgery, ophthalmology, dermatology |
| Spain | ES | 4 | Oncology, orthopedics, fertility |
| Mexico | MX | 4 | Dental, bariatric surgery, cosmetic surgery |
| UAE | AE | 4 | Cardiac, orthopedics, fertility, oncology |
Each provider record includes:
- Name, country, city
- Specialties (array)
- Accreditations (JCI, ISO 9001, etc.)
- Description (used for embedding generation)
- Accepted insurance networks
- Languages spoken at the facility
- Average procedure costs (range)
Idempotent: Yes -- uses upsert by provider ID.
Dependencies: Requires app.seed (needs tenant record).
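To illustrate what "upsert by provider ID" buys on a re-run, here is a small self-contained sketch using an in-memory SQLite database. The three-column table, the `upsert_provider` helper, and the ID `provider-de-001` are assumptions made for the example; the real seed writes to PostgreSQL, whose `INSERT ... ON CONFLICT` syntax has the same shape.

```python
import sqlite3


def upsert_provider(conn, provider):
    """Insert a provider, or update it in place if the ID already exists."""
    conn.execute(
        """
        INSERT INTO providers (id, name, country)
        VALUES (:id, :name, :country)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            country = excluded.country
        """,
        provider,
    )


# Hypothetical, simplified schema: the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE providers (id TEXT PRIMARY KEY, name TEXT, country TEXT)"
)

row = {"id": "provider-de-001", "name": "Berlin Orthopedic Center", "country": "DE"}
upsert_provider(conn, row)  # first run inserts
upsert_provider(conn, {**row, "name": "Berlin Orthopedic Center (updated)"})  # re-run updates

count = conn.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
name = conn.execute(
    "SELECT name FROM providers WHERE id = ?", (row["id"],)
).fetchone()[0]
```

After both calls the table still holds a single row, and that row carries the values from the second call.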
3. python -m app.seed_demo -- Demo Data Seed (legacy; prefer scripts/seed_e2e.py)
Deprecated for test runs (#1194 E2): the patient this script creates (demo-patient-aisha-001) is quarantined. Five unrelated personae and 64 cases were stacked on the single demographic record, which poisons the matcher (cross-case FHIR contamination). For test journeys, use the clean per-persona accounts seeded by scripts/seed_e2e.py [--persona maria|abdul|meskerem] instead. See docs/runbook/test-data-hygiene.md.
Creates a demo patient and sample cases for testing and demonstrations.
Creates:
| Record | Details |
|---|---|
| Patient | Aisha Patel -- female, DOB 1985-06-15, UAE resident, email aisha.patel@example.com (quarantined — see note above) |
| Case 1 | Hip replacement, status in_review, assigned to Berlin Orthopedic Center |
| Case 2 | Dental implants, status draft, no provider assigned yet |
| Case 3 | Cardiac consultation, status completed, assigned to Munich Heart Clinic |
| Documents | Sample medical report and MRI document references for Case 1 |
| FHIR resources | Patient, Condition (osteoarthritis M16.1), Procedure (hip replacement), Observation (hemoglobin) |
| Chat history | 5 sample chat messages for Case 1 showing intake conversation flow |
Idempotent: Yes -- checks for existing demo patient by email before creating.
Dependencies: Requires app.seed and app.seed_providers.
3a. python scripts/seed_e2e.py -- Clean Per-Persona E2E Seed (preferred)
Replaces the polluted app.seed_demo for journey testing. Each persona
gets its own demographic record and its own Case, so the matcher
(when passed case_id) sees a single-case FHIR view.
Personas:
| ID | Details |
|---|---|
| demo-patient-maria-001 | Maria, 45F, Dubai, stage 2 IDC left breast |
| demo-patient-abdul-001 | Abdul Moeed, male, Jeddah, leukaemia |
| demo-patient-meskerem-001 | Meskerem, 38F, Ethiopia, absent iliac vein |
Idempotent: Yes (dedup on external_auth_id).
See app/seeds/seed_persona_accounts.py for the seed functions and
tests/test_persona_accounts_seed.py for guarantees.
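The single-case view mentioned above can be pictured as a filter over a patient's FHIR resources. The sketch below is illustrative only: `case_view` is a hypothetical helper, and the assumption that each resource is a dict carrying a `case_id` key is not taken from the codebase.

```python
def case_view(resources, case_id=None):
    """Return the resources attached to one case, or all of them if no case_id is given."""
    if case_id is None:
        return list(resources)
    return [r for r in resources if r["case_id"] == case_id]


# Hypothetical data: two cases stacked on one patient record.
resources = [
    {"case_id": "case-1", "resourceType": "Condition"},
    {"case_id": "case-1", "resourceType": "Observation"},
    {"case_id": "case-2", "resourceType": "Condition"},
]

single_case = case_view(resources, "case-1")
everything = case_view(resources)
```

Without a `case_id`, every case's resources reach the matcher together, which is the contamination the per-persona seed avoids by giving each persona exactly one Case.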
4. python -m app.seed_graph -- Knowledge Graph Seed
Populates the Neo4j knowledge graph with provider-procedure-country relationships.
Creates:
| Node Type | Count | Properties |
|---|---|---|
| Provider | 42 | id, name, country, specialties, accreditations |
| Procedure | ~30 | name, category, typical_duration, complexity |
| Country | 8 | code, name, timezone, currency |
| Relationship | Description | Count |
|---|---|---|
| PERFORMS | Provider performs a procedure | ~120 |
| LOCATED_IN | Provider is located in a country | 42 |
| AVAILABLE_IN | Procedure is available in a country | ~60 |
| SPECIALIZES_IN | Provider specializes in a medical field | ~80 |
Idempotent: Yes -- uses MERGE in Cypher to avoid duplicates.
Dependencies: Requires app.seed_providers (reads provider data from PostgreSQL).
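As an illustration of the MERGE-based idempotency, the sketch below builds a parameterized Cypher statement for one LOCATED_IN relationship. The helper name `located_in_merge` and the parameter names are assumptions; the node labels, the relationship type, and the `id` / `code` properties come from the tables above. The actual queries in app.seed_graph may differ.

```python
def located_in_merge(provider_id, country_code):
    """Return a parameterized Cypher statement and its parameters.

    Each MERGE matches an existing node or relationship or creates it,
    so running the statement twice leaves the graph unchanged.
    """
    query = (
        "MERGE (p:Provider {id: $provider_id}) "
        "MERGE (c:Country {code: $country_code}) "
        "MERGE (p)-[:LOCATED_IN]->(c)"
    )
    params = {"provider_id": provider_id, "country_code": country_code}
    return query, params


# "provider-de-001" is a hypothetical ID used only for illustration.
query, params = located_in_merge("provider-de-001", "DE")
```

With the neo4j Python driver, the pair can be passed straight to `session.run(query, params)`. Merging the two nodes separately before merging the relationship avoids creating duplicate nodes when only part of the pattern already exists.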
5. python -m app.seed_embeddings -- Vector Embedding Seed
Generates text embeddings for all providers and upserts them into Qdrant.
Creates:
| Collection | Points | Dimension | Distance |
|---|---|---|---|
| providers | 42 | 1536 | Cosine |
Each point includes:
- Vector: Embedding of the provider's description + specialties + procedures
- Payload: provider_id, name, country, specialties, accreditations
Cost: ~42 API calls to OpenAI embedding endpoint (~$0.01 total).
Idempotent: Yes -- upserts by provider ID.
Dependencies: Requires app.seed_providers and a valid OPENAI_API_KEY.
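The shape of one point can be sketched without calling OpenAI or Qdrant. The code below assembles the text to embed and the payload from the fields listed above. Two things are assumptions rather than facts about app.seed_embeddings: the exact text layout in `embedding_text`, and the use of a UUIDv5 derived from the provider ID as the point ID. The latter is one common way to keep upserts stable, since Qdrant accepts only unsigned integers or UUIDs as point IDs.

```python
import uuid


def embedding_text(provider):
    """Concatenate description, specialties, and procedures into one string."""
    return "\n".join(
        [
            provider["description"],
            "Specialties: " + ", ".join(provider["specialties"]),
            "Procedures: " + ", ".join(provider["procedures"]),
        ]
    )


def point_id(provider_id):
    """Map a provider ID to a deterministic UUID so re-runs overwrite the same point."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"provider:{provider_id}"))


def build_point(provider, vector):
    """Return a dict with the id, vector, and payload for one provider."""
    return {
        "id": point_id(provider["id"]),
        "vector": vector,
        "payload": {
            "provider_id": provider["id"],
            "name": provider["name"],
            "country": provider["country"],
            "specialties": provider["specialties"],
            "accreditations": provider["accreditations"],
        },
    }
```

Because `point_id` is a pure function of the provider ID, seeding the same provider twice targets the same point instead of adding a second one.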
6. python -m app.seed_doctors -- Doctor Seed
Creates doctor profiles and their procedure associations.
Creates:
| Doctor | Specialty | Country | Languages | Provider |
|---|---|---|---|---|
| Dr. Hans Mueller | Orthopedic Surgery | DE | German, English | Berlin Orthopedic Center |
| Dr. Fatma Yilmaz | Cardiac Surgery | TR | Turkish, English, Arabic | Istanbul Heart Center |
| Dr. Raj Sharma | Oncology | IN | Hindi, English | Mumbai Cancer Institute |
| Dr. Suki Tanaka | Cosmetic Surgery | TH | Thai, English, Japanese | Bangkok Aesthetic Clinic |
| Dr. Min-Jun Park | Ophthalmology | KR | Korean, English | Seoul Eye Center |
| Dr. Elena Rodriguez | Fertility | ES | Spanish, English, French | Barcelona Fertility Clinic |
| Dr. Carlos Mendez | Bariatric Surgery | MX | Spanish, English | Monterrey Weight Loss Center |
| Dr. Amira Hassan | Cardiology | AE | Arabic, English, French | Dubai Heart Hospital |
Each doctor record includes:
- Full name, specialty, license number
- Languages spoken
- Years of experience
- Associated provider (foreign key)
- Procedures performed (many-to-many)
- Education and certifications
Additionally seeds language services for each provider based on the languages spoken by their doctors.
Idempotent: Yes -- checks for existing doctors by license number.
Dependencies: Requires app.seed_providers.
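Deriving language services from doctor languages amounts to a per-provider union. The sketch below uses three rows from the doctor table above; the `language_services` helper and the tuple layout are assumptions for illustration, not the seed's actual data model.

```python
from collections import defaultdict

# (doctor, provider, languages) rows taken from the doctor table above.
DOCTORS = [
    ("Dr. Hans Mueller", "Berlin Orthopedic Center", ["German", "English"]),
    ("Dr. Fatma Yilmaz", "Istanbul Heart Center", ["Turkish", "English", "Arabic"]),
    ("Dr. Amira Hassan", "Dubai Heart Hospital", ["Arabic", "English", "French"]),
]


def language_services(doctors):
    """Return {provider: sorted languages spoken by any of its doctors}."""
    by_provider = defaultdict(set)
    for _doctor, provider, languages in doctors:
        by_provider[provider].update(languages)
    return {provider: sorted(langs) for provider, langs in by_provider.items()}


services = language_services(DOCTORS)
```

Using a set per provider means a language shared by several doctors at the same facility is recorded once.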
7. python -m app.seed_storefront -- Storefront Seed
Seeds 12 treatment categories, updates 6 demo providers with storefront content (taglines, facilities, cultural support, travel info), and computes completeness scores for all 42 providers.
Creates:
| Record | Details |
|---|---|
| Treatment categories | 12 categories (knee replacement, CABG, IVF, dental implants, etc.) |
| Provider storefront content | Taglines, facility lists, cultural support, travel info for 6 demo providers |
| Completeness scores | Computed for all 42 providers |
Idempotent: Yes -- upserts by slug/ID.
Dependencies: Requires app.seed_providers and app.seed_doctors.
8. python -m app.seed_full_enrichment -- Full Provider Enrichment
Enriches all 42 providers with storefront data: taglines, hero images, technology lists, cultural support, and quality enrichment. Updates all 12 treatments with descriptions, what_to_expect, and recovery_info. Adds bios to all 8 original doctors. Recomputes completeness scores, after which all 42 providers fall in the premium tier.
Dependencies: Requires app.seed_storefront.
9. python -m app.seed_doctors_full -- Full Doctor Seeding
Seeds 83 additional doctors across all 42 providers (2-3 per provider), with country-appropriate names, qualifications, specialties, photos, and bios. Creates DoctorProcedure entries and syncs to Neo4j.
Dependencies: Requires app.seed_providers.
10. python -m app.seed_doctor_photos -- Unique Doctor Photos
Assigns unique Unsplash portrait URLs to all 91 doctors (47 male, 44 female). Uses the sig query parameter to keep each URL unique.
Dependencies: Requires app.seed_doctors_full.
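A minimal sketch of the sig idea: appending a distinct `sig` value to an otherwise identical URL yields a distinct URL per doctor. The `portrait_url` helper and the `https://example.com/portrait.jpg` base are placeholders chosen for the example, not the URLs the seed actually uses.

```python
from urllib.parse import urlencode


def portrait_url(base_url, sig):
    """Append a `sig` query parameter so each doctor gets a distinct URL."""
    return f"{base_url}?{urlencode({'sig': sig})}"


# Placeholder base URL; one sig value per doctor (91 doctors in total).
urls = [portrait_url("https://example.com/portrait.jpg", i) for i in range(91)]
```

Using the doctor's position or ID as the `sig` value keeps the assignment deterministic across re-runs.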
11. python -m app.seed_procedure_tests -- Procedure Test Requirements
Seeds REQUIRES_TEST relationships in Neo4j for 15 additional procedures (137 test mappings). Each relationship carries validity_days, priority, source_acceptance, on_site_required, timing_note, and condition_note.
Dependencies: Requires app.seed_graph.
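The Dependencies lines above form a small graph, and a proposed run order can be checked against it mechanically. The sketch below copies those dependencies into a dict and uses the standard-library `graphlib` module (Python 3.9+). `order_violations` is a hypothetical helper, not something that exists in the codebase.

```python
from graphlib import TopologicalSorter

# seed -> seeds it requires, copied from the Dependencies lines above.
DEPENDENCIES = {
    "app.seed": set(),
    "app.seed_providers": {"app.seed"},
    "app.seed_demo": {"app.seed", "app.seed_providers"},
    "app.seed_graph": {"app.seed_providers"},
    "app.seed_embeddings": {"app.seed_providers"},
    "app.seed_doctors": {"app.seed_providers"},
    "app.seed_storefront": {"app.seed_providers", "app.seed_doctors"},
    "app.seed_full_enrichment": {"app.seed_storefront"},
    "app.seed_doctors_full": {"app.seed_providers"},
    "app.seed_doctor_photos": {"app.seed_doctors_full"},
    "app.seed_procedure_tests": {"app.seed_graph"},
}


def order_violations(order, deps=DEPENDENCIES):
    """Return (seed, missing_dependency) pairs for seeds that would run too early."""
    seen = set()
    violations = []
    for seed in order:
        for dep in sorted(deps.get(seed, ())):
            if dep not in seen:
                violations.append((seed, dep))
        seen.add(seed)
    return violations


# One valid order, computed from the graph rather than maintained by hand.
computed_order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

`order_violations(list(DEPENDENCIES))` checks the documented 1 to 11 order and returns an empty list, while an order that runs `app.seed_providers` before `app.seed` is reported as a violation.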
Default Tenant
All seeded data belongs to the tenant tenant-apollo-001. This is the default development and demo tenant used across all environments.
| Property | Value |
|---|---|
| Tenant ID | tenant-apollo-001 |
| Tenant Name | Apollo Medical Tourism |
| Plan | Pro |
| Created By | Base seed |
Resetting All Data
To wipe and re-seed everything from scratch:
# 1. Reset PostgreSQL
alembic downgrade base
alembic upgrade head
# 2. Clear Neo4j
python -c "
from neo4j import GraphDatabase
import os
driver = GraphDatabase.driver(
    os.environ['NEO4J_URI'],
    auth=(os.environ['NEO4J_USERNAME'], os.environ['NEO4J_PASSWORD']),
)
with driver.session() as s:
    s.run('MATCH (n) DETACH DELETE n')
driver.close()
"
# 3. Clear Qdrant
python -c "
from qdrant_client import QdrantClient
import os
client = QdrantClient(url=os.environ['QDRANT_URL'], api_key=os.environ['QDRANT_API_KEY'])
client.delete_collection('providers')
"
# 4. Run full seed sequence
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
Verifying Seeds
After running the full seed sequence, verify each database:
# PostgreSQL: check record counts
python -c "
from app.db import get_session
# Expected: 1 tenant, 42 providers, 1 patient, 3 cases, 8 doctors
"
# Neo4j: check node counts (see database-ops.md for Cypher queries)
# Qdrant: check collection (see database-ops.md for Python snippet)
See the database operations runbook for detailed verification queries.
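The expected PostgreSQL counts noted above can be turned into a small comparison helper. This is a sketch under assumptions: the table names in `EXPECTED_COUNTS` are guesses at the schema, and how the actual counts are fetched (for example via a session from app.db) is left out because that API is not documented here.

```python
# Expected counts after the seven core seeds, from the comment above.
# Table names are assumed for illustration.
EXPECTED_COUNTS = {
    "tenants": 1,
    "providers": 42,
    "patients": 1,
    "cases": 3,
    "doctors": 8,
}


def count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {table: (expected, actual)} for every table whose count differs."""
    return {
        table: (want, actual.get(table, 0))
        for table, want in expected.items()
        if actual.get(table, 0) != want
    }
```

An empty result means the core seeds landed as documented. Note that the doctor count is expected to rise from 8 to 91 once app.seed_doctors_full has also been run.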
Recovery Provider Seed (#958)
Seeds 6 recovery facilities (3 India + 3 Turkey) into Postgres for the
patient_transport_offer_enabled / enable_recovery_offer flag flips.
DATABASE_URL_SERVICE=... python3 scripts/seed_recovery_providers.py
# On Railway (recommended for prod):
railway run --service curaway -- python3 scripts/seed_recovery_providers.py
Idempotency: keyed by providers.slug. Re-runs are safe — existing
facility rows are skipped, BUT partnership rows are still healed (a
facility seeded before its referenced surgical hospital exists will get
its partnership row added on the next run).
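The skip-then-heal behaviour can be modelled in a few lines. The sketch below uses plain dicts and sets in place of the database; `seed_recovery_facility`, the `hospital_slug` field, and the slug `hypothetical-hospital` are assumptions for illustration and do not reflect the script's real structure.

```python
def seed_recovery_facility(db, facility):
    """Insert the facility if its slug is new, then heal its partnership row.

    The partnership step runs even when the facility itself was skipped,
    which is what lets a later run repair a missing partnership.
    """
    actions = []
    if facility["slug"] in db["providers"]:
        actions.append("facility_skipped")
    else:
        db["providers"].add(facility["slug"])
        actions.append("facility_created")

    pair = (facility["hospital_slug"], facility["slug"])
    hospital_exists = facility["hospital_slug"] in db["providers"]
    if hospital_exists and pair not in db["partnerships"]:
        db["partnerships"].add(pair)
        actions.append("partnership_created")
    return actions


db = {"providers": set(), "partnerships": set()}
facility = {
    "slug": "asha-recovery-jubilee-hills",
    "hospital_slug": "hypothetical-hospital",  # placeholder surgical hospital
}

first_run = seed_recovery_facility(db, facility)   # hospital not seeded yet
db["providers"].add("hypothetical-hospital")       # hospital appears later
second_run = seed_recovery_facility(db, facility)  # partnership is healed
third_run = seed_recovery_facility(db, facility)   # nothing left to do
```

The three runs show the full lifecycle: the first creates only the facility, the second skips it but adds the partnership, and the third changes nothing.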
Rollback: delete the seeded rows by slug:
-- Delete partnerships first (FK constraint)
DELETE FROM hospital_recovery_partnerships
WHERE recovery_provider_id IN (
SELECT id FROM providers
WHERE slug IN (
'asha-recovery-jubilee-hills',
'medanta-step-down-gurgaon',
'banjara-wellness-hyderabad',
'bosphorus-recovery-istanbul',
'acibadem-recovery-istanbul',
'kadikoy-step-down-istanbul'
)
);
-- Then profiles
DELETE FROM recovery_provider_profiles
WHERE provider_id IN (
SELECT id FROM providers WHERE slug IN (...)
);
-- Then provider catalog rows
DELETE FROM providers
WHERE slug IN (
'asha-recovery-jubilee-hills',
'medanta-step-down-gurgaon',
'banjara-wellness-hyderabad',
'bosphorus-recovery-istanbul',
'acibadem-recovery-istanbul',
'kadikoy-step-down-istanbul'
);
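The three deletes are safest when run inside one transaction, so that a failure part-way through does not leave orphaned rows. The sketch below shows the pattern with the standard-library `sqlite3` module and `?` placeholders. `rollback_recovery_seed` is a hypothetical helper; against PostgreSQL the same statements would go through the project's own driver, whose placeholder style may differ.

```python
import sqlite3

# Slugs copied from the rollback SQL above.
RECOVERY_SLUGS = [
    "asha-recovery-jubilee-hills",
    "medanta-step-down-gurgaon",
    "banjara-wellness-hyderabad",
    "bosphorus-recovery-istanbul",
    "acibadem-recovery-istanbul",
    "kadikoy-step-down-istanbul",
]


def rollback_recovery_seed(conn, slugs=RECOVERY_SLUGS):
    """Delete partnerships, then profiles, then providers, in one transaction.

    Returns the number of provider rows deleted.
    """
    marks = ", ".join("?" for _ in slugs)
    ids = f"SELECT id FROM providers WHERE slug IN ({marks})"
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "DELETE FROM hospital_recovery_partnerships "
            f"WHERE recovery_provider_id IN ({ids})",
            slugs,
        )
        conn.execute(
            f"DELETE FROM recovery_provider_profiles WHERE provider_id IN ({ids})",
            slugs,
        )
        deleted = conn.execute(
            f"DELETE FROM providers WHERE slug IN ({marks})", slugs
        ).rowcount
    return deleted
```

Only the placeholder list is interpolated into the SQL text; the slug values themselves are always bound as parameters.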
Verify after seed:
SELECT facility_type, accommodation_tier, COUNT(*)
FROM recovery_provider_profiles
WHERE tenant_id = 'tenant-curaway-default'
GROUP BY facility_type, accommodation_tier
ORDER BY facility_type, accommodation_tier;
Transport Vendor Seed (#959 flag prerequisite)
Required before flipping patient_transport_offer_enabled ON for a
tenant — without seeded vendors the matcher returns empty and patients
see the text fallback.
8 vendors across India (3), Turkey (2), Thailand (2), Spain (1).
Same idempotency + rollback pattern as the recovery seed above
(keyed on providers.slug).