
Seed Data

This runbook documents the full seed pipeline for populating Curaway's databases with initial data. Seeds must be run in a specific order because later seeds depend on records created by earlier ones.


Seed Order

Run the seeds in this exact sequence:

# 1. Base seed: tenant + consent records
python -m app.seed

# 2. Providers: 42 providers across 8 countries
python -m app.seed_providers

# 3. Demo data: demo patient Aisha + sample cases
python -m app.seed_demo

# 4. Knowledge graph: Neo4j provider-procedure-country graph
python -m app.seed_graph

# 5. Embeddings: Qdrant vector embeddings for semantic search
python -m app.seed_embeddings

# 6. Doctors: 8 doctor profiles + language services
python -m app.seed_doctors

# 7. Storefront: treatment categories, provider content, completeness scores
python -m app.seed_storefront

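Or as a single command (the && chaining stops at the first seed that exits with an error, so later seeds never run against missing data):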

python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
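Where a shell one-liner is inconvenient (for example in CI), the same sequence could be driven from Python. The following is a minimal sketch, not part of the codebase: the function name run_seeds and its injectable run parameter are assumptions made for illustration, and only the module names come from this runbook.

```python
import subprocess
import sys

# Seed modules in the documented order.
SEED_MODULES = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]


def run_seeds(modules=SEED_MODULES, run=subprocess.run):
    """Run each seed in order and stop at the first non-zero exit code.

    `run` is injectable so the loop can be exercised without a database.
    Returns the list of modules that completed successfully.
    """
    completed = []
    for module in modules:
        # Equivalent to `python -m <module>` with the current interpreter.
        result = run([sys.executable, "-m", module])
        if result.returncode != 0:
            print(f"{module} failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        completed.append(module)
    return completed
```

Calling run_seeds() from the repository root would mirror the && chain above: a failing seed ends the run, and the returned list shows how far the pipeline got.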

Seed Details

1. python -m app.seed -- Base Seed

Creates the foundational records that all other seeds depend on.

Creates:

| Record | Details |
|---|---|
| Tenant | tenant-apollo-001 -- the default development/demo tenant |
| Consent types | GDPR consent categories: treatment, data_sharing, marketing, research |
| Consent templates | Default consent text templates in English, Arabic, German, and Turkish |

Idempotent: Yes -- skips records that already exist.

Dependencies: None. This must run first.

2. python -m app.seed_providers -- Provider Seed

Populates the providers table with 42 medical providers across 8 countries.

Creates:

| Country | Code | Provider Count | Specialties |
|---|---|---|---|
| Germany | DE | 8 | Orthopedics, cardiology, oncology, neurosurgery |
| Turkey | TR | 7 | Dental, ophthalmology, cosmetic surgery, cardiac |
| India | IN | 6 | Cardiac surgery, orthopedics, oncology, fertility |
| Thailand | TH | 5 | Cosmetic surgery, dental, orthopedics |
| South Korea | KR | 4 | Cosmetic surgery, ophthalmology, dermatology |
| Spain | ES | 4 | Oncology, orthopedics, fertility |
| Mexico | MX | 4 | Dental, bariatric surgery, cosmetic surgery |
| UAE | AE | 4 | Cardiac, orthopedics, fertility, oncology |

Each provider record includes:

  • Name, country, city
  • Specialties (array)
  • Accreditations (JCI, ISO 9001, etc.)
  • Description (used for embedding generation)
  • Accepted insurance networks
  • Languages spoken at the facility
  • Average procedure costs (range)

Idempotent: Yes -- uses upsert by provider ID.

Dependencies: Requires app.seed (needs tenant record).
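The upsert-by-ID behaviour can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL (both accept the INSERT ... ON CONFLICT ... DO UPDATE form used here). The three-column table, the ID prov-001, and the helper name upsert_providers are assumptions for illustration; the real schema and seed code may differ.

```python
import sqlite3

# Insert a provider, or update the existing row when the ID is already present.
UPSERT_SQL = """
INSERT INTO providers (id, name, country)
VALUES (:id, :name, :country)
ON CONFLICT (id) DO UPDATE SET
    name = excluded.name,
    country = excluded.country
"""


def upsert_providers(conn, providers):
    """Upsert every provider dict in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_SQL, providers)


# Stand-in schema (hypothetical; the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE providers (id TEXT PRIMARY KEY, name TEXT, country TEXT)"
)

rows = [{"id": "prov-001", "name": "Berlin Orthopedic Center", "country": "DE"}]
upsert_providers(conn, rows)
upsert_providers(conn, rows)  # re-running the seed adds no duplicate row

count = conn.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
print(count)  # 1
```

Because the conflict target is the primary key, a re-run refreshes changed fields instead of failing or duplicating rows.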

3. python -m app.seed_demo -- Demo Data Seed

Creates a demo patient and sample cases for testing and demonstrations.

Creates:

| Record | Details |
|---|---|
| Patient | Aisha Patel -- female, DOB 1985-06-15, UAE resident, email aisha.patel@example.com |
| Case 1 | Hip replacement, status in_review, assigned to Berlin Orthopedic Center |
| Case 2 | Dental implants, status draft, no provider assigned yet |
| Case 3 | Cardiac consultation, status completed, assigned to Munich Heart Clinic |
| Documents | Sample medical report and MRI document references for Case 1 |
| FHIR resources | Patient, Condition (osteoarthritis M16.1), Procedure (hip replacement), Observation (hemoglobin) |
| Chat history | 5 sample chat messages for Case 1 showing intake conversation flow |

Idempotent: Yes -- checks for existing demo patient by email before creating.

Dependencies: Requires app.seed and app.seed_providers.

4. python -m app.seed_graph -- Knowledge Graph Seed

Populates the Neo4j knowledge graph with provider-procedure-country relationships.

Creates:

| Node Type | Count | Properties |
|---|---|---|
| Provider | 42 | id, name, country, specialties, accreditations |
| Procedure | ~30 | name, category, typical_duration, complexity |
| Country | 8 | code, name, timezone, currency |

| Relationship | Description | Count |
|---|---|---|
| PERFORMS | Provider performs a procedure | ~120 |
| LOCATED_IN | Provider is located in a country | 42 |
| AVAILABLE_IN | Procedure is available in a country | ~60 |
| SPECIALIZES_IN | Provider specializes in a medical field | ~80 |

Idempotent: Yes -- uses MERGE in Cypher to avoid duplicates.

Dependencies: Requires app.seed_providers (reads provider data from PostgreSQL).
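To show why MERGE makes the graph seed safe to re-run, here is a minimal sketch for the LOCATED_IN relationship. The node labels, relationship type, and property names come from the tables above; the query text, the helper names, and the shape of the provider dict are assumptions, not the actual seed code. The session argument is expected to behave like a Neo4j driver session (an object with a run(query, parameters) method).

```python
# MERGE matches an existing node or relationship before creating one,
# so running this query twice leaves a single Provider, Country and edge.
MERGE_PROVIDER_LOCATION = """
MERGE (p:Provider {id: $provider_id})
SET p.name = $name
MERGE (c:Country {code: $country_code})
MERGE (p)-[:LOCATED_IN]->(c)
"""


def location_params(provider):
    """Map a provider row (a plain dict here) to the query parameters."""
    return {
        "provider_id": provider["id"],
        "name": provider["name"],
        "country_code": provider["country"],
    }


def seed_locations(session, providers):
    """Run the MERGE query once per provider."""
    for provider in providers:
        session.run(MERGE_PROVIDER_LOCATION, location_params(provider))
```

Merging on the identifying property only (id, code) and assigning the remaining properties with SET keeps a changed name from producing a second node.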

5. python -m app.seed_embeddings -- Vector Embedding Seed

Generates text embeddings for all providers and upserts them into Qdrant.

Creates:

| Collection | Points | Dimension | Distance |
|---|---|---|---|
| providers | 42 | 1536 | Cosine |

Each point includes:

  • Vector: Embedding of the provider's description + specialties + procedures
  • Payload: provider_id, name, country, specialties, accreditations

Cost: ~42 API calls to the OpenAI embedding endpoint, one per provider (~$0.01 total).

Idempotent: Yes -- upserts by provider ID.

Dependencies: Requires app.seed_providers and a valid OPENAI_API_KEY.
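The point layout described above can be sketched without calling OpenAI or Qdrant. Qdrant point IDs must be unsigned integers or UUIDs, so one way to upsert by provider ID is to derive a deterministic UUID from it. Everything named here (the namespace, the helper functions, the provider dict keys, the ID prov-001) is an assumption for illustration; the real seed may map IDs differently.

```python
import uuid

VECTOR_SIZE = 1536  # dimension of the providers collection

# Hypothetical namespace: the same provider ID always yields the same point ID.
POINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "providers.example.com")


def embedding_text(provider):
    """Concatenate description, specialties and procedures for embedding."""
    parts = [
        provider["description"],
        ", ".join(provider["specialties"]),
        ", ".join(provider["procedures"]),
    ]
    return "\n".join(part for part in parts if part)


def point_id(provider_id):
    """Deterministic UUID string, so a re-run overwrites the same point."""
    return str(uuid.uuid5(POINT_NAMESPACE, provider_id))


def build_point(provider, vector):
    """Build one point (id, vector, payload) for the providers collection."""
    if len(vector) != VECTOR_SIZE:
        raise ValueError(f"expected {VECTOR_SIZE} dimensions, got {len(vector)}")
    return {
        "id": point_id(provider["id"]),
        "vector": vector,
        "payload": {
            "provider_id": provider["id"],
            "name": provider["name"],
            "country": provider["country"],
            "specialties": provider["specialties"],
            "accreditations": provider["accreditations"],
        },
    }
```

Keeping provider_id in the payload as well lets search results be joined back to PostgreSQL without reversing the UUID mapping.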

6. python -m app.seed_doctors -- Doctor Seed

Creates doctor profiles and their procedure associations.

Creates:

| Doctor | Specialty | Country | Languages | Provider |
|---|---|---|---|---|
| Dr. Hans Mueller | Orthopedic Surgery | DE | German, English | Berlin Orthopedic Center |
| Dr. Fatma Yilmaz | Cardiac Surgery | TR | Turkish, English, Arabic | Istanbul Heart Center |
| Dr. Raj Sharma | Oncology | IN | Hindi, English | Mumbai Cancer Institute |
| Dr. Suki Tanaka | Cosmetic Surgery | TH | Thai, English, Japanese | Bangkok Aesthetic Clinic |
| Dr. Min-Jun Park | Ophthalmology | KR | Korean, English | Seoul Eye Center |
| Dr. Elena Rodriguez | Fertility | ES | Spanish, English, French | Barcelona Fertility Clinic |
| Dr. Carlos Mendez | Bariatric Surgery | MX | Spanish, English | Monterrey Weight Loss Center |
| Dr. Amira Hassan | Cardiology | AE | Arabic, English, French | Dubai Heart Hospital |

Each doctor record includes:

  • Full name, specialty, license number
  • Languages spoken
  • Years of experience
  • Associated provider (foreign key)
  • Procedures performed (many-to-many)
  • Education and certifications

This seed also creates language services for each provider, derived from the languages spoken by that provider's doctors.

Idempotent: Yes -- checks for existing doctors by license number.

Dependencies: Requires app.seed_providers.
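The derivation of language services from doctor languages can be sketched as a simple grouping step. This is an illustration only: the function name and the dict keys are assumptions, and the real seed presumably works on database rows keyed by provider ID instead of by name.

```python
from collections import defaultdict


def languages_by_provider(doctors):
    """Collect the distinct languages spoken by each provider's doctors."""
    languages = defaultdict(set)
    for doctor in doctors:
        languages[doctor["provider"]].update(doctor["languages"])
    # Sort for a stable result, so repeated runs produce identical records.
    return {provider: sorted(spoken) for provider, spoken in languages.items()}


doctors = [
    {"provider": "Istanbul Heart Center",
     "languages": ["Turkish", "English", "Arabic"]},
    {"provider": "Dubai Heart Hospital",
     "languages": ["Arabic", "English", "French"]},
]
services = languages_by_provider(doctors)
```

With several doctors at one provider, the set union means each language yields one service entry no matter how many doctors speak it.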

7. python -m app.seed_storefront -- Storefront Seed

Seeds 12 treatment categories, updates 6 demo providers with storefront content (taglines, facilities, cultural support, travel info), and computes completeness scores for all 42 providers.

Creates:

| Record | Details |
|---|---|
| Treatment categories | 12 categories (knee replacement, CABG, IVF, dental implants, etc.) |
| Provider storefront content | Taglines, facility lists, cultural support, travel info for 6 demo providers |
| Completeness scores | Computed for all 42 providers |

Idempotent: Yes -- upserts by slug/ID.

Dependencies: Requires app.seed_providers and app.seed_doctors.
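The dependencies listed for each seed can be checked mechanically against the documented run order. The sketch below encodes both as plain data, taken from the Dependencies lines above; the function name order_violations is an assumption for illustration.

```python
# Documented run order.
SEED_ORDER = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]

# Dependencies as stated under each seed.
DEPENDS_ON = {
    "app.seed": [],
    "app.seed_providers": ["app.seed"],
    "app.seed_demo": ["app.seed", "app.seed_providers"],
    "app.seed_graph": ["app.seed_providers"],
    "app.seed_embeddings": ["app.seed_providers"],
    "app.seed_doctors": ["app.seed_providers"],
    "app.seed_storefront": ["app.seed_providers", "app.seed_doctors"],
}


def order_violations(order, depends_on):
    """Return (seed, dependency) pairs where the dependency does not run first."""
    position = {module: index for index, module in enumerate(order)}
    violations = []
    for module in order:
        for dependency in depends_on.get(module, []):
            if dependency not in position or position[dependency] > position[module]:
                violations.append((module, dependency))
    return violations
```

An empty result for SEED_ORDER confirms that the documented sequence satisfies every stated dependency. A check like this is useful when a new seed is added to the pipeline.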


Default Tenant

All seeded data belongs to the tenant tenant-apollo-001. This is the default development and demo tenant used across all environments.

| Property | Value |
|---|---|
| Tenant ID | tenant-apollo-001 |
| Tenant Name | Apollo Medical Tourism |
| Plan | Pro |
| Created By | Base seed |

Resetting All Data

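To wipe and re-seed everything from scratch, run the steps below. Steps 1 to 3 remove all existing data in PostgreSQL and Neo4j and drop the Qdrant providers collection, not only the seeded records: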

# 1. Reset PostgreSQL
alembic downgrade base
alembic upgrade head

# 2. Clear Neo4j
python -c "
from neo4j import GraphDatabase
import os
driver = GraphDatabase.driver(os.environ['NEO4J_URI'],
    auth=(os.environ['NEO4J_USERNAME'], os.environ['NEO4J_PASSWORD']))
with driver.session() as s:
    s.run('MATCH (n) DETACH DELETE n')
driver.close()
"

# 3. Clear Qdrant
python -c "
from qdrant_client import QdrantClient
import os
client = QdrantClient(url=os.environ['QDRANT_URL'], api_key=os.environ['QDRANT_API_KEY'])
client.delete_collection('providers')
"

# 4. Run full seed sequence
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront

Verifying Seeds

After running the full seed sequence, verify each database:

# PostgreSQL: check record counts
python -c "
from app.db import get_session
# Expected: 1 tenant, 42 providers, 1 patient, 3 cases, 8 doctors
"

# Neo4j: check node counts (see database-ops.md for Cypher queries)

# Qdrant: check collection (see database-ops.md for Python snippet)

See the database operations runbook for detailed verification queries.
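As a rough illustration of the PostgreSQL check, the expected record counts can be kept as data and compared against whatever the database reports. The table names below are assumptions (the actual schema may name them differently), and the helper count_mismatches is hypothetical; only the expected numbers come from this runbook.

```python
# Expected row counts after a full seed run (table names are assumed).
EXPECTED_COUNTS = {
    "tenants": 1,
    "providers": 42,
    "patients": 1,
    "cases": 3,
    "doctors": 8,
}


def count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {table: (expected, actual)} for every table that is off.

    A table missing from `actual` is reported with an actual value of None.
    """
    return {
        table: (want, actual.get(table))
        for table, want in expected.items()
        if actual.get(table) != want
    }
```

The actual dict would be filled from SELECT COUNT(*) queries against each table. An empty result means the PostgreSQL side of the seed looks complete.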