Seed Data
This runbook documents the full seed pipeline for populating Curaway's databases with initial data. Seeds must be run in a specific order because later seeds depend on records created by earlier ones.
Seed Order
Run the seeds in this exact sequence:
# 1. Base seed: tenant + consent records
python -m app.seed
# 2. Providers: 42 providers across 8 countries
python -m app.seed_providers
# 3. Demo data: demo patient Aisha + sample cases
python -m app.seed_demo
# 4. Knowledge graph: Neo4j provider-procedure-country graph
python -m app.seed_graph
# 5. Embeddings: Qdrant vector embeddings for semantic search
python -m app.seed_embeddings
# 6. Doctors: 8 doctor profiles + language services
python -m app.seed_doctors
# 7. Storefront: treatment categories, provider content, completeness scores
python -m app.seed_storefront
Or as a single command. Because the steps are chained with &&, the sequence stops at the first seed that fails:
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
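Where a shell chain is awkward (a CI job, or a shell without &&), the same sequence can be driven from Python. The following is a minimal sketch, not code from the repository: run_module, run_seeds, and the runner parameter are hypothetical names, and only the module names come from the list above.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Seed modules in the documented order.
SEED_MODULES = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]


def run_module(module: str) -> int:
    """Run `python -m <module>` with the current interpreter; return its exit code."""
    return subprocess.run([sys.executable, "-m", module]).returncode


def run_seeds(
    modules: Sequence[str] = SEED_MODULES,
    runner: Callable[[str], int] = run_module,
) -> list[str]:
    """Run seeds in order and stop at the first non-zero exit code, like `&&`.

    Returns the modules that completed; raises RuntimeError naming the one that failed.
    """
    completed: list[str] = []
    for module in modules:
        code = runner(module)
        if code != 0:
            raise RuntimeError(
                f"{module} failed with exit code {code}; completed: {completed}"
            )
        completed.append(module)
    return completed
```

Calling run_seeds() with no arguments runs the real modules. Passing a different runner makes it possible to dry-run the sequence without touching any database.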
Seed Details
1. python -m app.seed -- Base Seed
Creates the foundational records that all other seeds depend on.
Creates:
| Record | Details |
|---|---|
| Tenant | tenant-apollo-001 -- the default development/demo tenant |
| Consent types | GDPR consent categories: treatment, data_sharing, marketing, research |
| Consent templates | Default consent text templates in English, Arabic, German, and Turkish |
Idempotent: Yes -- skips records that already exist.
Dependencies: None. This must run first.
2. python -m app.seed_providers -- Provider Seed
Populates the providers table with 42 medical providers across 8 countries.
Creates:
| Country | Code | Provider Count | Specialties |
|---|---|---|---|
| Germany | DE | 8 | Orthopedics, cardiology, oncology, neurosurgery |
| Turkey | TR | 7 | Dental, ophthalmology, cosmetic surgery, cardiac |
| India | IN | 6 | Cardiac surgery, orthopedics, oncology, fertility |
| Thailand | TH | 5 | Cosmetic surgery, dental, orthopedics |
| South Korea | KR | 4 | Cosmetic surgery, ophthalmology, dermatology |
| Spain | ES | 4 | Oncology, orthopedics, fertility |
| Mexico | MX | 4 | Dental, bariatric surgery, cosmetic surgery |
| UAE | AE | 4 | Cardiac, orthopedics, fertility, oncology |
Each provider record includes:
- Name, country, city
- Specialties (array)
- Accreditations (JCI, ISO 9001, etc.)
- Description (used for embedding generation)
- Accepted insurance networks
- Languages spoken at the facility
- Average procedure costs (range)
Idempotent: Yes -- uses upsert by provider ID.
Dependencies: Requires app.seed (needs tenant record).
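Upserting by provider ID is what makes a second run safe: an existing row is updated in place and no duplicate is inserted. The sketch below illustrates the pattern with Python's built-in sqlite3 module so that it runs anywhere. The table layout, the upsert_provider helper, and the ID prov-001 are hypothetical; the real seed writes to PostgreSQL, whose INSERT ... ON CONFLICT syntax is similar.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a cut-down providers table (hypothetical columns)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS providers ("
        " id TEXT PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
    )


def upsert_provider(conn: sqlite3.Connection, provider: dict) -> None:
    """Insert the provider, or update the existing row with the same id."""
    conn.execute(
        """
        INSERT INTO providers (id, name, country)
        VALUES (:id, :name, :country)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            country = excluded.country
        """,
        provider,
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)

provider = {"id": "prov-001", "name": "Berlin Orthopedic Center", "country": "DE"}
upsert_provider(conn, provider)
upsert_provider(conn, provider)  # second run updates the row instead of duplicating it

row_count = conn.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
```

After both calls, row_count is still 1, which is the property the seed relies on when it is re-run.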
3. python -m app.seed_demo -- Demo Data Seed
Creates a demo patient and sample cases for testing and demonstrations.
Creates:
| Record | Details |
|---|---|
| Patient | Aisha Patel -- female, DOB 1985-06-15, UAE resident, email aisha.patel@example.com |
| Case 1 | Hip replacement, status in_review, assigned to Berlin Orthopedic Center |
| Case 2 | Dental implants, status draft, no provider assigned yet |
| Case 3 | Cardiac consultation, status completed, assigned to Munich Heart Clinic |
| Documents | Sample medical report and MRI document references for Case 1 |
| FHIR resources | Patient, Condition (osteoarthritis M16.1), Procedure (hip replacement), Observation (hemoglobin) |
| Chat history | 5 sample chat messages for Case 1 showing intake conversation flow |
Idempotent: Yes -- checks for existing demo patient by email before creating.
Dependencies: Requires app.seed and app.seed_providers.
4. python -m app.seed_graph -- Knowledge Graph Seed
Populates the Neo4j knowledge graph with provider-procedure-country relationships.
Creates:
| Node Type | Count | Properties |
|---|---|---|
| Provider | 42 | id, name, country, specialties, accreditations |
| Procedure | ~30 | name, category, typical_duration, complexity |
| Country | 8 | code, name, timezone, currency |
| Relationship | Description | Count |
|---|---|---|
| PERFORMS | Provider performs a procedure | ~120 |
| LOCATED_IN | Provider is located in a country | 42 |
| AVAILABLE_IN | Procedure is available in a country | ~60 |
| SPECIALIZES_IN | Provider specializes in a medical field | ~80 |
Idempotent: Yes -- uses MERGE in Cypher to avoid duplicates.
Dependencies: Requires app.seed_providers (reads provider data from PostgreSQL).
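MERGE either matches an existing node or relationship or creates it, so running the same statement twice leaves the graph unchanged. The sketch below shows how one LOCATED_IN edge might be written this way. The query text, merge_provider_location, and RecordingSession are illustrative assumptions rather than the seed's actual code; the labels and property names follow the tables above.

```python
# Cypher that is safe to re-run: each MERGE matches first and creates only if absent.
LOCATED_IN_QUERY = """
MERGE (p:Provider {id: $provider_id})
MERGE (c:Country {code: $country_code})
MERGE (p)-[:LOCATED_IN]->(c)
SET p.name = $name
"""


def merge_provider_location(session, provider: dict) -> dict:
    """Send the MERGE statement for one provider; return the parameters used.

    `session` is anything with a run(query, parameters) method,
    for example a neo4j driver session.
    """
    params = {
        "provider_id": provider["id"],
        "name": provider["name"],
        "country_code": provider["country"],
    }
    session.run(LOCATED_IN_QUERY, params)
    return params


class RecordingSession:
    """Stand-in for a Neo4j session that records calls instead of executing them."""

    def __init__(self) -> None:
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))


# Dry run with the stand-in session; "prov-001" is a made-up ID.
session = RecordingSession()
sent = merge_provider_location(
    session, {"id": "prov-001", "name": "Berlin Orthopedic Center", "country": "DE"}
)
```

Against a real database, the RecordingSession would be replaced by a session opened from GraphDatabase.driver(...).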
5. python -m app.seed_embeddings -- Vector Embedding Seed
Generates text embeddings for all providers and upserts them into Qdrant.
Creates:
| Collection | Points | Dimension | Distance |
|---|---|---|---|
| providers | 42 | 1536 | Cosine |
Each point includes:
- Vector: Embedding of the provider's description + specialties + procedures
- Payload: provider_id, name, country, specialties, accreditations
Cost: ~42 API calls to the OpenAI embedding endpoint, roughly one per provider (~$0.01 total).
Idempotent: Yes -- upserts by provider ID.
Dependencies: Requires app.seed_providers and a valid OPENAI_API_KEY.
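The shape of each point can be sketched without calling OpenAI or Qdrant. The helpers below are hypothetical: they assemble the text to embed from the description, specialties, and procedures, and build a point with the payload fields listed above. Qdrant point IDs must be unsigned integers or UUIDs, so this sketch derives a deterministic UUID from the provider ID. How the real seed maps provider IDs to point IDs is not stated here, and that mapping is an assumption.

```python
import uuid

EMBEDDING_DIM = 1536  # dimension of the providers collection


def build_embedding_text(provider: dict) -> str:
    """Join the fields that feed the embedding into one text block."""
    parts = [
        provider["description"],
        "Specialties: " + ", ".join(provider["specialties"]),
        "Procedures: " + ", ".join(provider["procedures"]),
    ]
    return "\n".join(parts)


def build_point(provider: dict, vector: list) -> dict:
    """Build a point dict (id, vector, payload) for one provider."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {
        # Same provider ID always yields the same point ID, so re-runs overwrite.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, provider["id"])),
        "vector": vector,
        "payload": {
            "provider_id": provider["id"],
            "name": provider["name"],
            "country": provider["country"],
            "specialties": provider["specialties"],
            "accreditations": provider["accreditations"],
        },
    }


# Illustrative record; the ID, description, and procedures are made up.
sample_provider = {
    "id": "prov-001",
    "name": "Berlin Orthopedic Center",
    "country": "DE",
    "description": "Orthopedic hospital (placeholder description).",
    "specialties": ["Orthopedics"],
    "procedures": ["Hip replacement"],
    "accreditations": ["JCI"],
}
sample_text = build_embedding_text(sample_provider)
sample_point = build_point(sample_provider, [0.0] * EMBEDDING_DIM)
```

A deterministic point ID is one way to keep the upsert idempotent: a second run overwrites the existing point instead of adding a new one.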
6. python -m app.seed_doctors -- Doctor Seed
Creates doctor profiles and their procedure associations.
Creates:
| Doctor | Specialty | Country | Languages | Provider |
|---|---|---|---|---|
| Dr. Hans Mueller | Orthopedic Surgery | DE | German, English | Berlin Orthopedic Center |
| Dr. Fatma Yilmaz | Cardiac Surgery | TR | Turkish, English, Arabic | Istanbul Heart Center |
| Dr. Raj Sharma | Oncology | IN | Hindi, English | Mumbai Cancer Institute |
| Dr. Suki Tanaka | Cosmetic Surgery | TH | Thai, English, Japanese | Bangkok Aesthetic Clinic |
| Dr. Min-Jun Park | Ophthalmology | KR | Korean, English | Seoul Eye Center |
| Dr. Elena Rodriguez | Fertility | ES | Spanish, English, French | Barcelona Fertility Clinic |
| Dr. Carlos Mendez | Bariatric Surgery | MX | Spanish, English | Monterrey Weight Loss Center |
| Dr. Amira Hassan | Cardiology | AE | Arabic, English, French | Dubai Heart Hospital |
Each doctor record includes:
- Full name, specialty, license number
- Languages spoken
- Years of experience
- Associated provider (foreign key)
- Procedures performed (many-to-many)
- Education and certifications
This seed also creates language services for each provider, based on the languages spoken by that provider's doctors.
Idempotent: Yes -- checks for existing doctors by license number.
Dependencies: Requires app.seed_providers.
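Deriving language services from doctor records amounts to grouping doctors by provider and taking the union of their languages. The sketch below shows that aggregation on two rows from the doctor table. The languages_by_provider helper and the dict layout are hypothetical, and the real seed may store language services differently.

```python
from collections import defaultdict


def languages_by_provider(doctors: list) -> dict:
    """Map each provider name to the sorted union of its doctors' languages."""
    languages = defaultdict(set)
    for doctor in doctors:
        languages[doctor["provider"]].update(doctor["languages"])
    return {provider: sorted(langs) for provider, langs in languages.items()}


doctors = [
    {
        "name": "Dr. Hans Mueller",
        "provider": "Berlin Orthopedic Center",
        "languages": ["German", "English"],
    },
    {
        "name": "Dr. Fatma Yilmaz",
        "provider": "Istanbul Heart Center",
        "languages": ["Turkish", "English", "Arabic"],
    },
]

services = languages_by_provider(doctors)
# services["Berlin Orthopedic Center"] == ["English", "German"]
# services["Istanbul Heart Center"] == ["Arabic", "English", "Turkish"]
```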
7. python -m app.seed_storefront -- Storefront Seed
Seeds 12 treatment categories, updates 6 demo providers with storefront content (taglines, facilities, cultural support, travel info), and computes completeness scores for all 42 providers.
Creates:
| Record | Details |
|---|---|
| Treatment categories | 12 categories (knee replacement, CABG, IVF, dental implants, etc.) |
| Provider storefront content | Taglines, facility lists, cultural support, travel info for 6 demo providers |
| Completeness scores | Computed for all 42 providers |
Idempotent: Yes -- upserts by slug/ID.
Dependencies: Requires app.seed_providers and app.seed_doctors.
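The Dependencies lines in the seven sections above can be checked mechanically against the documented run order. The sketch below transcribes them into a dictionary and reports any seed that would run before something it requires. The order_violations helper is hypothetical, and the OPENAI_API_KEY requirement of app.seed_embeddings is not modeled because it is not a seed.

```python
from graphlib import TopologicalSorter

# Each seed mapped to the seeds it requires, as stated in the sections above.
DEPENDENCIES = {
    "app.seed": set(),
    "app.seed_providers": {"app.seed"},
    "app.seed_demo": {"app.seed", "app.seed_providers"},
    "app.seed_graph": {"app.seed_providers"},
    "app.seed_embeddings": {"app.seed_providers"},
    "app.seed_doctors": {"app.seed_providers"},
    "app.seed_storefront": {"app.seed_providers", "app.seed_doctors"},
}

DOCUMENTED_ORDER = [
    "app.seed",
    "app.seed_providers",
    "app.seed_demo",
    "app.seed_graph",
    "app.seed_embeddings",
    "app.seed_doctors",
    "app.seed_storefront",
]


def order_violations(order: list, dependencies: dict) -> list:
    """Return (seed, dependency) pairs where the dependency runs after the seed."""
    position = {module: index for index, module in enumerate(order)}
    violations = []
    for module, deps in dependencies.items():
        for dep in sorted(deps):
            if position[dep] > position[module]:
                violations.append((module, dep))
    return violations


# An empty list means the documented order satisfies every stated dependency.
violations = order_violations(DOCUMENTED_ORDER, DEPENDENCIES)

# graphlib can also derive one valid order; it raises CycleError on circular dependencies.
one_valid_order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

A check like this is useful when a new seed is added: updating the dictionary and the order list shows immediately whether the new step has been placed too early.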
Default Tenant
All seeded data belongs to the tenant tenant-apollo-001. This is the default development and demo tenant used across all environments.
| Property | Value |
|---|---|
| Tenant ID | tenant-apollo-001 |
| Tenant Name | Apollo Medical Tourism |
| Plan | Pro |
| Created By | Base seed |
Resetting All Data
To wipe and re-seed everything from scratch:
# 1. Reset PostgreSQL
alembic downgrade base
alembic upgrade head
# 2. Clear Neo4j
python -c "
import os
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    os.environ['NEO4J_URI'],
    auth=(os.environ['NEO4J_USERNAME'], os.environ['NEO4J_PASSWORD']),
)
with driver.session() as s:
    # Delete every node together with its relationships.
    s.run('MATCH (n) DETACH DELETE n')
driver.close()
"
# 3. Clear Qdrant
python -c "
from qdrant_client import QdrantClient
import os
client = QdrantClient(url=os.environ['QDRANT_URL'], api_key=os.environ['QDRANT_API_KEY'])
client.delete_collection('providers')
"
# 4. Run full seed sequence
python -m app.seed && \
python -m app.seed_providers && \
python -m app.seed_demo && \
python -m app.seed_graph && \
python -m app.seed_embeddings && \
python -m app.seed_doctors && \
python -m app.seed_storefront
Verifying Seeds
After running the full seed sequence, verify each database:
# PostgreSQL: check record counts
python -c "
from app.db import get_session
# Expected: 1 tenant, 42 providers, 1 patient, 3 cases, 8 doctors
"
# Neo4j: check node counts (see database-ops.md for Cypher queries)
# Qdrant: check collection (see database-ops.md for Python snippet)
See the database operations runbook for detailed verification queries.
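Once the PostgreSQL counts have been collected, comparing them with the expected values is a small dictionary check. The sketch below assumes the counts have already been gathered into a dict. The key names and the count_mismatches helper are hypothetical, and the queries that produce the counts are left to the database operations runbook.

```python
# Expected PostgreSQL record counts after a full seed run.
EXPECTED_COUNTS = {
    "tenants": 1,
    "providers": 42,
    "patients": 1,
    "cases": 3,
    "doctors": 8,
}


def count_mismatches(actual: dict, expected: dict = EXPECTED_COUNTS) -> dict:
    """Return {name: (expected, actual)} for every count that differs.

    A name missing from `actual` is reported with an actual value of None.
    """
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Example: the doctor seed was skipped, so the doctors table is empty.
example = count_mismatches(
    {"tenants": 1, "providers": 42, "patients": 1, "cases": 3, "doctors": 0}
)
# example == {"doctors": (8, 0)}
```

An empty result means every count matches. Any entry names the table to investigate and usually points at the seed that did not run.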