Database Operations¶

This runbook covers day-to-day database operations for Curaway's three data stores: PostgreSQL (primary), Neo4j (knowledge graph), and Qdrant (vector search).

PostgreSQL¶

PostgreSQL is the primary data store for all relational data: patients, cases, providers, doctors, consent records, and FHIR clinical data.

Alembic Migrations¶

Curaway uses Alembic for schema migrations, integrated with SQLAlchemy async models.

Apply All Pending Migrations¶

alembic upgrade head

This is safe to run repeatedly -- it only applies migrations that have not yet been applied.

Create a New Migration¶

alembic revision --autogenerate -m "Add description of the schema change"

This compares the current SQLAlchemy models against the database schema and generates a migration script in alembic/versions/. Always review the generated file before applying it:

# Review the generated migration
cat alembic/versions/<timestamp>_add_description_of_the_schema_change.py

# Apply it
alembic upgrade head

Check Current Migration State¶

# Show the current revision applied to the database
alembic current

# Show migration history
alembic history --verbose

Roll Back the Last Migration¶

alembic downgrade -1

Downgrade Caution

Downgrade scripts are auto-generated and may not perfectly reverse all changes (especially data migrations). Always test downgrades in staging before running in production.

Full Reset (Development Only)¶

alembic downgrade base
alembic upgrade head
python -m app.seed

Backup and Restore¶

Backup via pg_dump¶

# Full backup
pg_dump "$DATABASE_URL" > backup_$(date +%Y%m%d_%H%M%S).sql

# Schema only
pg_dump "$DATABASE_URL" --schema-only > schema_backup.sql

# Specific table
pg_dump "$DATABASE_URL" -t cases > cases_backup.sql

Restore from Backup¶

psql "$DATABASE_URL" < backup_20260331_120000.sql

Useful Queries¶

Check Table Sizes¶

SELECT
    schemaname,
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Check Active Connections¶

SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query_start,
    query
FROM pg_stat_activity
WHERE datname = 'curaway'
ORDER BY query_start DESC;

Check Pending Migrations (Application-Level)¶

SELECT * FROM alembic_version;

Find Cases by Status¶

SELECT status, COUNT(*) as count
FROM cases
WHERE tenant_id = 'tenant-apollo-001'
GROUP BY status
ORDER BY count DESC;

Neo4j (Knowledge Graph)¶

Neo4j stores the provider-procedure-country knowledge graph used for relationship-based matching.

Seeding the Graph¶

python -m app.seed_graph

This populates Neo4j with:

Provider nodes -- Each provider with their specialties and accreditations
Procedure nodes -- Medical procedures with category hierarchy
Country nodes -- Countries where providers operate
Relationships -- PERFORMS (provider-procedure), LOCATED_IN (provider-country), AVAILABLE_IN (procedure-country)

Verifying the Graph¶

Check Node Counts¶

// Total node count by label
MATCH (n)
RETURN labels(n) AS label, COUNT(n) AS count
ORDER BY count DESC;

Expected output (after full seed):

Label	Count
Provider	42
Procedure	~30
Country	8
Doctor	8

Check Relationship Counts¶

// Total relationship count by type
MATCH ()-[r]->()
RETURN type(r) AS relationship, COUNT(r) AS count
ORDER BY count DESC;

Verify a Specific Provider's Graph¶

// Show all relationships for a provider
MATCH (p:Provider {id: 'prov-berlin-ortho-01'})-[r]-(connected)
RETURN p, type(r), connected;

Find Providers for a Procedure in a Country¶

MATCH (p:Provider)-[:PERFORMS]->(proc:Procedure {name: 'Hip Replacement'}),
      (p)-[:LOCATED_IN]->(c:Country {code: 'DE'})
RETURN p.name, p.id, c.code;

Clearing and Re-seeding¶

# Clear all nodes and relationships (DESTRUCTIVE)
# Only do this if you intend to re-seed immediately
python -c "
from neo4j import GraphDatabase
driver = GraphDatabase.driver('$NEO4J_URI', auth=('$NEO4J_USERNAME', '$NEO4J_PASSWORD'))
with driver.session() as session:
    session.run('MATCH (n) DETACH DELETE n')
driver.close()
"

# Re-seed
python -m app.seed_graph

Qdrant (Vector Search)¶

Qdrant stores embeddings for semantic provider search. Each provider has an embedding generated from their profile description, specialties, and procedures.

Seeding Embeddings¶

python -m app.seed_embeddings

This:

Reads all provider records from PostgreSQL.
Generates text embeddings using the configured embedding model (default: text-embedding-3-small via OpenAI).
Upserts the embeddings into the Qdrant providers collection.

Verifying the Collection¶

Check Collection Info¶

from qdrant_client import QdrantClient

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")

# Collection info
info = client.get_collection("providers")
print(f"Points: {info.points_count}")
print(f"Vectors: {info.vectors_count}")
print(f"Dimension: {info.config.params.vectors.size}")

Expected:

Property	Value
Points	42 (one per provider)
Dimension	1536 (text-embedding-3-small)
Distance	Cosine

Test a Semantic Search¶

from qdrant_client import QdrantClient, models

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")

# Search for orthopedic providers in Germany
results = client.search(
    collection_name="providers",
    query_vector=embedding_vector,  # Generate from query text
    limit=5,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="country",
                match=models.MatchValue(value="DE"),
            )
        ]
    ),
)

Re-seeding Embeddings¶

If provider data changes or the embedding model is updated, re-seed the entire collection:

python -m app.seed_embeddings

This performs an upsert, so existing embeddings for unchanged providers are overwritten (not duplicated).

Deleting the Collection (Emergency)¶

from qdrant_client import QdrantClient

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")
client.delete_collection("providers")

Then re-seed:

python -m app.seed_embeddings

Cross-Database Consistency¶

Since data spans three databases, keep these consistency rules in mind:

Scenario	Action
New provider added to PostgreSQL	Run `python -m app.seed_graph` and `python -m app.seed_embeddings` to update Neo4j and Qdrant
Provider deleted from PostgreSQL	Manually remove from Neo4j and Qdrant, or re-seed both
Schema migration changes provider model	Re-seed Neo4j and Qdrant after migration
Embedding model changed	Re-seed Qdrant (`python -m app.seed_embeddings`)

Seed Order Matters

Always seed in order: PostgreSQL first (via Alembic + app.seed), then Neo4j (app.seed_graph), then Qdrant (app.seed_embeddings). Downstream seeds read from PostgreSQL.