Skip to content

Database Operations

This runbook covers day-to-day database operations for Curaway's three data stores: PostgreSQL (primary), Neo4j (knowledge graph), and Qdrant (vector search).


PostgreSQL

PostgreSQL is the primary data store for all relational data: patients, cases, providers, doctors, consent records, and FHIR clinical data.

Alembic Migrations

Curaway uses Alembic for schema migrations, integrated with SQLAlchemy async models.

Apply All Pending Migrations

alembic upgrade head

This is safe to run repeatedly -- it only applies migrations that have not yet been applied.

Create a New Migration

alembic revision --autogenerate -m "Add description of the schema change"

This compares the current SQLAlchemy models against the database schema and generates a migration script in alembic/versions/. Always review the generated file before applying it:

# Review the generated migration
cat alembic/versions/<timestamp>_add_description_of_the_schema_change.py

# Apply it
alembic upgrade head

Check Current Migration State

# Show the current revision applied to the database
alembic current

# Show migration history
alembic history --verbose

Roll Back the Last Migration

alembic downgrade -1

Downgrade Caution

Downgrade scripts are auto-generated and may not perfectly reverse all changes (especially data migrations). Always test downgrades in staging before running in production.

Full Reset (Development Only)

alembic downgrade base
alembic upgrade head
python -m app.seed

Backup and Restore

Backup via pg_dump

# Full backup
pg_dump "$DATABASE_URL" > backup_$(date +%Y%m%d_%H%M%S).sql

# Schema only
pg_dump "$DATABASE_URL" --schema-only > schema_backup.sql

# Specific table
pg_dump "$DATABASE_URL" -t cases > cases_backup.sql

Restore from Backup

psql "$DATABASE_URL" < backup_20260331_120000.sql

Useful Queries

Check Table Sizes

SELECT
    schemaname,
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Check Active Connections

SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query_start,
    query
FROM pg_stat_activity
WHERE datname = 'curaway'
ORDER BY query_start DESC;

Check Pending Migrations (Application-Level)

SELECT * FROM alembic_version;

Find Cases by Status

SELECT status, COUNT(*) as count
FROM cases
WHERE tenant_id = 'tenant-apollo-001'
GROUP BY status
ORDER BY count DESC;

Neo4j (Knowledge Graph)

Neo4j stores the provider-procedure-country knowledge graph used for relationship-based matching.

Seeding the Graph

python -m app.seed_graph

This populates Neo4j with:

  • Provider nodes -- Each provider with their specialties and accreditations
  • Procedure nodes -- Medical procedures with category hierarchy
  • Country nodes -- Countries where providers operate
  • Relationships -- PERFORMS (provider-procedure), LOCATED_IN (provider-country), AVAILABLE_IN (procedure-country)

Verifying the Graph

Check Node Counts

// Total node count by label
MATCH (n)
RETURN labels(n) AS label, COUNT(n) AS count
ORDER BY count DESC;

Expected output (after full seed):

Label Count
Provider 42
Procedure ~30
Country 8
Doctor 8

Check Relationship Counts

// Total relationship count by type
MATCH ()-[r]->()
RETURN type(r) AS relationship, COUNT(r) AS count
ORDER BY count DESC;

Verify a Specific Provider's Graph

// Show all relationships for a provider
MATCH (p:Provider {id: 'prov-berlin-ortho-01'})-[r]-(connected)
RETURN p, type(r), connected;

Find Providers for a Procedure in a Country

MATCH (p:Provider)-[:PERFORMS]->(proc:Procedure {name: 'Hip Replacement'}),
      (p)-[:LOCATED_IN]->(c:Country {code: 'DE'})
RETURN p.name, p.id, c.code;

Clearing and Re-seeding

# Clear all nodes and relationships (DESTRUCTIVE)
# Only do this if you intend to re-seed immediately
python -c "
from neo4j import GraphDatabase
driver = GraphDatabase.driver('$NEO4J_URI', auth=('$NEO4J_USERNAME', '$NEO4J_PASSWORD'))
with driver.session() as session:
    session.run('MATCH (n) DETACH DELETE n')
driver.close()
"

# Re-seed
python -m app.seed_graph

Qdrant stores embeddings for semantic provider search. Each provider has an embedding generated from their profile description, specialties, and procedures.

Seeding Embeddings

python -m app.seed_embeddings

This:

  1. Reads all provider records from PostgreSQL.
  2. Generates text embeddings using the configured embedding model (default: text-embedding-3-small via OpenAI).
  3. Upserts the embeddings into the Qdrant providers collection.

Verifying the Collection

Check Collection Info

from qdrant_client import QdrantClient

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")

# Collection info
info = client.get_collection("providers")
print(f"Points: {info.points_count}")
print(f"Vectors: {info.vectors_count}")
print(f"Dimension: {info.config.params.vectors.size}")

Expected:

Property Value
Points 42 (one per provider)
Dimension 1536 (text-embedding-3-small)
Distance Cosine
from qdrant_client import QdrantClient, models

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")

# Search for orthopedic providers in Germany
results = client.search(
    collection_name="providers",
    query_vector=embedding_vector,  # Generate from query text
    limit=5,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="country",
                match=models.MatchValue(value="DE"),
            )
        ]
    ),
)

Re-seeding Embeddings

If provider data changes or the embedding model is updated, re-seed the entire collection:

python -m app.seed_embeddings

This performs an upsert, so existing embeddings for unchanged providers are overwritten (not duplicated).

Deleting the Collection (Emergency)

from qdrant_client import QdrantClient

client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")
client.delete_collection("providers")

Then re-seed:

python -m app.seed_embeddings

Cross-Database Consistency

Since data spans three databases, keep these consistency rules in mind:

Scenario Action
New provider added to PostgreSQL Run python -m app.seed_graph and python -m app.seed_embeddings to update Neo4j and Qdrant
Provider deleted from PostgreSQL Manually remove from Neo4j and Qdrant, or re-seed both
Schema migration changes provider model Re-seed Neo4j and Qdrant after migration
Embedding model changed Re-seed Qdrant (python -m app.seed_embeddings)

Seed Order Matters

Always seed in order: PostgreSQL first (via Alembic + app.seed), then Neo4j (app.seed_graph), then Qdrant (app.seed_embeddings). Downstream seeds read from PostgreSQL.