Database Operations¶
This runbook covers day-to-day database operations for Curaway's three data stores: PostgreSQL (primary), Neo4j (knowledge graph), and Qdrant (vector search).
PostgreSQL¶
PostgreSQL is the primary data store for all relational data: patients, cases, providers, doctors, consent records, and FHIR clinical data.
Alembic Migrations¶
Curaway uses Alembic for schema migrations, integrated with SQLAlchemy async models.
Apply All Pending Migrations¶
This is safe to run repeatedly -- it only applies migrations that have not yet been applied.
Create a New Migration¶
This compares the current SQLAlchemy models against the database schema and generates a migration script in alembic/versions/. Always review the generated file before applying it:
# Review the generated migration
cat alembic/versions/<timestamp>_add_description_of_the_schema_change.py
# Apply it
alembic upgrade head
Check Current Migration State¶
# Show the current revision applied to the database
alembic current
# Show migration history
alembic history --verbose
Roll Back the Last Migration¶
Downgrade Caution
Downgrade scripts are auto-generated and may not perfectly reverse all changes (especially data migrations). Always test downgrades in staging before running in production.
Full Reset (Development Only)¶
Backup and Restore¶
Backup via pg_dump¶
# Full backup
pg_dump "$DATABASE_URL" > backup_$(date +%Y%m%d_%H%M%S).sql
# Schema only
pg_dump "$DATABASE_URL" --schema-only > schema_backup.sql
# Specific table
pg_dump "$DATABASE_URL" -t cases > cases_backup.sql
Restore from Backup¶
Useful Queries¶
Check Table Sizes¶
SELECT
schemaname,
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
Check Active Connections¶
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE datname = 'curaway'
ORDER BY query_start DESC;
Check Pending Migrations (Application-Level)¶
Find Cases by Status¶
SELECT status, COUNT(*) as count
FROM cases
WHERE tenant_id = 'tenant-apollo-001'
GROUP BY status
ORDER BY count DESC;
Neo4j (Knowledge Graph)¶
Neo4j stores the provider-procedure-country knowledge graph used for relationship-based matching.
Seeding the Graph¶
This populates Neo4j with:
- Provider nodes -- Each provider with their specialties and accreditations
- Procedure nodes -- Medical procedures with category hierarchy
- Country nodes -- Countries where providers operate
- Relationships --
PERFORMS(provider-procedure),LOCATED_IN(provider-country),AVAILABLE_IN(procedure-country)
Verifying the Graph¶
Check Node Counts¶
// Total node count by label
MATCH (n)
RETURN labels(n) AS label, COUNT(n) AS count
ORDER BY count DESC;
Expected output (after full seed):
| Label | Count |
|---|---|
| Provider | 42 |
| Procedure | ~30 |
| Country | 8 |
| Doctor | 8 |
Check Relationship Counts¶
// Total relationship count by type
MATCH ()-[r]->()
RETURN type(r) AS relationship, COUNT(r) AS count
ORDER BY count DESC;
Verify a Specific Provider's Graph¶
// Show all relationships for a provider
MATCH (p:Provider {id: 'prov-berlin-ortho-01'})-[r]-(connected)
RETURN p, type(r), connected;
Find Providers for a Procedure in a Country¶
MATCH (p:Provider)-[:PERFORMS]->(proc:Procedure {name: 'Hip Replacement'}),
(p)-[:LOCATED_IN]->(c:Country {code: 'DE'})
RETURN p.name, p.id, c.code;
Clearing and Re-seeding¶
# Clear all nodes and relationships (DESTRUCTIVE)
# Only do this if you intend to re-seed immediately
python -c "
from neo4j import GraphDatabase
driver = GraphDatabase.driver('$NEO4J_URI', auth=('$NEO4J_USERNAME', '$NEO4J_PASSWORD'))
with driver.session() as session:
session.run('MATCH (n) DETACH DELETE n')
driver.close()
"
# Re-seed
python -m app.seed_graph
Qdrant (Vector Search)¶
Qdrant stores embeddings for semantic provider search. Each provider has an embedding generated from their profile description, specialties, and procedures.
Seeding Embeddings¶
This:
- Reads all provider records from PostgreSQL.
- Generates text embeddings using the configured embedding model (default:
text-embedding-3-smallvia OpenAI). - Upserts the embeddings into the Qdrant
providerscollection.
Verifying the Collection¶
Check Collection Info¶
from qdrant_client import QdrantClient
client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")
# Collection info
info = client.get_collection("providers")
print(f"Points: {info.points_count}")
print(f"Vectors: {info.vectors_count}")
print(f"Dimension: {info.config.params.vectors.size}")
Expected:
| Property | Value |
|---|---|
| Points | 42 (one per provider) |
| Dimension | 1536 (text-embedding-3-small) |
| Distance | Cosine |
Test a Semantic Search¶
from qdrant_client import QdrantClient, models
client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")
# Search for orthopedic providers in Germany
results = client.search(
collection_name="providers",
query_vector=embedding_vector, # Generate from query text
limit=5,
query_filter=models.Filter(
must=[
models.FieldCondition(
key="country",
match=models.MatchValue(value="DE"),
)
]
),
)
Re-seeding Embeddings¶
If provider data changes or the embedding model is updated, re-seed the entire collection:
This performs an upsert, so existing embeddings for unchanged providers are overwritten (not duplicated).
Deleting the Collection (Emergency)¶
from qdrant_client import QdrantClient
client = QdrantClient(url="$QDRANT_URL", api_key="$QDRANT_API_KEY")
client.delete_collection("providers")
Then re-seed:
Cross-Database Consistency¶
Since data spans three databases, keep these consistency rules in mind:
| Scenario | Action |
|---|---|
| New provider added to PostgreSQL | Run python -m app.seed_graph and python -m app.seed_embeddings to update Neo4j and Qdrant |
| Provider deleted from PostgreSQL | Manually remove from Neo4j and Qdrant, or re-seed both |
| Schema migration changes provider model | Re-seed Neo4j and Qdrant after migration |
| Embedding model changed | Re-seed Qdrant (python -m app.seed_embeddings) |
Seed Order Matters
Always seed in order: PostgreSQL first (via Alembic + app.seed), then Neo4j (app.seed_graph), then Qdrant (app.seed_embeddings). Downstream seeds read from PostgreSQL.