Graph Projection Architecture
Overview
Curaway uses two data stores for provider/clinical data: PostgreSQL as the canonical source of truth and Neo4j as a read-only projection used by the matching engine. This page documents the projection contract — what gets written where, who writes it, and how drift is detected and corrected.
This architecture was formalized in ADR-0026.
The Projection Contract
```mermaid
graph LR
    subgraph "Write Path"
        Admin[Admin CRUD] --> PG[(PostgreSQL\nsource of truth)]
        Seed[Seed scripts] --> PG
    end
    subgraph "Projection Path"
        PG -->|QStash event| Worker[Projection Worker\napp/jobs/graph_projection.py]
        Worker --> Neo4j[(Neo4j\nread projection)]
        PG -->|Nightly cron\n+ on-demand| Rebuild[Bulk Rebuild\nPOST /admin/graph/rebuild]
        Rebuild --> Neo4j
    end
    subgraph "Read Path"
        Neo4j -->|7-hop traversal| Match[Matching Engine\napp/services/matching_engine.py]
    end
    style PG fill:#004D4D,color:#fff
    style Neo4j fill:#008B8B,color:#fff
    style Worker fill:#FF7F50,color:#fff
```
Key rule: Neo4j is never written to directly from request handlers. Only the projection worker (event-driven) and the bulk rebuild endpoint write to Neo4j.
Anything in Neo4j must be reconstructible from Postgres + reference YAML. If a field exists only in Neo4j, that is a bug.
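The reconstructibility rule above can be checked mechanically. The following is a minimal sketch, assuming you can list a node's property keys and the set of field names derivable from Postgres columns and reference YAML; the function name and the example field names are hypothetical and not part of the codebase.

```python
def neo4j_only_fields(node_properties: set[str], reconstructible: set[str]) -> list[str]:
    """Return property names present on a Neo4j node but not derivable
    from Postgres or reference YAML (hypothetical helper)."""
    # Set difference: anything left over exists only in the graph.
    return sorted(node_properties - reconstructible)


# Illustrative field names only; "rank_cache" stands in for a stray property.
stray = neo4j_only_fields(
    {"provider_id", "name", "rank_cache"},
    {"provider_id", "name", "country_code"},
)
print(stray)
```

Under the rule stated above, a non-empty result would point to a field that needs a home in Postgres or reference YAML.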
What Gets Projected
Entities
| Postgres Table | Neo4j Node Label | Projection Function |
|---|---|---|
| `providers` | `Provider` | `app/services/graph/provider_graph.py::create_provider_node` |
| `doctors` | `Doctor` | `app/services/graph/provider_graph.py::create_doctor_node` |
| `procedures` | `Procedure` | `app/services/graph/clinical_graph.py::create_procedure_node` |
| `conditions` (ref) | `Condition` | `app/services/graph/clinical_graph.py::create_condition_node` |
| `accreditations` | `Accreditation` | `app/services/graph/provider_graph.py::create_accreditation_node` |
Relationships
| Relationship | Direction | Cypher Label | Source Table |
|---|---|---|---|
| Provider performs procedure | Provider → Procedure | `PERFORMS` | `provider_procedure_offerings` |
| Provider located in country | Provider → Country | `LOCATED_IN` | `providers.country_code` |
| Doctor works at provider | Doctor → Provider | `WORKS_AT` | `doctor_provider_affiliations` |
| Doctor speaks language | Doctor → Language | `SPEAKS` | `doctor_languages` |
| Doctor trained in specialty | Doctor → Specialty | `SPECIALIZES_IN` | `doctor_specialties` |
| Condition implies procedure | Condition → Procedure | `IMPLIES` | reference YAML |
Reference Data (YAML → Neo4j only)
Reference data that does not exist in Postgres is loaded from config/reference/*.yaml:
- Visa corridors (`Country → Country`, `VISA_FREE`)
- Climate/air quality per country
- Country tier classification
These are loaded exclusively during bulk rebuild — not event-driven.
Event-Driven Projection
For near-real-time consistency, every Postgres write that affects a graph entity emits a QStash event. The projection worker consumes the event and idempotently upserts the affected Neo4j subgraph, so a redelivered event leaves the graph unchanged.
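To illustrate what an idempotent upsert could look like, here is a minimal sketch that turns a provider row into a Cypher `MERGE` statement plus parameters. It is not the actual worker code: `build_provider_upsert`, the row shape, and the choice of `provider_id` as the merge key are assumptions made for illustration.

```python
from typing import Any

# Hypothetical Cypher for an idempotent provider upsert. MERGE matches on the
# key property and creates the node only when it is missing, so replaying the
# same event still leaves exactly one Provider node.
UPSERT_PROVIDER_CYPHER = (
    "MERGE (p:Provider {provider_id: $provider_id}) "
    "SET p += $props"
)


def build_provider_upsert(row: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Build a Cypher statement and parameters from a provider row (assumed dict shape)."""
    if "provider_id" not in row:
        raise ValueError("provider row must include provider_id")
    # Every column except the merge key becomes a node property.
    props = {key: value for key, value in row.items() if key != "provider_id"}
    return UPSERT_PROVIDER_CYPHER, {"provider_id": row["provider_id"], "props": props}


query, params = build_provider_upsert(
    {"provider_id": "prov-1", "name": "Example Clinic", "country_code": "IN"}
)
print(query)
print(params)
```

With the official Neo4j Python driver, the pair could then be passed to `session.run(query, params)`. Because the statement uses `MERGE` rather than `CREATE`, running it twice with the same row produces the same graph state.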
Event flow
```
Admin form submit
  → Repository write (Postgres commits)
  → QStash enqueue: graph.provider.upsert / graph.doctor.upsert / graph.procedure.upsert
  → Projection worker processes event
  → Neo4j upserted (~30s latency from PG commit)
```
The admin onboarding form returns 200 as soon as Postgres commits. The projection lands asynchronously. The admin UI can poll /admin/graph/rebuild status or use SSE to display a "matchable" badge once projection is confirmed.
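The event names in the flow above could be routed to per-entity projection functions with a simple dispatch table. This is a sketch under assumptions: the payload shape (a dict with an `id` key) and the placeholder handlers are hypothetical, and the real worker in `app/jobs/graph_projection.py` may be structured differently.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


# Placeholder handlers: each returns a description of what it would upsert.
# A real handler would read the row from Postgres and write to Neo4j.
def project_provider(payload: dict[str, Any]) -> str:
    return f"provider:{payload['id']}"


def project_doctor(payload: dict[str, Any]) -> str:
    return f"doctor:{payload['id']}"


def project_procedure(payload: dict[str, Any]) -> str:
    return f"procedure:{payload['id']}"


# Event names come from the flow above; the mapping itself is an assumption.
HANDLERS: dict[str, Handler] = {
    "graph.provider.upsert": project_provider,
    "graph.doctor.upsert": project_doctor,
    "graph.procedure.upsert": project_procedure,
}


def handle_event(event_type: str, payload: dict[str, Any]) -> str:
    """Route one projection event to its handler, rejecting unknown types."""
    handler = HANDLERS.get(event_type)
    if handler is None:
        raise ValueError(f"unknown graph event type: {event_type}")
    return handler(payload)


print(handle_event("graph.doctor.upsert", {"id": "doc-42"}))
```

Rejecting unknown event types loudly, rather than ignoring them, makes a misnamed event show up as a failed delivery instead of silent drift.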
Current implementation status
As of ADR-0026 (2026-05-07): the event-driven projection worker is planned for Phase 1 (Issue #767). The current production path uses bulk rebuild only.
Bulk Rebuild
The bulk rebuild endpoint reads all relevant Postgres tables and re-projects the full Neo4j subgraph. It is the primary consistency mechanism until the event-driven worker ships.
Endpoint
`POST /admin/graph/rebuild`
Request body:
Response:
When to trigger
- After bulk provider or doctor imports
- After a Postgres schema migration that adds new graph-relevant fields
- When the Neo4j node count diverges from Postgres (see verification steps below)
- Nightly cron (to be wired via QStash — currently manual)
See the Database Operations Runbook for step-by-step rebuild instructions.
Graph Module Structure
The graph service is split into domain-focused sub-modules under app/services/graph/:
| Module | Responsibility |
|---|---|
| `app/services/graph/driver.py` | Neo4j driver singleton, connection management |
| `app/services/graph/provider_graph.py` | Provider + Doctor node/relationship writes |
| `app/services/graph/clinical_graph.py` | Condition + Procedure node writes |
| `app/services/graph/matching_graph.py` | Read queries used by the matching engine |
| `app/services/graph/procedure_requirements.py` | Pre-op requirement lookups |
app/services/graph_service.py is a backward-compatibility shim that re-exports all symbols. New code should import directly from sub-modules.
Verifying Projection Integrity
Check node counts match Postgres
Compare against Postgres:
```sql
-- Postgres
SELECT 'providers' AS entity, COUNT(*) FROM providers WHERE is_active = true
UNION ALL
SELECT 'doctors', COUNT(*) FROM doctors WHERE is_active = true
UNION ALL
SELECT 'procedures', COUNT(*) FROM procedures;
```
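A discrepancy means the projection has drifted from Postgres, for example because a projection event was dropped or rows changed after the last rebuild. Trigger a bulk rebuild.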
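The comparison itself can be automated once both sets of counts are available. Below is a minimal sketch, assuming the Postgres counts and the Neo4j counts have already been fetched into dictionaries keyed by the same entity names; `find_count_drift` is a hypothetical helper and the numbers are illustrative, not real data.

```python
def find_count_drift(
    pg_counts: dict[str, int], neo4j_counts: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {entity: (postgres_count, neo4j_count)} for every mismatch."""
    drift: dict[str, tuple[int, int]] = {}
    # Check the union of keys so an entity missing on either side is reported.
    for entity in sorted(set(pg_counts) | set(neo4j_counts)):
        pg = pg_counts.get(entity, 0)
        neo = neo4j_counts.get(entity, 0)
        if pg != neo:
            drift[entity] = (pg, neo)
    return drift


# Illustrative counts only.
drift = find_count_drift(
    {"providers": 12, "doctors": 40, "procedures": 25},
    {"providers": 12, "doctors": 31, "procedures": 25},
)
print(drift)
```

An empty result means the counts agree. Matching counts do not prove that properties or relationships are in sync, so the relationship checks in this section remain useful.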
Check relationship integrity
```cypher
// Providers with no PERFORMS relationship (should be 0 for active providers)
MATCH (p:Provider)
WHERE NOT (p)-[:PERFORMS]->()
RETURN p.name, p.provider_id;
```
Current Known Gap (Phase 0 → Phase 1)
As documented in ADR-0026: 172 doctors are seeded in Postgres via seed_doctors_full.py but the create_doctor_node / link_doctor_* functions were never called from the seed path. The bulk rebuild endpoint created in Phase 0 (Issue #766, PR #770) corrects this — rebuilding the graph now projects all 172 doctors.
The 7-hop traversal (Patient → Condition → Procedure → Provider → Outcome → Cost → Location) that the matching spec requires depends on the Doctor nodes being projected. This is now unblocked by the Phase 0 work.