Graph Projection Architecture

Overview

Curaway uses two data stores for provider/clinical data: PostgreSQL as the canonical source of truth and Neo4j as a read-only projection used by the matching engine. This page documents the projection contract — what gets written where, who writes it, and how drift is detected and corrected.

This architecture was formalized in ADR-0026.


The Projection Contract

graph LR
    subgraph "Write Path"
        Admin[Admin CRUD] --> PG[(PostgreSQL\nsource of truth)]
        Seed[Seed scripts] --> PG
    end

    subgraph "Projection Path"
        PG -->|QStash event| Worker[Projection Worker\napp/jobs/graph_projection.py]
        Worker --> Neo4j[(Neo4j\nread projection)]
        PG -->|Nightly cron\n+ on-demand| Rebuild[Bulk Rebuild\nPOST /admin/graph/rebuild]
        Rebuild --> Neo4j
    end

    subgraph "Read Path"
        Neo4j -->|7-hop traversal| Match[Matching Engine\napp/services/matching_engine.py]
    end

    style PG fill:#004D4D,color:#fff
    style Neo4j fill:#008B8B,color:#fff
    style Worker fill:#FF7F50,color:#fff

Key rule: Neo4j is never written to directly from request handlers. Only the projection worker (event-driven) and the bulk rebuild endpoint write to Neo4j.

Anything in Neo4j must be reconstructible from Postgres + reference YAML. If a field exists only in Neo4j, that is a bug.


What Gets Projected

Entities

| Postgres Table | Neo4j Node Label | Projection Function |
|---|---|---|
| providers | Provider | app/services/graph/provider_graph.py::create_provider_node |
| doctors | Doctor | app/services/graph/provider_graph.py::create_doctor_node |
| procedures | Procedure | app/services/graph/clinical_graph.py::create_procedure_node |
| conditions (ref) | Condition | app/services/graph/clinical_graph.py::create_condition_node |
| accreditations | Accreditation | app/services/graph/provider_graph.py::create_accreditation_node |

Relationships

| Relationship | Direction | Cypher Label | Source Table |
|---|---|---|---|
| Provider performs procedure | Provider → Procedure | PERFORMS | provider_procedure_offerings |
| Provider located in country | Provider → Country | LOCATED_IN | providers.country_code |
| Doctor works at provider | Doctor → Provider | WORKS_AT | doctor_provider_affiliations |
| Doctor speaks language | Doctor → Language | SPEAKS | doctor_languages |
| Doctor trained in specialty | Doctor → Specialty | SPECIALIZES_IN | doctor_specialties |
| Condition implies procedure | Condition → Procedure | IMPLIES | reference YAML |

Reference Data (YAML → Neo4j only)

Reference data that does not exist in Postgres is loaded from config/reference/*.yaml:

  • Visa corridors (Country → Country VISA_FREE)
  • Climate/air quality per country
  • Country tier classification

These are loaded exclusively during bulk rebuild — not event-driven.
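As an illustration of the YAML-to-Neo4j path, the sketch below turns already-parsed reference data into rows for a batched, idempotent write. The `visa_corridors` key, the `from`/`to` fields, and the `code` property on Country nodes are assumptions made for this example; the actual layout of config/reference/*.yaml may differ.

```python
from typing import Any

# Hypothetical Cypher for a batched load: MERGE keeps the rebuild repeatable,
# so running it twice does not duplicate countries or VISA_FREE edges.
VISA_FREE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (a:Country {code: row.origin}) "
    "MERGE (b:Country {code: row.destination}) "
    "MERGE (a)-[:VISA_FREE]->(b)"
)


def visa_free_rows(reference: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten parsed reference data into one row per visa-free corridor.

    `reference` is the mapping produced by a YAML parser; its shape here
    (a `visa_corridors` list of {from, to}) is assumed, not confirmed.
    """
    rows = []
    for corridor in reference.get("visa_corridors", []):
        rows.append({"origin": corridor["from"], "destination": corridor["to"]})
    return rows
```

The rows would be passed as the `$rows` parameter when the bulk rebuild executes the query.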


Event-Driven Projection

For near-real-time consistency, every Postgres write that affects a graph entity emits a QStash event. The projection worker consumes the event and idempotently upserts the affected Neo4j subgraph.
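A minimal sketch of what an idempotent upsert can look like, assuming the worker builds parameterized Cypher keyed on `provider_id` (the property queried elsewhere on this page). The function name `build_provider_upsert` is hypothetical and is not the signature of create_provider_node.

```python
from typing import Any


def build_provider_upsert(provider: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return a parameterized Cypher statement that upserts one Provider node.

    MERGE matches on the stable key (provider_id), so replaying the same
    event updates the existing node instead of creating a duplicate.
    """
    query = (
        "MERGE (p:Provider {provider_id: $provider_id}) "
        "SET p += $props"
    )
    # Everything except the key is written as mutable properties.
    props = {k: v for k, v in provider.items() if k != "provider_id"}
    return query, {"provider_id": provider["provider_id"], "props": props}
```

Because the statement is a MERGE on a stable key rather than a CREATE, a redelivered event leaves the graph in the same state as a single delivery.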

Event flow

Admin form submit
  → Repository write (Postgres commits)
  → QStash enqueue: graph.provider.upsert / graph.doctor.upsert / graph.procedure.upsert
  → Projection worker processes event
  → Neo4j upserted (~30s latency from PG commit)
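The routing step in this flow could be sketched as a lookup from event type to handler. The three event names come from the flow above; the event envelope (`type` and `payload` keys) and the handler bodies are assumptions for illustration, since the worker is not yet implemented.

```python
from typing import Callable

Handler = Callable[[dict], str]


def upsert_provider(payload: dict) -> str:
    # Placeholder: a real handler would write to Neo4j.
    return f"Provider:{payload['id']}"


def upsert_doctor(payload: dict) -> str:
    return f"Doctor:{payload['id']}"


def upsert_procedure(payload: dict) -> str:
    return f"Procedure:{payload['id']}"


# Event types taken from the flow above; one handler per graph entity.
HANDLERS: dict[str, Handler] = {
    "graph.provider.upsert": upsert_provider,
    "graph.doctor.upsert": upsert_doctor,
    "graph.procedure.upsert": upsert_procedure,
}


def handle_event(event: dict) -> str:
    """Route one event to its handler; unknown types fail loudly."""
    handler = HANDLERS.get(event["type"])
    if handler is None:
        raise ValueError(f"unknown event type: {event['type']}")
    return handler(event["payload"])
```

Raising on an unknown type, rather than silently ignoring it, keeps a misnamed event from turning into undetected drift.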

The admin onboarding form returns 200 as soon as Postgres commits. The projection lands asynchronously. The admin UI can poll /admin/graph/rebuild status or use SSE to display a "matchable" badge once projection is confirmed.

Current implementation status

As of ADR-0026 (2026-05-07): the event-driven projection worker is planned for Phase 1 (Issue #767). The current production path uses bulk rebuild only.


Bulk Rebuild

The bulk rebuild endpoint reads all relevant Postgres tables and re-projects the full Neo4j subgraph. It is the primary consistency mechanism until the event-driven worker ships.

Endpoint

POST /api/v1/admin/graph/rebuild
Permission: graph:rebuild (held by super_admin by default)

Request body (scope is one of "all", "providers", "doctors", or "procedures"):

{
  "scope": "all"
}

Response:

{
  "counts": {
    "doctors": 172,
    "providers": 42,
    "procedures": 30
  },
  "duration_ms": 4210
}
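For scripted rebuilds, a request against this endpoint might be assembled as follows. This is a sketch using only the standard library; the bearer-token header and the `base_url` value are assumptions, since the page does not describe how admin requests are authenticated.

```python
import json
import urllib.request

VALID_SCOPES = {"all", "providers", "doctors", "procedures"}


def build_rebuild_request(
    base_url: str, token: str, scope: str = "all"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the bulk rebuild endpoint."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {sorted(VALID_SCOPES)}")
    body = json.dumps({"scope": scope}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/admin/graph/rebuild",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; replace with whatever the admin API uses.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body can then be parsed with `json.load` to read `counts` and `duration_ms`.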

When to trigger

  • After bulk provider or doctor imports
  • After a Postgres schema migration that adds new graph-relevant fields
  • When the Neo4j node count diverges from Postgres (see verification steps below)
  • Nightly cron (to be wired via QStash — currently manual)

See the Database Operations Runbook for step-by-step rebuild instructions.


Graph Module Structure

The graph service is split into domain-focused sub-modules under app/services/graph/:

| Module | Responsibility |
|---|---|
| app/services/graph/driver.py | Neo4j driver singleton, connection management |
| app/services/graph/provider_graph.py | Provider + Doctor node/relationship writes |
| app/services/graph/clinical_graph.py | Condition + Procedure node writes |
| app/services/graph/matching_graph.py | Read queries used by the matching engine |
| app/services/graph/procedure_requirements.py | Pre-op requirement lookups |

app/services/graph_service.py is a backward-compatibility shim that re-exports all symbols. New code should import directly from sub-modules.


Verifying Projection Integrity

Check node counts match Postgres

// Neo4j
MATCH (n)
RETURN labels(n) AS label, COUNT(n) AS count
ORDER BY count DESC;

Compare against Postgres:

-- Postgres
SELECT 'providers' AS entity, COUNT(*) FROM providers WHERE is_active = true
UNION ALL
SELECT 'doctors', COUNT(*) FROM doctors WHERE is_active = true
UNION ALL
SELECT 'procedures', COUNT(*) FROM procedures;

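A discrepancy means the projection is stale: either no rebuild has run since the last Postgres write or, once the event-driven worker ships, a projection event was dropped. Trigger a bulk rebuild.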
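The comparison can be automated once both result sets are in hand. The sketch below assumes the counts have already been fetched into plain dictionaries, keyed by entity name on the Postgres side and by node label on the Neo4j side; `find_drift` is a hypothetical helper, not part of the codebase.

```python
# Maps the entity names used in the Postgres query to Neo4j node labels.
LABEL_FOR_ENTITY = {
    "providers": "Provider",
    "doctors": "Doctor",
    "procedures": "Procedure",
}


def find_drift(
    postgres_counts: dict[str, int], neo4j_counts: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {entity: (postgres_count, neo4j_count)} for every mismatch."""
    drift = {}
    for entity, label in LABEL_FOR_ENTITY.items():
        pg = postgres_counts.get(entity, 0)
        graph = neo4j_counts.get(label, 0)
        if pg != graph:
            drift[entity] = (pg, graph)
    return drift
```

An empty result means the counts agree; any entry is a reason to trigger a bulk rebuild for that scope.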

Check relationship integrity

// Providers with no PERFORMS relationship (should be 0 for active providers)
MATCH (p:Provider)
WHERE NOT (p)-[:PERFORMS]->()
RETURN p.name, p.provider_id;

Current Known Gap (Phase 0 → Phase 1)

As documented in ADR-0026: 172 doctors are seeded in Postgres via seed_doctors_full.py, but the seed path never called the create_doctor_node / link_doctor_* functions. The bulk rebuild endpoint created in Phase 0 (Issue #766, PR #770) corrects this: rebuilding the graph now projects all 172 doctors.

The 7-hop traversal (Patient → Condition → Procedure → Provider → Outcome → Cost → Location) that the matching spec requires depends on the Doctor nodes being projected. This is now unblocked by the Phase 0 work.