Graph Projection Architecture¶

Overview¶

Curaway uses two data stores for provider/clinical data: PostgreSQL as the canonical source of truth and Neo4j as a read projection queried by the matching engine. This page documents the projection contract: what gets written where, who writes it, and how drift is detected and corrected.

This architecture was formalized in ADR-0026.

The Projection Contract¶

graph LR
    subgraph "Write Path"
        Admin[Admin CRUD] --> PG[(PostgreSQL\nsource of truth)]
        Seed[Seed scripts] --> PG
    end

    subgraph "Projection Path"
        PG -->|QStash event| Worker[Projection Worker\napp/jobs/graph_projection.py]
        Worker --> Neo4j[(Neo4j\nread projection)]
        PG -->|Nightly cron\n+ on-demand| Rebuild[Bulk Rebuild\nPOST /admin/graph/rebuild]
        Rebuild --> Neo4j
    end

    subgraph "Read Path"
        Neo4j -->|7-hop traversal| Match[Matching Engine\napp/services/matching_engine.py]
    end

    style PG fill:#004D4D,color:#fff
    style Neo4j fill:#008B8B,color:#fff
    style Worker fill:#FF7F50,color:#fff

Key rule: Neo4j is never written to directly from request handlers. Only the projection worker (event-driven) and the bulk rebuild endpoint write to Neo4j.

Anything in Neo4j must be reconstructible from Postgres + reference YAML. If a field exists only in Neo4j, that is a bug.

What Gets Projected¶

Entities¶

| Postgres Table | Neo4j Node Label | Projection Function |
| --- | --- | --- |
| `providers` | `Provider` | `app/services/graph/provider_graph.py::create_provider_node` |
| `doctors` | `Doctor` | `app/services/graph/provider_graph.py::create_doctor_node` |
| `procedures` | `Procedure` | `app/services/graph/clinical_graph.py::create_procedure_node` |
| `conditions` (ref) | `Condition` | `app/services/graph/clinical_graph.py::create_condition_node` |
| `accreditations` | `Accreditation` | `app/services/graph/provider_graph.py::create_accreditation_node` |

Relationships¶

| Relationship | Direction | Cypher Label | Source Table |
| --- | --- | --- | --- |
| Provider performs procedure | `Provider → Procedure` | `PERFORMS` | `provider_procedure_offerings` |
| Provider located in country | `Provider → Country` | `LOCATED_IN` | `providers.country_code` |
| Doctor works at provider | `Doctor → Provider` | `WORKS_AT` | `doctor_provider_affiliations` |
| Doctor speaks language | `Doctor → Language` | `SPEAKS` | `doctor_languages` |
| Doctor trained in specialty | `Doctor → Specialty` | `SPECIALIZES_IN` | `doctor_specialties` |
| Condition implies procedure | `Condition → Procedure` | `IMPLIES` | reference YAML |
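
The relationship table above can be read as a declarative mapping. The following is a minimal sketch, not the project's actual code: `RelationshipSpec`, `merge_statement`, the `id` key property, and the `$source_id` / `$target_id` parameter names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipSpec:
    """One row of the relationship table (hypothetical helper type)."""

    source_label: str
    target_label: str
    rel_type: str
    origin: str  # Postgres table/column, or "reference YAML"


RELATIONSHIPS = [
    RelationshipSpec("Provider", "Procedure", "PERFORMS", "provider_procedure_offerings"),
    RelationshipSpec("Provider", "Country", "LOCATED_IN", "providers.country_code"),
    RelationshipSpec("Doctor", "Provider", "WORKS_AT", "doctor_provider_affiliations"),
    RelationshipSpec("Doctor", "Language", "SPEAKS", "doctor_languages"),
    RelationshipSpec("Doctor", "Specialty", "SPECIALIZES_IN", "doctor_specialties"),
    RelationshipSpec("Condition", "Procedure", "IMPLIES", "reference YAML"),
]


def merge_statement(spec: RelationshipSpec) -> str:
    """Build an idempotent Cypher MERGE for one relationship type.

    Assumption: nodes are keyed by an `id` property; the real key may differ.
    Labels and relationship types cannot be passed as Cypher parameters, so
    they are interpolated here from the fixed list above, never from user input.
    """
    return (
        f"MATCH (a:{spec.source_label} {{id: $source_id}}) "
        f"MATCH (b:{spec.target_label} {{id: $target_id}}) "
        f"MERGE (a)-[:{spec.rel_type}]->(b)"
    )
```

Because `MERGE` only creates the relationship when it is missing, running the same statement twice with the same parameters leaves a single edge, which fits the requirement that the graph be reconstructible from its sources.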

Reference Data (YAML → Neo4j only)¶

Reference data that does not exist in Postgres is loaded from config/reference/*.yaml:

- Visa corridors (Country → Country VISA_FREE)
- Climate/air quality per country
- Country tier classification

These are loaded exclusively during bulk rebuild — not event-driven.
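
As a rough illustration of the YAML-to-graph step for visa corridors, the sketch below flattens an already-parsed reference document into rows for a batched `MERGE`. The YAML schema (a `visa_free` mapping of origin country code to destination codes), the `code` node property, and the function name are assumptions; the actual files under config/reference/ may be shaped differently.

```python
# Cypher for a batched, idempotent load. Assumes Country nodes are keyed by `code`.
VISA_FREE_CYPHER = """
UNWIND $rows AS row
MERGE (a:Country {code: row.origin})
MERGE (b:Country {code: row.destination})
MERGE (a)-[:VISA_FREE]->(b)
"""


def visa_corridor_rows(reference: dict) -> list[dict]:
    """Flatten {"visa_free": {origin: [destinations]}} into one row per corridor.

    `reference` is the parsed YAML document (for example the result of
    yaml.safe_load); the schema shown here is assumed, not confirmed.
    """
    rows = []
    for origin, destinations in reference.get("visa_free", {}).items():
        for destination in destinations:
            rows.append({"origin": origin, "destination": destination})
    return rows
```

The resulting list would be passed as the `$rows` parameter when the bulk rebuild runs `VISA_FREE_CYPHER`. Batching through `UNWIND` keeps the load to one round trip per file rather than one per corridor.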

Event-Driven Projection¶

For near-real-time consistency, every Postgres write that affects a graph entity emits a QStash event. The projection worker consumes the event and idempotently upserts the affected Neo4j subgraph, so a redelivered event leaves the graph unchanged.

Event flow¶

Admin form submit
  → Repository write (Postgres commits)
  → QStash enqueue: graph.provider.upsert / graph.doctor.upsert / graph.procedure.upsert
  → Projection worker processes event
  → Neo4j upserted (~30s latency from PG commit)

The admin onboarding form returns 200 as soon as Postgres commits. The projection lands asynchronously. The admin UI can poll /admin/graph/rebuild status or use SSE to display a "matchable" badge once projection is confirmed.
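
The idempotent upsert at the end of the event flow could look roughly like the sketch below. The event names come from the flow above; the payload shape (a flat dict with an `id` key), the `id` node property, and the `build_upsert` helper are assumptions made for illustration and do not describe the planned worker.

```python
from typing import Any

# Event type -> node label. Event names are from the flow above.
EVENT_LABELS = {
    "graph.provider.upsert": "Provider",
    "graph.doctor.upsert": "Doctor",
    "graph.procedure.upsert": "Procedure",
}


def build_upsert(event_type: str, payload: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Translate one projection event into a parameterised Cypher MERGE.

    MERGE on a stable key followed by `SET n += $props` means that processing
    the same event twice produces the same node, so retries are safe.
    """
    label = EVENT_LABELS.get(event_type)
    if label is None:
        raise ValueError(f"unsupported event type: {event_type}")
    # Everything except the key becomes a node property (assumed payload shape).
    props = {key: value for key, value in payload.items() if key != "id"}
    query = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
    return query, {"id": payload["id"], "props": props}
```

A worker built this way would hand the returned query and parameters to the Neo4j driver session. Rejecting unknown event types explicitly, rather than ignoring them, makes a misrouted event visible in the job logs.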

Current implementation status¶

As of ADR-0026 (2026-05-07), the event-driven projection worker is planned for Phase 1 (Issue #767). Until it ships, the production path uses bulk rebuild only.

Bulk Rebuild¶

The bulk rebuild endpoint reads all relevant Postgres tables and re-projects the full Neo4j subgraph. It is the primary consistency mechanism until the event-driven worker ships.

Endpoint¶

POST /api/v1/admin/graph/rebuild
Permission: graph:rebuild (held by super_admin by default)

Request body (`scope` is one of "all", "providers", "doctors", or "procedures"):

{
  "scope": "all"
}

Response:

{
  "counts": {
    "doctors": 172,
    "providers": 42,
    "procedures": 30
  },
  "duration_ms": 4210
}
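
For scripted rebuilds, a client call might be assembled as in the sketch below. The path and request body follow the endpoint description above; the bearer-token `Authorization` header, the `base_url` argument, and the function name are assumptions, since this page does not specify how the admin API is authenticated.

```python
import json
import urllib.request

VALID_SCOPES = {"all", "providers", "doctors", "procedures"}


def build_rebuild_request(base_url: str, token: str, scope: str = "all") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the bulk rebuild endpoint."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {sorted(VALID_SCOPES)}")
    body = json.dumps({"scope": scope}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/admin/graph/rebuild",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Sending the request with `urllib.request.urlopen(request)` and parsing the body with `json.load` would yield the `counts` and `duration_ms` fields shown in the response above. Validating `scope` on the client catches typos before a full rebuild is attempted.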

When to trigger¶

- After bulk provider or doctor imports
- After a Postgres schema migration that adds new graph-relevant fields
- When the Neo4j node count diverges from Postgres (see verification steps below)
- Nightly cron (to be wired via QStash; currently manual)

See the Database Operations Runbook for step-by-step rebuild instructions.

Graph Module Structure¶

The graph service is split into domain-focused sub-modules under app/services/graph/:

| Module | Responsibility |
| --- | --- |
| `app/services/graph/driver.py` | Neo4j driver singleton, connection management |
| `app/services/graph/provider_graph.py` | Provider + Doctor node/relationship writes |
| `app/services/graph/clinical_graph.py` | Condition + Procedure node writes |
| `app/services/graph/matching_graph.py` | Read queries used by the matching engine |
| `app/services/graph/procedure_requirements.py` | Pre-op requirement lookups |

app/services/graph_service.py is a backward-compatibility shim that re-exports all symbols. New code should import directly from sub-modules.

Verifying Projection Integrity¶

Check node counts match Postgres¶

// Neo4j
MATCH (n)
RETURN labels(n) AS label, COUNT(n) AS count
ORDER BY count DESC;

Compare against Postgres:

-- Postgres
SELECT 'providers' AS entity, COUNT(*) FROM providers WHERE is_active = true
UNION ALL
SELECT 'doctors', COUNT(*) FROM doctors WHERE is_active = true
UNION ALL
SELECT 'procedures', COUNT(*) FROM procedures;

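A discrepancy means the projection has drifted from Postgres: either the graph has not been rebuilt since the last write or, once the event-driven worker ships, a projection event was dropped. Trigger a bulk rebuild.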
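
The comparison between the two result sets can be automated. The sketch below assumes the query results have already been fetched as plain Python rows; the `LABEL_TO_ENTITY` mapping and the `find_drift` function are hypothetical helpers, not part of the codebase.

```python
# Neo4j label -> entity name used in the Postgres query above.
LABEL_TO_ENTITY = {
    "Provider": "providers",
    "Doctor": "doctors",
    "Procedure": "procedures",
}


def find_drift(neo4j_rows, postgres_rows):
    """Return entities whose node count differs from the Postgres row count.

    neo4j_rows:    iterable of (labels, count); labels(n) returns a list.
    postgres_rows: iterable of (entity, count).
    """
    graph_counts = {}
    for labels, count in neo4j_rows:
        for label in labels:
            entity = LABEL_TO_ENTITY.get(label)
            if entity is not None:  # ignore labels with no Postgres table
                graph_counts[entity] = graph_counts.get(entity, 0) + count

    drift = {}
    for entity, pg_count in postgres_rows:
        neo_count = graph_counts.get(entity, 0)
        if neo_count != pg_count:
            drift[entity] = {"postgres": pg_count, "neo4j": neo_count}
    return drift
```

An empty result means the counts agree for every tracked entity; any other result is a reason to run the bulk rebuild. Labels without a Postgres counterpart, such as reference-data nodes, are skipped rather than reported as drift.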

Check relationship integrity¶

// Providers with no PERFORMS relationship (should be 0 for active providers)
MATCH (p:Provider)
WHERE NOT (p)-[:PERFORMS]->()
RETURN p.name, p.provider_id;

Current Known Gap (Phase 0 → Phase 1)¶

As documented in ADR-0026, 172 doctors were seeded in Postgres via seed_doctors_full.py, but the create_doctor_node / link_doctor_* functions were never called from the seed path. The bulk rebuild endpoint created in Phase 0 (Issue #766, PR #770) corrects this: rebuilding the graph now projects all 172 doctors.

The 7-hop traversal (Patient → Condition → Procedure → Provider → Outcome → Cost → Location) that the matching spec requires depends on the Doctor nodes being projected. This is now unblocked by the Phase 0 work.

ADR-0026 — Matching framework architecture
Matching Engine architecture
Database Operations runbook — rebuild steps
Admin Endpoints reference