Skip to content

Data Model

Overview

Curaway's data architecture spans three storage engines, each chosen for its strengths:

  • PostgreSQL -- Relational data, FHIR resources, audit trails, transactional integrity
  • Neo4j -- Graph relationships between patients, conditions, procedures, providers, and doctors
  • Qdrant -- Vector embeddings for semantic search and document matching

All three are kept in sync through application-level writes. Every table and node includes tenant_id for multi-tenancy isolation.


PostgreSQL Schema

Table Inventory (25+ Tables)

Table Purpose Key Relationships
tenants Multi-tenant configuration Root entity
tenant_settings Per-tenant runtime config FK → tenants
patients Patient demographics, preferences FK → tenants
providers Hospital/clinic profiles (42 rows) FK → tenants
doctors Individual doctor profiles (8 rows) FK → providers, tenants
doctor_procedures Doctor-specific procedure metrics FK → doctors
cases Patient treatment cases FK → patients, tenants
conversations Chat threads per case FK → cases
messages Individual chat messages FK → conversations
documents Uploaded medical documents FK → cases, patients
consent Patient consent records FK → patients
fhir_resources FHIR R4 resources as JSONB FK → patients, cases
match_results Provider matching outcomes FK → cases
events Domain event log (observability) FK → tenants
audit_logs Compliance audit trail FK → tenants
notifications Email/SMS/push notification log FK → patients
consultations Scheduled patient-doctor consults FK → cases, doctors
legal_agreements Signed legal documents FK → patients, providers
procedure_requirements Required docs per procedure FK → tenants
provider_procedures Provider-specific procedure info FK → providers
idempotency_keys Request deduplication Standalone
devices Patient device tokens (push) FK → patients
feedback_records Decision-outcome links for AI model improvement FK → cases
provider_facilities Hospital facilities for storefront display FK → providers
treatment_categories Medical treatment categories for storefront navigation FK → tenants

Core Tables Detail

tenants

CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) UNIQUE NOT NULL,
    domain VARCHAR(255),
    logo_url TEXT,
    primary_color VARCHAR(7) DEFAULT '#008B8B',
    accent_color VARCHAR(7) DEFAULT '#FF7F50',
    flagsmith_environment_key TEXT,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

patients

CREATE TABLE patients (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    clerk_user_id VARCHAR(255),
    email VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    date_of_birth DATE,
    gender VARCHAR(20),
    country_of_residence VARCHAR(3),  -- ISO 3166-1 alpha-3
    preferred_language VARCHAR(10) DEFAULT 'en',
    phone VARCHAR(20),
    timezone VARCHAR(50),
    medical_history JSONB DEFAULT '{}',
    preferences JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, email)
);

providers

CREATE TABLE providers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) NOT NULL,
    country VARCHAR(3) NOT NULL,
    city VARCHAR(100) NOT NULL,
    provider_type VARCHAR(50) DEFAULT 'hospital',
    accreditations JSONB DEFAULT '[]',
    specialties JSONB DEFAULT '[]',
    language_services JSONB DEFAULT '{}',
    facilities JSONB DEFAULT '{}',
    pricing_tier VARCHAR(20),
    website_url TEXT,
    is_active BOOLEAN DEFAULT true,
    neo4j_node_id VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, slug)
);

language_services JSONB Structure

{
  "medical_interpreters": ["Hindi", "Arabic", "Turkish"],
  "patient_coordinators": ["English", "Arabic"],
  "third_party_interpretation": true,
  "document_languages": ["English", "Hindi", "Arabic", "Turkish"]
}

doctors Table (Session 26)

The doctors table is the most detailed table in the schema with 36 columns across 8 seed rows. It was designed in Session 26 to support doctor-level matching.

CREATE TABLE doctors (
    -- Identity
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    provider_id UUID NOT NULL REFERENCES providers(id),
    external_id VARCHAR(100),
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    title VARCHAR(50),                    -- Dr., Prof., etc.
    gender VARCHAR(20),
    photo_url TEXT,

    -- Professional
    specialty VARCHAR(100) NOT NULL,
    sub_specialties JSONB DEFAULT '[]',
    medical_license_number VARCHAR(100),
    license_country VARCHAR(3),
    years_of_experience INTEGER,
    education JSONB DEFAULT '[]',         -- [{institution, degree, year}]
    certifications JSONB DEFAULT '[]',

    -- Languages
    languages JSONB DEFAULT '[]',         -- [{language, proficiency}]
    primary_language VARCHAR(50),

    -- Clinical Volume & Outcomes
    annual_procedure_volume INTEGER,
    total_procedures_performed INTEGER,
    overall_success_rate DECIMAL(5,2),
    complication_rate_30day DECIMAL(5,2),
    patient_satisfaction_score DECIMAL(3,2),
    mortality_rate DECIMAL(5,4),

    -- Operational
    consultation_fee_usd DECIMAL(10,2),
    accepts_international_patients BOOLEAN DEFAULT true,
    available_for_video_consult BOOLEAN DEFAULT true,
    average_response_time_hours INTEGER,
    next_available_date DATE,

    -- Social Proof
    publications_count INTEGER DEFAULT 0,
    notable_achievements JSONB DEFAULT '[]',
    patient_testimonials_count INTEGER DEFAULT 0,

    -- Metadata
    data_completeness JSONB DEFAULT '{}',
    badges JSONB DEFAULT '[]',
    is_active BOOLEAN DEFAULT true,
    neo4j_node_id VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

Data Completeness Scoring

The data_completeness JSONB column stores a weighted scoring system with 7 categories:

Category Weight Fields Evaluated
identity 0.15 first_name, last_name, title, photo_url, gender
languages 0.10 languages array, primary_language
credentials 0.15 education, certifications, license_number, years_of_experience
volume 0.20 annual_procedure_volume, total_procedures_performed
outcomes 0.25 success_rate, complication_rate, satisfaction_score, mortality_rate
operational 0.10 consultation_fee, video_consult, response_time, next_available
social_proof 0.05 publications_count, notable_achievements, testimonials_count
{
  "identity": {"score": 0.80, "missing": ["photo_url"]},
  "languages": {"score": 1.00, "missing": []},
  "credentials": {"score": 0.75, "missing": ["certifications"]},
  "volume": {"score": 1.00, "missing": []},
  "outcomes": {"score": 1.00, "missing": []},
  "operational": {"score": 0.50, "missing": ["next_available_date", "average_response_time_hours"]},
  "social_proof": {"score": 0.33, "missing": ["notable_achievements", "patient_testimonials_count"]},
  "overall": 0.82
}

Badge System

Badges are assigned based on the overall data completeness score:

Badge Score Range Display
verified >= 0.80 Green shield icon
complete 0.55 - 0.79 Blue checkmark
basic < 0.55 Gray outline

doctor_procedures Table

CREATE TABLE doctor_procedures (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    doctor_id UUID NOT NULL REFERENCES doctors(id),
    procedure_code VARCHAR(20) NOT NULL,     -- CPT code
    procedure_name VARCHAR(255) NOT NULL,
    annual_volume INTEGER,
    success_rate DECIMAL(5,2),
    complication_rate_30day DECIMAL(5,2),
    average_los_days DECIMAL(4,1),           -- Length of stay
    proms_scores JSONB DEFAULT '{}',         -- Patient-Reported Outcome Measures
    technique VARCHAR(100),                  -- e.g., "robotic", "laparoscopic", "open"
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, doctor_id, procedure_code)
);

PROMS Scores

Patient-Reported Outcome Measures are stored as JSONB with procedure-specific instruments:

{
  "oxford_hip_score": 42.5,
  "eq5d_index": 0.85,
  "vas_pain": 2.1
}


Neo4j Graph Model

Node Types

graph TD
    Patient["Patient (dynamic)"]
    Condition["Condition (12)"]
    Procedure["Procedure (12)"]
    Provider["Provider (42)"]
    Doctor["Doctor (8)"]
    DiagnosticTest["DiagnosticTest (25)"]
    Accreditation["Accreditation (7)"]
    RecoveryPhase["RecoveryPhase (7)"]
    Location["Location (22)"]
    Specialty["Specialty"]
    Language["Language"]

    Patient -->|HAS_CONDITION| Condition
    Condition -->|REQUIRES| Procedure
    Provider -->|OFFERS| Procedure
    Procedure -->|REQUIRES_TEST| DiagnosticTest
    Provider -->|HAS_ACCREDITATION| Accreditation
    Procedure -->|HAS_RECOVERY| RecoveryPhase
    Provider -->|LOCATED_IN| Location
    Doctor -->|AFFILIATED_WITH| Provider
    Doctor -->|PERFORMS| Procedure
    Doctor -->|SPECIALIZES_IN| Specialty
    Doctor -->|SPEAKS| Language

    style Patient fill:#008B8B,color:#fff
    style Provider fill:#FF7F50,color:#fff
    style Doctor fill:#FF7F50,color:#fff
    style Condition fill:#4A90D9,color:#fff
    style Procedure fill:#4A90D9,color:#fff

Node Counts (POC Seed Data)

Node Type Count Source
Patient Dynamic Created per patient journey
Condition 12 Seed data (hip replacement, cardiac, dental, etc.)
Procedure 12 Seed data (mapped to conditions)
Provider 42 Seed data (hospitals across 4 countries)
Doctor 8 Seed data (Session 26)
DiagnosticTest 25 Seed data (blood work, imaging, cardiac tests)
Accreditation 7 JCI, NABH, ISO, ACHSI, TEMOS, GCR, GHA
RecoveryPhase 7 Pre-op through long-term follow-up
Location 22 Cities across India, Turkey, Thailand, Spain
Specialty Dynamic Orthopedics, Cardiology, Dental, etc.
Language Dynamic English, Hindi, Turkish, Thai, Spanish, Arabic, etc.

Relationship Properties

The OFFERS relationship carries rich metadata used in matching:

(provider:Provider)-[:OFFERS {
    cost_usd: 12500,
    cost_currency: "USD",
    annual_volume: 450,
    success_rate: 97.2,
    average_los_days: 5,
    wait_time_weeks: 2,
    package_includes: ["surgery", "hospital_stay", "physiotherapy"],
    last_updated: datetime()
}]->(procedure:Procedure)

The PERFORMS relationship connects doctors to procedures:

(doctor:Doctor)-[:PERFORMS {
    annual_volume: 120,
    success_rate: 98.1,
    technique: "robotic-assisted",
    since_year: 2018
}]->(procedure:Procedure)

Key Cypher Queries

Find providers for a condition:

MATCH (c:Condition {code: $condition_code})-[:REQUIRES]->(p:Procedure)
      <-[:OFFERS]-(prov:Provider)-[:LOCATED_IN]->(loc:Location)
WHERE prov.tenant_id = $tenant_id AND prov.is_active = true
RETURN prov, p, loc
ORDER BY prov.offers_success_rate DESC

Find doctors with language concordance:

MATCH (d:Doctor)-[:AFFILIATED_WITH]->(prov:Provider),
      (d)-[:PERFORMS]->(proc:Procedure {code: $procedure_code}),
      (d)-[:SPEAKS]->(lang:Language {code: $patient_language})
WHERE prov.tenant_id = $tenant_id
RETURN d, prov, proc, lang

Qdrant Vector Collections

Collection Summary

Collection Vector Count Dimensions Model Purpose
providers 42 1024 Voyage AI voyage-3.5-lite Semantic provider search
conditions 12 1024 Voyage AI voyage-3.5-lite Condition similarity matching
requirement_embeddings 70 1024 Voyage AI voyage-3.5-lite Document-to-requirement matching
document_embeddings Dynamic 1024 Voyage AI voyage-3.5-lite Uploaded document search

Provider Vectors

Each provider vector encodes a rich text representation:

provider_text = f"""
{provider.name} is a {provider.provider_type} located in {provider.city}, {provider.country}.
Specialties: {', '.join(provider.specialties)}.
Accreditations: {', '.join(provider.accreditations)}.
Language services: {json.dumps(provider.language_services)}.
"""
embedding = voyage_client.embed(provider_text, model="voyage-3.5-lite")

Requirement Embeddings

The requirement_embeddings collection stores 70 vectors representing document requirements for various procedures. Each vector encodes the natural-language description of what the requirement is:

# Example: "Recent blood work within 30 days showing CBC, metabolic panel, and coagulation studies"
# This enables matching uploaded documents to requirements via cosine similarity

Vector Sync

When provider data changes in PostgreSQL, the corresponding Qdrant vector must be re-embedded and upserted. This is handled by the sync_provider_embeddings background task, triggered via QStash.


FHIR R4 Resources

Supported Resource Types

FHIR Resource PostgreSQL Storage Source
Patient fhir_resources (JSONB) Registration + intake
Condition fhir_resources (JSONB) Clinical Context Agent extraction
Procedure fhir_resources (JSONB) Mapped from conditions
Observation fhir_resources (JSONB) Lab results, vitals from documents
DocumentReference fhir_resources (JSONB) Uploaded document metadata
Consent fhir_resources (JSONB) Patient consent records

Storage Schema

CREATE TABLE fhir_resources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    patient_id UUID NOT NULL REFERENCES patients(id),
    case_id UUID REFERENCES cases(id),
    resource_type VARCHAR(50) NOT NULL,
    resource_id VARCHAR(255) NOT NULL,    -- FHIR resource ID
    resource JSONB NOT NULL,              -- Full FHIR R4 JSON
    version INTEGER DEFAULT 1,
    is_current BOOLEAN DEFAULT true,
    source VARCHAR(50),                   -- 'agent', 'manual', 'import'
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, resource_type, resource_id, version)
);

CREATE INDEX idx_fhir_patient ON fhir_resources(tenant_id, patient_id, resource_type);
CREATE INDEX idx_fhir_resource ON fhir_resources USING GIN(resource);

Validation

All FHIR resources are validated at write time using the fhir.resources Python library:

from fhir.resources.condition import Condition

def validate_fhir_condition(resource_json: dict) -> Condition:
    """Validate and return a typed FHIR Condition resource."""
    return Condition.model_validate(resource_json)

Entity Relationship Overview

erDiagram
    TENANTS ||--o{ PATIENTS : contains
    TENANTS ||--o{ PROVIDERS : contains
    TENANTS ||--o{ DOCTORS : contains
    PATIENTS ||--o{ CASES : creates
    CASES ||--o{ CONVERSATIONS : has
    CONVERSATIONS ||--o{ MESSAGES : contains
    CASES ||--o{ DOCUMENTS : includes
    CASES ||--o{ MATCH_RESULTS : produces
    CASES ||--o{ FHIR_RESOURCES : generates
    PATIENTS ||--o{ CONSENT : grants
    PROVIDERS ||--o{ DOCTORS : employs
    DOCTORS ||--o{ DOCTOR_PROCEDURES : performs
    PROVIDERS ||--o{ PROVIDER_PROCEDURES : offers
    CASES ||--o{ CONSULTATIONS : schedules
    PATIENTS ||--o{ NOTIFICATIONS : receives
    TENANTS ||--o{ EVENTS : logs
    TENANTS ||--o{ AUDIT_LOGS : tracks

Data Flow Summary

sequenceDiagram
    participant Patient
    participant API as FastAPI
    participant PG as PostgreSQL
    participant Neo as Neo4j
    participant QD as Qdrant
    participant R2 as Cloudflare R2

    Patient->>API: Upload document
    API->>R2: Store file (presigned URL)
    API->>PG: Create document record
    API->>PG: Store FHIR resources (JSONB)
    API->>Neo: Create/update Patient node
    API->>Neo: Link Patient → Condition → Procedure
    API->>QD: Embed document → requirement matching
    API->>PG: Store match results
    API->>PG: Log event + audit trail