Data Model¶

Overview¶

Curaway's data architecture spans three storage engines, each chosen for its strengths:

PostgreSQL -- Relational data, FHIR resources, audit trails, transactional integrity
Neo4j -- Graph relationships between patients, conditions, procedures, providers, and doctors
Qdrant -- Vector embeddings for semantic search and document matching

All three are kept in sync through application-level writes. Every table and node includes tenant_id for multi-tenancy isolation.

PostgreSQL Schema¶

Table Inventory (25+ Tables)¶

Table	Purpose	Key Relationships
`tenants`	Multi-tenant configuration	Root entity
`tenant_settings`	Per-tenant runtime config	FK → tenants
`patients`	Patient demographics, preferences	FK → tenants
`providers`	Hospital/clinic profiles (42 rows)	FK → tenants
`doctors`	Individual doctor profiles (8 rows)	FK → providers, tenants
`doctor_procedures`	Doctor-specific procedure metrics	FK → doctors
`cases`	Patient treatment cases	FK → patients, tenants
`conversations`	Chat threads per case	FK → cases
`messages`	Individual chat messages	FK → conversations
`documents`	Uploaded medical documents	FK → cases, patients
`consent`	Patient consent records	FK → patients
`fhir_resources`	FHIR R4 resources as JSONB	FK → patients, cases
`match_results`	Provider matching outcomes	FK → cases
`events`	Domain event log (observability)	FK → tenants
`audit_logs`	Compliance audit trail	FK → tenants
`notifications`	Email/SMS/push notification log	FK → patients
`consultations`	Scheduled patient-doctor consults	FK → cases, doctors
`legal_agreements`	Signed legal documents	FK → patients, providers
`procedure_requirements`	Required docs per procedure	FK → tenants
`provider_procedures`	Provider-specific procedure info	FK → providers
`idempotency_keys`	Request deduplication	Standalone
`devices`	Patient device tokens (push)	FK → patients
`feedback_records`	Decision-outcome links for AI model improvement	FK → cases
`provider_facilities`	Hospital facilities for storefront display	FK → providers
`treatment_categories`	Medical treatment categories for storefront navigation	FK → tenants

Core Tables Detail¶

tenants¶

CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) UNIQUE NOT NULL,
    domain VARCHAR(255),
    logo_url TEXT,
    primary_color VARCHAR(7) DEFAULT '#008B8B',
    accent_color VARCHAR(7) DEFAULT '#FF7F50',
    flagsmith_environment_key TEXT,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

patients¶

CREATE TABLE patients (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    clerk_user_id VARCHAR(255),
    email VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    date_of_birth DATE,
    gender VARCHAR(20),
    country_of_residence VARCHAR(3),  -- ISO 3166-1 alpha-3
    preferred_language VARCHAR(10) DEFAULT 'en',
    phone VARCHAR(20),
    timezone VARCHAR(50),
    medical_history JSONB DEFAULT '{}',
    preferences JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, email)
);

providers¶

CREATE TABLE providers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(100) NOT NULL,
    country VARCHAR(3) NOT NULL,
    city VARCHAR(100) NOT NULL,
    provider_type VARCHAR(50) DEFAULT 'hospital',
    accreditations JSONB DEFAULT '[]',
    specialties JSONB DEFAULT '[]',
    language_services JSONB DEFAULT '{}',
    facilities JSONB DEFAULT '{}',
    pricing_tier VARCHAR(20),
    website_url TEXT,
    is_active BOOLEAN DEFAULT true,
    neo4j_node_id VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, slug)
);

language_services JSONB Structure

{
  "medical_interpreters": ["Hindi", "Arabic", "Turkish"],
  "patient_coordinators": ["English", "Arabic"],
  "third_party_interpretation": true,
  "document_languages": ["English", "Hindi", "Arabic", "Turkish"]
}

doctors Table (Session 26)¶

The doctors table is the most detailed table in the schema with 36 columns across 8 seed rows. It was designed in Session 26 to support doctor-level matching.

CREATE TABLE doctors (
    -- Identity
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    provider_id UUID NOT NULL REFERENCES providers(id),
    external_id VARCHAR(100),
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    title VARCHAR(50),                    -- Dr., Prof., etc.
    gender VARCHAR(20),
    photo_url TEXT,

    -- Professional
    specialty VARCHAR(100) NOT NULL,
    sub_specialties JSONB DEFAULT '[]',
    medical_license_number VARCHAR(100),
    license_country VARCHAR(3),
    years_of_experience INTEGER,
    education JSONB DEFAULT '[]',         -- [{institution, degree, year}]
    certifications JSONB DEFAULT '[]',

    -- Languages
    languages JSONB DEFAULT '[]',         -- [{language, proficiency}]
    primary_language VARCHAR(50),

    -- Clinical Volume & Outcomes
    annual_procedure_volume INTEGER,
    total_procedures_performed INTEGER,
    overall_success_rate DECIMAL(5,2),
    complication_rate_30day DECIMAL(5,2),
    patient_satisfaction_score DECIMAL(3,2),
    mortality_rate DECIMAL(5,4),

    -- Operational
    consultation_fee_usd DECIMAL(10,2),
    accepts_international_patients BOOLEAN DEFAULT true,
    available_for_video_consult BOOLEAN DEFAULT true,
    average_response_time_hours INTEGER,
    next_available_date DATE,

    -- Social Proof
    publications_count INTEGER DEFAULT 0,
    notable_achievements JSONB DEFAULT '[]',
    patient_testimonials_count INTEGER DEFAULT 0,

    -- Metadata
    data_completeness JSONB DEFAULT '{}',
    badges JSONB DEFAULT '[]',
    is_active BOOLEAN DEFAULT true,
    neo4j_node_id VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

Data Completeness Scoring¶

The data_completeness JSONB column stores a weighted scoring system with 7 categories:

Category	Weight	Fields Evaluated
identity	0.15	first_name, last_name, title, photo_url, gender
languages	0.10	languages array, primary_language
credentials	0.15	education, certifications, license_number, years_of_experience
volume	0.20	annual_procedure_volume, total_procedures_performed
outcomes	0.25	success_rate, complication_rate, satisfaction_score, mortality_rate
operational	0.10	consultation_fee, video_consult, response_time, next_available
social_proof	0.05	publications_count, notable_achievements, testimonials_count

{
  "identity": {"score": 0.80, "missing": ["photo_url"]},
  "languages": {"score": 1.00, "missing": []},
  "credentials": {"score": 0.75, "missing": ["certifications"]},
  "volume": {"score": 1.00, "missing": []},
  "outcomes": {"score": 1.00, "missing": []},
  "operational": {"score": 0.50, "missing": ["next_available_date", "average_response_time_hours"]},
  "social_proof": {"score": 0.33, "missing": ["notable_achievements", "patient_testimonials_count"]},
  "overall": 0.82
}

Badge System¶

Badges are assigned based on the overall data completeness score:

Badge	Score Range	Display
`verified`	>= 0.80	Green shield icon
`complete`	0.55 - 0.79	Blue checkmark
`basic`	< 0.55	Gray outline

doctor_procedures Table¶

CREATE TABLE doctor_procedures (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    doctor_id UUID NOT NULL REFERENCES doctors(id),
    procedure_code VARCHAR(20) NOT NULL,     -- CPT code
    procedure_name VARCHAR(255) NOT NULL,
    annual_volume INTEGER,
    success_rate DECIMAL(5,2),
    complication_rate_30day DECIMAL(5,2),
    average_los_days DECIMAL(4,1),           -- Length of stay
    proms_scores JSONB DEFAULT '{}',         -- Patient-Reported Outcome Measures
    technique VARCHAR(100),                  -- e.g., "robotic", "laparoscopic", "open"
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, doctor_id, procedure_code)
);

PROMS Scores

Patient-Reported Outcome Measures are stored as JSONB with procedure-specific instruments:

{
  "oxford_hip_score": 42.5,
  "eq5d_index": 0.85,
  "vas_pain": 2.1
}

Neo4j Graph Model¶

Node Types¶

graph TD
    Patient["Patient (dynamic)"]
    Condition["Condition (12)"]
    Procedure["Procedure (12)"]
    Provider["Provider (42)"]
    Doctor["Doctor (8)"]
    DiagnosticTest["DiagnosticTest (25)"]
    Accreditation["Accreditation (7)"]
    RecoveryPhase["RecoveryPhase (7)"]
    Location["Location (22)"]
    Specialty["Specialty"]
    Language["Language"]

    Patient -->|HAS_CONDITION| Condition
    Condition -->|REQUIRES| Procedure
    Provider -->|OFFERS| Procedure
    Procedure -->|REQUIRES_TEST| DiagnosticTest
    Provider -->|HAS_ACCREDITATION| Accreditation
    Procedure -->|HAS_RECOVERY| RecoveryPhase
    Provider -->|LOCATED_IN| Location
    Doctor -->|AFFILIATED_WITH| Provider
    Doctor -->|PERFORMS| Procedure
    Doctor -->|SPECIALIZES_IN| Specialty
    Doctor -->|SPEAKS| Language

    style Patient fill:#008B8B,color:#fff
    style Provider fill:#FF7F50,color:#fff
    style Doctor fill:#FF7F50,color:#fff
    style Condition fill:#4A90D9,color:#fff
    style Procedure fill:#4A90D9,color:#fff

Node Counts (POC Seed Data)¶

Node Type	Count	Source
Patient	Dynamic	Created per patient journey
Condition	12	Seed data (hip replacement, cardiac, dental, etc.)
Procedure	12	Seed data (mapped to conditions)
Provider	42	Seed data (hospitals across 4 countries)
Doctor	8	Seed data (Session 26)
DiagnosticTest	25	Seed data (blood work, imaging, cardiac tests)
Accreditation	7	JCI, NABH, ISO, ACHSI, TEMOS, GCR, GHA
RecoveryPhase	7	Pre-op through long-term follow-up
Location	22	Cities across India, Turkey, Thailand, Spain
Specialty	Dynamic	Orthopedics, Cardiology, Dental, etc.
Language	Dynamic	English, Hindi, Turkish, Thai, Spanish, Arabic, etc.

Relationship Properties¶

The OFFERS relationship carries rich metadata used in matching:

(provider:Provider)-[:OFFERS {
    cost_usd: 12500,
    cost_currency: "USD",
    annual_volume: 450,
    success_rate: 97.2,
    average_los_days: 5,
    wait_time_weeks: 2,
    package_includes: ["surgery", "hospital_stay", "physiotherapy"],
    last_updated: datetime()
}]->(procedure:Procedure)

The PERFORMS relationship connects doctors to procedures:

(doctor:Doctor)-[:PERFORMS {
    annual_volume: 120,
    success_rate: 98.1,
    technique: "robotic-assisted",
    since_year: 2018
}]->(procedure:Procedure)

Key Cypher Queries¶

Find providers for a condition:

MATCH (c:Condition {code: $condition_code})-[:REQUIRES]->(p:Procedure)
      <-[:OFFERS]-(prov:Provider)-[:LOCATED_IN]->(loc:Location)
WHERE prov.tenant_id = $tenant_id AND prov.is_active = true
RETURN prov, p, loc
ORDER BY prov.offers_success_rate DESC

Find doctors with language concordance:

MATCH (d:Doctor)-[:AFFILIATED_WITH]->(prov:Provider),
      (d)-[:PERFORMS]->(proc:Procedure {code: $procedure_code}),
      (d)-[:SPEAKS]->(lang:Language {code: $patient_language})
WHERE prov.tenant_id = $tenant_id
RETURN d, prov, proc, lang

Qdrant Vector Collections¶

Collection Summary¶

Collection	Vector Count	Dimensions	Model	Purpose
`providers`	42	1024	Voyage AI voyage-3.5-lite	Semantic provider search
`conditions`	12	1024	Voyage AI voyage-3.5-lite	Condition similarity matching
`requirement_embeddings`	70	1024	Voyage AI voyage-3.5-lite	Document-to-requirement matching
`document_embeddings`	Dynamic	1024	Voyage AI voyage-3.5-lite	Uploaded document search

Provider Vectors¶

Each provider vector encodes a rich text representation:

provider_text = f"""
{provider.name} is a {provider.provider_type} located in {provider.city}, {provider.country}.
Specialties: {', '.join(provider.specialties)}.
Accreditations: {', '.join(provider.accreditations)}.
Language services: {json.dumps(provider.language_services)}.
"""
embedding = voyage_client.embed(provider_text, model="voyage-3.5-lite")

Requirement Embeddings¶

The requirement_embeddings collection stores 70 vectors representing document requirements for various procedures. Each vector encodes the natural-language description of what the requirement is:

# Example: "Recent blood work within 30 days showing CBC, metabolic panel, and coagulation studies"
# This enables matching uploaded documents to requirements via cosine similarity

Vector Sync

When provider data changes in PostgreSQL, the corresponding Qdrant vector must be re-embedded and upserted. This is handled by the sync_provider_embeddings background task, triggered via QStash.

FHIR R4 Resources¶

Supported Resource Types¶

FHIR Resource	PostgreSQL Storage	Source
`Patient`	`fhir_resources` (JSONB)	Registration + intake
`Condition`	`fhir_resources` (JSONB)	Clinical Context Agent extraction
`Procedure`	`fhir_resources` (JSONB)	Mapped from conditions
`Observation`	`fhir_resources` (JSONB)	Lab results, vitals from documents
`DocumentReference`	`fhir_resources` (JSONB)	Uploaded document metadata
`Consent`	`fhir_resources` (JSONB)	Patient consent records

Storage Schema¶

CREATE TABLE fhir_resources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    patient_id UUID NOT NULL REFERENCES patients(id),
    case_id UUID REFERENCES cases(id),
    resource_type VARCHAR(50) NOT NULL,
    resource_id VARCHAR(255) NOT NULL,    -- FHIR resource ID
    resource JSONB NOT NULL,              -- Full FHIR R4 JSON
    version INTEGER DEFAULT 1,
    is_current BOOLEAN DEFAULT true,
    source VARCHAR(50),                   -- 'agent', 'manual', 'import'
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(tenant_id, resource_type, resource_id, version)
);

CREATE INDEX idx_fhir_patient ON fhir_resources(tenant_id, patient_id, resource_type);
CREATE INDEX idx_fhir_resource ON fhir_resources USING GIN(resource);

Validation¶

All FHIR resources are validated at write time using the fhir.resources Python library:

from fhir.resources.condition import Condition

def validate_fhir_condition(resource_json: dict) -> Condition:
    """Validate and return a typed FHIR Condition resource."""
    return Condition.model_validate(resource_json)

Entity Relationship Overview¶

erDiagram
    TENANTS ||--o{ PATIENTS : contains
    TENANTS ||--o{ PROVIDERS : contains
    TENANTS ||--o{ DOCTORS : contains
    PATIENTS ||--o{ CASES : creates
    CASES ||--o{ CONVERSATIONS : has
    CONVERSATIONS ||--o{ MESSAGES : contains
    CASES ||--o{ DOCUMENTS : includes
    CASES ||--o{ MATCH_RESULTS : produces
    CASES ||--o{ FHIR_RESOURCES : generates
    PATIENTS ||--o{ CONSENT : grants
    PROVIDERS ||--o{ DOCTORS : employs
    DOCTORS ||--o{ DOCTOR_PROCEDURES : performs
    PROVIDERS ||--o{ PROVIDER_PROCEDURES : offers
    CASES ||--o{ CONSULTATIONS : schedules
    PATIENTS ||--o{ NOTIFICATIONS : receives
    TENANTS ||--o{ EVENTS : logs
    TENANTS ||--o{ AUDIT_LOGS : tracks

Data Flow Summary¶

sequenceDiagram
    participant Patient
    participant API as FastAPI
    participant PG as PostgreSQL
    participant Neo as Neo4j
    participant QD as Qdrant
    participant R2 as Cloudflare R2

    Patient->>API: Upload document
    API->>R2: Store file (presigned URL)
    API->>PG: Create document record
    API->>PG: Store FHIR resources (JSONB)
    API->>Neo: Create/update Patient node
    API->>Neo: Link Patient → Condition → Procedure
    API->>QD: Embed document → requirement matching
    API->>PG: Store match results
    API->>PG: Log event + audit trail