Data Model¶
Overview¶
Curaway's data architecture spans three storage engines, each chosen for its strengths:
- PostgreSQL -- Relational data, FHIR resources, audit trails, transactional integrity
- Neo4j -- Graph relationships between patients, conditions, procedures, providers, and doctors
- Qdrant -- Vector embeddings for semantic search and document matching
All three are kept in sync through application-level writes. Every table and node includes tenant_id for multi-tenancy isolation.
PostgreSQL Schema¶
Table Inventory (25+ Tables)¶
| Table | Purpose | Key Relationships |
|---|---|---|
tenants |
Multi-tenant configuration | Root entity |
tenant_settings |
Per-tenant runtime config | FK → tenants |
patients |
Patient demographics, preferences | FK → tenants |
providers |
Hospital/clinic profiles (42 rows) | FK → tenants |
doctors |
Individual doctor profiles (8 rows) | FK → providers, tenants |
doctor_procedures |
Doctor-specific procedure metrics | FK → doctors |
cases |
Patient treatment cases | FK → patients, tenants |
conversations |
Chat threads per case | FK → cases |
messages |
Individual chat messages | FK → conversations |
documents |
Uploaded medical documents | FK → cases, patients |
consent |
Patient consent records | FK → patients |
fhir_resources |
FHIR R4 resources as JSONB | FK → patients, cases |
match_results |
Provider matching outcomes | FK → cases |
events |
Domain event log (observability) | FK → tenants |
audit_logs |
Compliance audit trail | FK → tenants |
notifications |
Email/SMS/push notification log | FK → patients |
consultations |
Scheduled patient-doctor consults | FK → cases, doctors |
legal_agreements |
Signed legal documents | FK → patients, providers |
procedure_requirements |
Required docs per procedure | FK → tenants |
provider_procedures |
Provider-specific procedure info | FK → providers |
idempotency_keys |
Request deduplication | Standalone |
devices |
Patient device tokens (push) | FK → patients |
feedback_records |
Decision-outcome links for AI model improvement | FK → cases |
provider_facilities |
Hospital facilities for storefront display | FK → providers |
treatment_categories |
Medical treatment categories for storefront navigation | FK → tenants |
Core Tables Detail¶
tenants¶
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
slug VARCHAR(100) UNIQUE NOT NULL,
domain VARCHAR(255),
logo_url TEXT,
primary_color VARCHAR(7) DEFAULT '#008B8B',
accent_color VARCHAR(7) DEFAULT '#FF7F50',
flagsmith_environment_key TEXT,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
patients¶
CREATE TABLE patients (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
clerk_user_id VARCHAR(255),
email VARCHAR(255) NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
date_of_birth DATE,
gender VARCHAR(20),
country_of_residence VARCHAR(3), -- ISO 3166-1 alpha-3
preferred_language VARCHAR(10) DEFAULT 'en',
phone VARCHAR(20),
timezone VARCHAR(50),
medical_history JSONB DEFAULT '{}',
preferences JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(tenant_id, email)
);
providers¶
CREATE TABLE providers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name VARCHAR(255) NOT NULL,
slug VARCHAR(100) NOT NULL,
country VARCHAR(3) NOT NULL,
city VARCHAR(100) NOT NULL,
provider_type VARCHAR(50) DEFAULT 'hospital',
accreditations JSONB DEFAULT '[]',
specialties JSONB DEFAULT '[]',
language_services JSONB DEFAULT '{}',
facilities JSONB DEFAULT '{}',
pricing_tier VARCHAR(20),
website_url TEXT,
is_active BOOLEAN DEFAULT true,
neo4j_node_id VARCHAR(100),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(tenant_id, slug)
);
language_services JSONB Structure
doctors Table (Session 26)¶
The doctors table is the most detailed table in the schema with 36 columns across 8 seed rows. It was designed in Session 26 to support doctor-level matching.
CREATE TABLE doctors (
-- Identity
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
provider_id UUID NOT NULL REFERENCES providers(id),
external_id VARCHAR(100),
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
title VARCHAR(50), -- Dr., Prof., etc.
gender VARCHAR(20),
photo_url TEXT,
-- Professional
specialty VARCHAR(100) NOT NULL,
sub_specialties JSONB DEFAULT '[]',
medical_license_number VARCHAR(100),
license_country VARCHAR(3),
years_of_experience INTEGER,
education JSONB DEFAULT '[]', -- [{institution, degree, year}]
certifications JSONB DEFAULT '[]',
-- Languages
languages JSONB DEFAULT '[]', -- [{language, proficiency}]
primary_language VARCHAR(50),
-- Clinical Volume & Outcomes
annual_procedure_volume INTEGER,
total_procedures_performed INTEGER,
overall_success_rate DECIMAL(5,2),
complication_rate_30day DECIMAL(5,2),
patient_satisfaction_score DECIMAL(3,2),
mortality_rate DECIMAL(5,4),
-- Operational
consultation_fee_usd DECIMAL(10,2),
accepts_international_patients BOOLEAN DEFAULT true,
available_for_video_consult BOOLEAN DEFAULT true,
average_response_time_hours INTEGER,
next_available_date DATE,
-- Social Proof
publications_count INTEGER DEFAULT 0,
notable_achievements JSONB DEFAULT '[]',
patient_testimonials_count INTEGER DEFAULT 0,
-- Metadata
data_completeness JSONB DEFAULT '{}',
badges JSONB DEFAULT '[]',
is_active BOOLEAN DEFAULT true,
neo4j_node_id VARCHAR(100),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
Data Completeness Scoring¶
The data_completeness JSONB column stores a weighted scoring system with 7 categories:
| Category | Weight | Fields Evaluated |
|---|---|---|
| identity | 0.15 | first_name, last_name, title, photo_url, gender |
| languages | 0.10 | languages array, primary_language |
| credentials | 0.15 | education, certifications, license_number, years_of_experience |
| volume | 0.20 | annual_procedure_volume, total_procedures_performed |
| outcomes | 0.25 | success_rate, complication_rate, satisfaction_score, mortality_rate |
| operational | 0.10 | consultation_fee, video_consult, response_time, next_available |
| social_proof | 0.05 | publications_count, notable_achievements, testimonials_count |
{
"identity": {"score": 0.80, "missing": ["photo_url"]},
"languages": {"score": 1.00, "missing": []},
"credentials": {"score": 0.75, "missing": ["certifications"]},
"volume": {"score": 1.00, "missing": []},
"outcomes": {"score": 1.00, "missing": []},
"operational": {"score": 0.50, "missing": ["next_available_date", "average_response_time_hours"]},
"social_proof": {"score": 0.33, "missing": ["notable_achievements", "patient_testimonials_count"]},
"overall": 0.82
}
Badge System¶
Badges are assigned based on the overall data completeness score:
| Badge | Score Range | Display |
|---|---|---|
verified |
>= 0.80 | Green shield icon |
complete |
0.55 - 0.79 | Blue checkmark |
basic |
< 0.55 | Gray outline |
doctor_procedures Table¶
CREATE TABLE doctor_procedures (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
doctor_id UUID NOT NULL REFERENCES doctors(id),
procedure_code VARCHAR(20) NOT NULL, -- CPT code
procedure_name VARCHAR(255) NOT NULL,
annual_volume INTEGER,
success_rate DECIMAL(5,2),
complication_rate_30day DECIMAL(5,2),
average_los_days DECIMAL(4,1), -- Length of stay
proms_scores JSONB DEFAULT '{}', -- Patient-Reported Outcome Measures
technique VARCHAR(100), -- e.g., "robotic", "laparoscopic", "open"
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(tenant_id, doctor_id, procedure_code)
);
PROMS Scores
Patient-Reported Outcome Measures are stored as JSONB with procedure-specific instruments:
Neo4j Graph Model¶
Node Types¶
graph TD
Patient["Patient (dynamic)"]
Condition["Condition (12)"]
Procedure["Procedure (12)"]
Provider["Provider (42)"]
Doctor["Doctor (8)"]
DiagnosticTest["DiagnosticTest (25)"]
Accreditation["Accreditation (7)"]
RecoveryPhase["RecoveryPhase (7)"]
Location["Location (22)"]
Specialty["Specialty"]
Language["Language"]
Patient -->|HAS_CONDITION| Condition
Condition -->|REQUIRES| Procedure
Provider -->|OFFERS| Procedure
Procedure -->|REQUIRES_TEST| DiagnosticTest
Provider -->|HAS_ACCREDITATION| Accreditation
Procedure -->|HAS_RECOVERY| RecoveryPhase
Provider -->|LOCATED_IN| Location
Doctor -->|AFFILIATED_WITH| Provider
Doctor -->|PERFORMS| Procedure
Doctor -->|SPECIALIZES_IN| Specialty
Doctor -->|SPEAKS| Language
style Patient fill:#008B8B,color:#fff
style Provider fill:#FF7F50,color:#fff
style Doctor fill:#FF7F50,color:#fff
style Condition fill:#4A90D9,color:#fff
style Procedure fill:#4A90D9,color:#fff
Node Counts (POC Seed Data)¶
| Node Type | Count | Source |
|---|---|---|
| Patient | Dynamic | Created per patient journey |
| Condition | 12 | Seed data (hip replacement, cardiac, dental, etc.) |
| Procedure | 12 | Seed data (mapped to conditions) |
| Provider | 42 | Seed data (hospitals across 4 countries) |
| Doctor | 8 | Seed data (Session 26) |
| DiagnosticTest | 25 | Seed data (blood work, imaging, cardiac tests) |
| Accreditation | 7 | JCI, NABH, ISO, ACHSI, TEMOS, GCR, GHA |
| RecoveryPhase | 7 | Pre-op through long-term follow-up |
| Location | 22 | Cities across India, Turkey, Thailand, Spain |
| Specialty | Dynamic | Orthopedics, Cardiology, Dental, etc. |
| Language | Dynamic | English, Hindi, Turkish, Thai, Spanish, Arabic, etc. |
Relationship Properties¶
The OFFERS relationship carries rich metadata used in matching:
(provider:Provider)-[:OFFERS {
cost_usd: 12500,
cost_currency: "USD",
annual_volume: 450,
success_rate: 97.2,
average_los_days: 5,
wait_time_weeks: 2,
package_includes: ["surgery", "hospital_stay", "physiotherapy"],
last_updated: datetime()
}]->(procedure:Procedure)
The PERFORMS relationship connects doctors to procedures:
(doctor:Doctor)-[:PERFORMS {
annual_volume: 120,
success_rate: 98.1,
technique: "robotic-assisted",
since_year: 2018
}]->(procedure:Procedure)
Key Cypher Queries¶
Find providers for a condition:
MATCH (c:Condition {code: $condition_code})-[:REQUIRES]->(p:Procedure)
<-[:OFFERS]-(prov:Provider)-[:LOCATED_IN]->(loc:Location)
WHERE prov.tenant_id = $tenant_id AND prov.is_active = true
RETURN prov, p, loc
ORDER BY prov.offers_success_rate DESC
Find doctors with language concordance:
MATCH (d:Doctor)-[:AFFILIATED_WITH]->(prov:Provider),
(d)-[:PERFORMS]->(proc:Procedure {code: $procedure_code}),
(d)-[:SPEAKS]->(lang:Language {code: $patient_language})
WHERE prov.tenant_id = $tenant_id
RETURN d, prov, proc, lang
Qdrant Vector Collections¶
Collection Summary¶
| Collection | Vector Count | Dimensions | Model | Purpose |
|---|---|---|---|---|
providers |
42 | 1024 | Voyage AI voyage-3.5-lite | Semantic provider search |
conditions |
12 | 1024 | Voyage AI voyage-3.5-lite | Condition similarity matching |
requirement_embeddings |
70 | 1024 | Voyage AI voyage-3.5-lite | Document-to-requirement matching |
document_embeddings |
Dynamic | 1024 | Voyage AI voyage-3.5-lite | Uploaded document search |
Provider Vectors¶
Each provider vector encodes a rich text representation:
provider_text = f"""
{provider.name} is a {provider.provider_type} located in {provider.city}, {provider.country}.
Specialties: {', '.join(provider.specialties)}.
Accreditations: {', '.join(provider.accreditations)}.
Language services: {json.dumps(provider.language_services)}.
"""
embedding = voyage_client.embed(provider_text, model="voyage-3.5-lite")
Requirement Embeddings¶
The requirement_embeddings collection stores 70 vectors representing document requirements for various procedures. Each vector encodes the natural-language description of what the requirement is:
# Example: "Recent blood work within 30 days showing CBC, metabolic panel, and coagulation studies"
# This enables matching uploaded documents to requirements via cosine similarity
Vector Sync
When provider data changes in PostgreSQL, the corresponding Qdrant vector must be re-embedded and upserted. This is handled by the sync_provider_embeddings background task, triggered via QStash.
FHIR R4 Resources¶
Supported Resource Types¶
| FHIR Resource | PostgreSQL Storage | Source |
|---|---|---|
Patient |
fhir_resources (JSONB) |
Registration + intake |
Condition |
fhir_resources (JSONB) |
Clinical Context Agent extraction |
Procedure |
fhir_resources (JSONB) |
Mapped from conditions |
Observation |
fhir_resources (JSONB) |
Lab results, vitals from documents |
DocumentReference |
fhir_resources (JSONB) |
Uploaded document metadata |
Consent |
fhir_resources (JSONB) |
Patient consent records |
Storage Schema¶
CREATE TABLE fhir_resources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
patient_id UUID NOT NULL REFERENCES patients(id),
case_id UUID REFERENCES cases(id),
resource_type VARCHAR(50) NOT NULL,
resource_id VARCHAR(255) NOT NULL, -- FHIR resource ID
resource JSONB NOT NULL, -- Full FHIR R4 JSON
version INTEGER DEFAULT 1,
is_current BOOLEAN DEFAULT true,
source VARCHAR(50), -- 'agent', 'manual', 'import'
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(tenant_id, resource_type, resource_id, version)
);
CREATE INDEX idx_fhir_patient ON fhir_resources(tenant_id, patient_id, resource_type);
CREATE INDEX idx_fhir_resource ON fhir_resources USING GIN(resource);
Validation¶
All FHIR resources are validated at write time using the fhir.resources Python library:
from fhir.resources.condition import Condition
def validate_fhir_condition(resource_json: dict) -> Condition:
"""Validate and return a typed FHIR Condition resource."""
return Condition.model_validate(resource_json)
Entity Relationship Overview¶
erDiagram
TENANTS ||--o{ PATIENTS : contains
TENANTS ||--o{ PROVIDERS : contains
TENANTS ||--o{ DOCTORS : contains
PATIENTS ||--o{ CASES : creates
CASES ||--o{ CONVERSATIONS : has
CONVERSATIONS ||--o{ MESSAGES : contains
CASES ||--o{ DOCUMENTS : includes
CASES ||--o{ MATCH_RESULTS : produces
CASES ||--o{ FHIR_RESOURCES : generates
PATIENTS ||--o{ CONSENT : grants
PROVIDERS ||--o{ DOCTORS : employs
DOCTORS ||--o{ DOCTOR_PROCEDURES : performs
PROVIDERS ||--o{ PROVIDER_PROCEDURES : offers
CASES ||--o{ CONSULTATIONS : schedules
PATIENTS ||--o{ NOTIFICATIONS : receives
TENANTS ||--o{ EVENTS : logs
TENANTS ||--o{ AUDIT_LOGS : tracks
Data Flow Summary¶
sequenceDiagram
participant Patient
participant API as FastAPI
participant PG as PostgreSQL
participant Neo as Neo4j
participant QD as Qdrant
participant R2 as Cloudflare R2
Patient->>API: Upload document
API->>R2: Store file (presigned URL)
API->>PG: Create document record
API->>PG: Store FHIR resources (JSONB)
API->>Neo: Create/update Patient node
API->>Neo: Link Patient → Condition → Procedure
API->>QD: Embed document → requirement matching
API->>PG: Store match results
API->>PG: Log event + audit trail