03 — Data Architecture

FHIR R4 Compliance

Medical data is modeled as FHIR R4 resources using fhir.resources (Pydantic-based) and stored as JSONB in Railway PostgreSQL. Every resource is validated against the HL7 FHIR R4 schema before storage.

FHIR Resource Types (MVP)

| Resource | Purpose | Example for Aisha |
|---|---|---|
| Condition | Diagnoses | M17.11 osteoarthritis right knee, E11 Type 2 diabetes |
| Procedure | Indicated treatments | 27447 Total Knee Arthroplasty |
| Observation | Lab values, vitals, functional status | HbA1c 7.2%, BP 138/85, LOINC-coded |
| AllergyIntolerance | Drug, food, material allergies | Latex allergy (severity: moderate) |
| MedicationStatement | Current medications | Metformin 1000mg BID, Losartan 50mg QD |
| DocumentReference | Uploaded files | R2 storage keys for X-ray report, blood panel |
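
As a rough illustration of the Condition row, the example diagnosis might serialize to FHIR R4 JSON along the following lines. This is a sketch built from plain dictionaries rather than fhir.resources; the helper name `build_condition`, the patient id, and the choice of the ICD-10-CM system URI are assumptions made for illustration and are not taken from the platform.

```python
import json

# Assumed code system URI (ICD-10-CM); the source only says "ICD-10".
ICD10_CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"
CLINICAL_STATUS_SYSTEM = "http://terminology.hl7.org/CodeSystem/condition-clinical"


def build_condition(patient_id: str, icd10_code: str, display: str) -> dict:
    """Build a minimal FHIR R4 Condition as a plain dict (hypothetical helper)."""
    return {
        "resourceType": "Condition",
        "clinicalStatus": {
            "coding": [{"system": CLINICAL_STATUS_SYSTEM, "code": "active"}]
        },
        "code": {
            "coding": [
                {"system": ICD10_CM_SYSTEM, "code": icd10_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
    }


condition = build_condition(
    "example-patient-id",  # placeholder id, not a real record
    "M17.11",
    "Unilateral primary osteoarthritis, right knee",
)
resource_json = json.dumps(condition)  # the value a JSONB column would hold
```

In the real pipeline this dictionary would still need to pass schema validation before storage; the sketch only shows the shape of the data.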

FHIR Storage Schema

CREATE TABLE fhir_resources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    patient_id UUID NOT NULL REFERENCES patients(id),
    tenant_id VARCHAR(100) NOT NULL,
    resource_type VARCHAR(50) NOT NULL,        -- 'Condition', 'Procedure', etc.
    resource_data JSONB NOT NULL,              -- Full FHIR R4 JSON
    source VARCHAR(50),                         -- 'clinical_context_agent', 'intake_chat', 'manual'
    confidence FLOAT,                           -- 0.0–1.0 for AI-extracted resources
    version INTEGER DEFAULT 1,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_fhir_patient ON fhir_resources(patient_id, resource_type);
CREATE INDEX idx_fhir_tenant ON fhir_resources(tenant_id);
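
A minimal sketch of how an application might prepare a row for this table. It assumes a psycopg-style driver (`%s` placeholders); `FhirRow` and `insert_params` are hypothetical names, and the range check simply mirrors the 0.0–1.0 comment on the `confidence` column.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Parameterized insert; id, version, is_active and timestamps use column defaults.
INSERT_SQL = (
    "INSERT INTO fhir_resources "
    "(patient_id, tenant_id, resource_type, resource_data, source, confidence) "
    "VALUES (%s, %s, %s, %s::jsonb, %s, %s)"
)


@dataclass
class FhirRow:
    """Hypothetical in-memory representation of one fhir_resources row."""
    patient_id: str
    tenant_id: str
    resource: dict
    source: Optional[str] = None
    confidence: Optional[float] = None

    def insert_params(self) -> tuple:
        """Return parameters for INSERT_SQL, rejecting out-of-range confidence."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        # resource_type is derived from the FHIR payload to keep the two in sync.
        resource_type = self.resource["resourceType"]
        return (
            self.patient_id,
            self.tenant_id,
            resource_type,
            json.dumps(self.resource),
            self.source,
            self.confidence,
        )
```

Deriving `resource_type` from the payload is a design choice of this sketch: it avoids a row whose column says 'Procedure' while the JSON says 'Condition'.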

Clinical Knowledge Graph (Neo4j)

Node Types

| Node | Properties | Example |
|---|---|---|
| Patient | patient_id, tenant_id, demographics_hash | Aisha |
| Condition | icd10_code, snomed_code, name, severity | M17.11, Kellgren-Lawrence Grade 4 |
| Procedure | cpt_code, name, complexity_tier | 27447, TKR, routine |
| Provider | provider_id, name, country, city, tier | Apollo Chennai, India |
| Doctor | doctor_id, name, specialty, volume | Dr. Venkatesh, Orthopedics, 500+/yr |
| Outcome | success_rate, complication_rate, satisfaction | 95%, 2.1%, 4.8/5 |
| DiagnosticTest | test_name, loinc_code, validity_days | HbA1c, 90 days |
| Accreditation | body, level | JCI, NABH, ISO 15189 |
| Specialty | name, subspecialties | Orthopedics, Joint Replacement |
| Location | country, city, timezone | India, Chennai, Asia/Kolkata |
| Cost | procedure_cost, currency, package_details | $6,500, USD, all-inclusive |

Key Relationships

(Patient)-[:DIAGNOSED_WITH {date, source}]->(Condition)
(Condition)-[:INDICATED_FOR]->(Procedure)
(Provider)-[:PERFORMS {volume, success_rate, cost}]->(Procedure)
(Provider)-[:HAS_DOCTOR]->(Doctor)
(Doctor)-[:SPECIALIZES_IN]->(Specialty)
(Provider)-[:LOCATED_IN]->(Location)
(Provider)-[:ACCREDITED_BY]->(Accreditation)
(Procedure)-[:REQUIRES_TEST {mandatory, validity_days, source_acceptance, on_site_required}]->(DiagnosticTest)
(Provider)-[:HAS_OUTCOME {period, sample_size}]->(Outcome)
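
To show how these relationships might be traversed, here is a sketch of a Cypher query held in a Python constant, together with a small helper that builds its parameters. The query, the constant name, and `provider_query` are illustrative assumptions; only the labels, relationship types, and property names come from the patterns above.

```python
# Walk Patient -> Condition -> Procedure, then find providers performing it.
PROVIDERS_FOR_PATIENT = """
MATCH (p:Patient {patient_id: $patient_id, tenant_id: $tenant_id})
      -[:DIAGNOSED_WITH]->(:Condition)
      -[:INDICATED_FOR]->(proc:Procedure)
      <-[perf:PERFORMS]-(prov:Provider)
RETURN prov.name AS provider,
       proc.cpt_code AS cpt_code,
       perf.success_rate AS success_rate,
       perf.cost AS cost
ORDER BY perf.success_rate DESC
LIMIT $limit
"""


def provider_query(patient_id: str, tenant_id: str, limit: int = 10):
    """Return (query, parameters) ready to hand to a Neo4j driver session."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    params = {"patient_id": patient_id, "tenant_id": tenant_id, "limit": limit}
    return PROVIDERS_FOR_PATIENT, params
```

Passing values as parameters rather than formatting them into the query string keeps tenant and patient ids out of the query text.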

Provider-Specific Overrides

DiagnosticTest requirements can be overridden per provider:

  • Bumrungrad: accepts external MRI within 60 days (vs default 90)
  • Apollo Chennai: requires on-site cardiac clearance for diabetes patients
  • Acibadem: requires specific imaging protocol for TKR planning
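
One way to model such overrides is a default requirement per test plus a sparse map of provider-specific changes. The sketch below is an assumption about how this could be resolved in application code; `Requirement`, `DEFAULTS`, `OVERRIDES`, and `resolve` are hypothetical names, the type of `source_acceptance` is guessed, and only the Bumrungrad MRI example (60 days vs the default 90) comes from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Requirement:
    """Mirrors the REQUIRES_TEST relationship properties."""
    mandatory: bool = True
    validity_days: int = 90
    source_acceptance: str = "external"  # assumed to be a string
    on_site_required: bool = False


# Default requirement per test name.
DEFAULTS = {"MRI": Requirement(validity_days=90)}

# Sparse overrides: only the fields a provider changes are listed.
OVERRIDES = {("Bumrungrad", "MRI"): {"validity_days": 60}}


def resolve(provider: str, test_name: str) -> Requirement:
    """Apply a provider's override (if any) on top of the default requirement."""
    base = DEFAULTS[test_name]
    return replace(base, **OVERRIDES.get((provider, test_name), {}))
```

For example, `resolve("Bumrungrad", "MRI").validity_days` yields 60, while any provider without an override falls back to 90.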

Vector Store (Qdrant)

  • Embeddings: Voyage AI (primary), OpenAI (configurable fallback)
  • Role: Always-on Stage 0 in matching pipeline — semantic discovery before graph filtering
  • Storage: Provider clinical profiles, procedure descriptions, patient case summaries
  • Cross-lingual: Embeddings from English text, multilingual query via Voyage multilingual model

Qdrant Collection Schema

{
  "collection": "provider_profiles",
  "vector_size": 1024,
  "distance": "Cosine",
  "payload_fields": {
    "provider_id": "keyword",
    "tenant_id": "keyword",
    "specialties": "keyword[]",
    "country": "keyword",
    "procedures": "keyword[]",
    "text_content": "text"
  }
}
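
The schema above is descriptive rather than a literal Qdrant request. As a hedged sketch, it could be translated into Qdrant REST request bodies roughly as follows; the function names are hypothetical, and the request shapes reflect the Qdrant REST API as understood at the time of writing, so they should be checked against the deployed version.

```python
SCHEMA = {
    "collection": "provider_profiles",
    "vector_size": 1024,
    "distance": "Cosine",
    "payload_fields": {
        "provider_id": "keyword",
        "tenant_id": "keyword",
        "specialties": "keyword[]",
        "country": "keyword",
        "procedures": "keyword[]",
        "text_content": "text",
    },
}


def create_collection_body(schema: dict) -> dict:
    """Body for PUT /collections/{name}."""
    return {"vectors": {"size": schema["vector_size"], "distance": schema["distance"]}}


def payload_index_bodies(schema: dict) -> list:
    """Bodies for PUT /collections/{name}/index, one per payload field.

    "keyword[]" is this document's notation; a keyword index also covers
    arrays of strings, so the suffix is dropped.
    """
    return [
        {"field_name": field, "field_schema": kind.removesuffix("[]")}
        for field, kind in schema["payload_fields"].items()
    ]


def tenant_search_body(vector: list, tenant_id: str, limit: int = 10) -> dict:
    """Body for POST /collections/{name}/points/search, scoped to one tenant."""
    return {
        "vector": vector,
        "filter": {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]},
        "limit": limit,
        "with_payload": True,
    }
```

Filtering on `tenant_id` in every search is an assumption of this sketch about how tenant isolation would be enforced at the vector layer.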

Multilingual Strategy

  • Storage: Canonical English with ICD/SNOMED codes as universal identifiers
  • Translation: Edge-layer via LLM (patient-facing responses only)
  • PostgreSQL: name_translations JSONB per locale on relevant tables
  • Neo4j: display_names maps on nodes
  • Qdrant: Embeddings from English, cross-lingual queries via multilingual model
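
A small sketch of how a `name_translations` lookup might fall back to the canonical English name. The function name, the locale format ("es-MX"), and the fallback order are assumptions for illustration.

```python
from typing import Optional


def localized_name(canonical_name: str,
                   name_translations: Optional[dict],
                   locale: str) -> str:
    """Pick a display name: exact locale, then base language, then English."""
    if not name_translations:
        return canonical_name
    if locale in name_translations:
        return name_translations[locale]
    language = locale.split("-")[0]  # "es-MX" -> "es"
    return name_translations.get(language, canonical_name)


translations = {"es": "Artroplastia total de rodilla"}
spanish = localized_name("Total Knee Arthroplasty", translations, "es-MX")
fallback = localized_name("Total Knee Arthroplasty", translations, "fr")
```

Because the canonical English name and the codes stay untouched, a missing translation degrades the display only, not the stored data.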

Multicurrency

  • All monetary values: integers in the smallest currency unit (e.g., cents for USD) + ISO 4217 code
  • Display conversion: Frankfurter API (free, daily refresh via QStash cron)
  • No floating-point currency math
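
The rules above can be sketched with integer minor units and `decimal.Decimal` for the conversion step. The exponent table is a small illustrative subset of ISO 4217, the exchange rate in the example is made up, and half-up rounding is an assumption; the source does not specify a rounding mode.

```python
from decimal import Decimal, ROUND_HALF_UP

# Number of minor-unit digits per currency (illustrative subset of ISO 4217).
MINOR_UNIT_DIGITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def convert_minor(amount_minor: int, src: str, dst: str, rate: Decimal) -> int:
    """Convert minor units of src into minor units of dst.

    rate is the number of dst major units per one src major unit.
    """
    major = Decimal(amount_minor).scaleb(-MINOR_UNIT_DIGITS[src])
    converted = (major * rate).scaleb(MINOR_UNIT_DIGITS[dst])
    return int(converted.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def format_money(amount_minor: int, currency: str) -> str:
    """Render minor units for display, e.g. 650000 USD cents -> '6,500.00 USD'."""
    digits = MINOR_UNIT_DIGITS[currency]
    major = Decimal(amount_minor).scaleb(-digits)
    return f"{major:,.{digits}f} {currency}"


price_usd = 650000  # $6,500.00 stored as cents
price_eur = convert_minor(price_usd, "USD", "EUR", Decimal("0.92"))  # made-up rate
```

Converted values are for display only in this sketch; the stored amount stays in its original currency and minor unit.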

Timezone Handling

  • All timestamps: UTC in database, no exceptions
  • User timezone: stored on profile (e.g., "Asia/Dubai")
  • Conversion: API response layer adds both UTC and localized time
  • Critical for cross-border consultations spanning timezones
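
A minimal sketch of the response-layer conversion using the standard-library `zoneinfo` module. The function name and the response field names are assumptions; the rule it enforces (UTC in, UTC plus localized time out) is the one listed above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def with_local_time(utc_dt: datetime, user_tz: str) -> dict:
    """Return both the UTC timestamp and its rendering in the user's timezone."""
    if utc_dt.tzinfo is None or utc_dt.utcoffset() != timedelta(0):
        raise ValueError("expected a timezone-aware UTC datetime")
    local = utc_dt.astimezone(ZoneInfo(user_tz))
    return {
        "utc": utc_dt.isoformat(),
        "local": local.isoformat(),
        "timezone": user_tz,
    }


# A consultation slot stored in UTC, shown to a patient in Dubai
# and to a doctor in Chennai.
slot = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
patient_view = with_local_time(slot, "Asia/Dubai")
doctor_view = with_local_time(slot, "Asia/Kolkata")
```

Rejecting naive or non-UTC datetimes at this boundary is a design choice of the sketch; it surfaces any code path that bypassed the "UTC in database" rule.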

Clinical Ontology Layer

Coding Systems

| System | Purpose | MVP Status |
|---|---|---|
| ICD-10 | Diagnosis classification | Active (on FHIR Conditions) |
| SNOMED CT | Clinical terminology (more granular) | Active (alongside ICD) |
| CPT | Procedure coding (US billing) | Stubbed (on provider procedure_costs) |
| LOINC | Lab observations | Active (in FHIR Observations) |
| ATC | Medication classification | Deferred to post-MVP |

MVP Approach

The Clinical Context Agent uses Claude Haiku to map extracted clinical entities to codes via few-shot prompting. This approach is flexible (it handles free text, abbreviations, and multilingual input) but not deterministic.
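
A sketch of what few-shot prompting for code mapping could look like, with a cheap shape check on the model's answer. No model call is shown; the prompt wording, the example pairs, and the function names are assumptions, and the regular expression only checks that a string looks like an ICD-10 code, not that the code exists.

```python
import re

# Hypothetical few-shot examples: (free-text entity, ICD-10 code).
FEW_SHOT = [("OA right knee", "M17.11"), ("T2DM", "E11")]

# Letter, digit, alphanumeric, then an optional dotted extension.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def build_prompt(entity: str) -> str:
    """Assemble a few-shot prompt ending at the slot the model should fill."""
    parts = ["Map each clinical entity to an ICD-10 code. Answer with the code only."]
    for text, code in FEW_SHOT:
        parts.append(f"Entity: {text}\nCode: {code}")
    parts.append(f"Entity: {entity}\nCode:")
    return "\n\n".join(parts)


def looks_like_icd10(answer: str) -> bool:
    """Reject answers that are clearly not a code (prose, empty strings)."""
    return bool(ICD10_SHAPE.match(answer.strip().upper()))
```

Because the mapping is not deterministic, a shape check like this catches only malformed output; it cannot tell whether a well-formed code is the right one.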

Post-MVP Enhancement

A dedicated Clinical Ontology Service will combine UMLS Metathesaurus validation, PyMedTermino, ICD-10 hierarchy traversal, and cross-system mapping. The design is hybrid: the LLM proposes a code and the ontology confirms it.
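
The "LLM proposes, ontology confirms" idea can be sketched with a tiny in-memory table standing in for the ontology. This does not use UMLS or PyMedTermino; `KNOWN_ICD10`, `CodingDecision`, and `confirm` are hypothetical, and the ancestor lookup is a simplified stand-in for real hierarchy traversal.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for an ontology lookup; a real service would query a terminology source.
KNOWN_ICD10 = {
    "M17.11": "Unilateral primary osteoarthritis, right knee",
    "E11": "Type 2 diabetes mellitus",
}


@dataclass
class CodingDecision:
    code: str
    confirmed: bool
    nearest_ancestor: Optional[str]  # closest known parent code, if any


def nearest_known(code: str) -> Optional[str]:
    """Trim trailing characters until a known code is found (E11.9 -> E11)."""
    candidate = code
    while candidate:
        if candidate in KNOWN_ICD10:
            return candidate
        candidate = candidate[:-1].rstrip(".")
    return None


def confirm(proposed_code: str) -> CodingDecision:
    """Accept an LLM-proposed code only if the ontology knows it exactly."""
    code = proposed_code.strip().upper()
    return CodingDecision(code, code in KNOWN_ICD10, nearest_known(code))
```

An unconfirmed proposal with a known ancestor could be routed to manual review with that ancestor as a suggestion, though that workflow is an assumption and not something the source describes.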