03 — Data Architecture¶
FHIR R4 Compliance¶
Medical data stored as FHIR R4 resources using fhir.resources (Pydantic-based), stored as JSONB in Railway PostgreSQL. Every resource validated against HL7 R4 schema before storage.
FHIR Resource Types (MVP)¶
| Resource | Purpose | Example for Aisha |
|---|---|---|
Condition |
Diagnoses | M17.11 osteoarthritis right knee, E11 Type 2 diabetes |
Procedure |
Indicated treatments | 27447 Total Knee Arthroplasty |
Observation |
Lab values, vitals, functional status | HbA1c 7.2%, BP 138/85, LOINC-coded |
AllergyIntolerance |
Drug, food, material allergies | Latex allergy (severity: moderate) |
MedicationStatement |
Current medications | Metformin 1000mg BID, Losartan 50mg QD |
DocumentReference |
Uploaded files | R2 storage keys for X-ray report, blood panel |
FHIR Storage Schema¶
CREATE TABLE fhir_resources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
patient_id UUID NOT NULL REFERENCES patients(id),
tenant_id VARCHAR(100) NOT NULL,
resource_type VARCHAR(50) NOT NULL, -- 'Condition', 'Procedure', etc.
resource_data JSONB NOT NULL, -- Full FHIR R4 JSON
source VARCHAR(50), -- 'clinical_context_agent', 'intake_chat', 'manual'
confidence FLOAT, -- 0.0–1.0 for AI-extracted resources
version INTEGER DEFAULT 1,
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_fhir_patient ON fhir_resources(patient_id, resource_type);
CREATE INDEX idx_fhir_tenant ON fhir_resources(tenant_id);
Clinical Knowledge Graph (Neo4j)¶
Node Types¶
| Node | Properties | Example |
|---|---|---|
Patient |
patient_id, tenant_id, demographics_hash | Aisha |
Condition |
icd10_code, snomed_code, name, severity | M17.11, Kellgren-Lawrence Grade 4 |
Procedure |
cpt_code, name, complexity_tier | 27447, TKR, routine |
Provider |
provider_id, name, country, city, tier | Apollo Chennai, India |
Doctor |
doctor_id, name, specialty, volume | Dr. Venkatesh, Orthopedics, 500+/yr |
Outcome |
success_rate, complication_rate, satisfaction | 95%, 2.1%, 4.8/5 |
DiagnosticTest |
test_name, loinc_code, validity_days | HbA1c, 90 days |
Accreditation |
body, level | JCI, NABH, ISO 15189 |
Specialty |
name, subspecialties | Orthopedics, Joint Replacement |
Location |
country, city, timezone | India, Chennai, Asia/Kolkata |
Cost |
procedure_cost, currency, package_details | $6,500, USD, all-inclusive |
Key Relationships¶
(Patient)-[:DIAGNOSED_WITH {date, source}]->(Condition)
(Condition)-[:INDICATED_FOR]->(Procedure)
(Provider)-[:PERFORMS {volume, success_rate, cost}]->(Procedure)
(Provider)-[:HAS_DOCTOR]->(Doctor)
(Doctor)-[:SPECIALIZES_IN]->(Specialty)
(Provider)-[:LOCATED_IN]->(Location)
(Provider)-[:ACCREDITED_BY]->(Accreditation)
(Procedure)-[:REQUIRES_TEST {mandatory, validity_days, source_acceptance, on_site_required}]->(DiagnosticTest)
(Provider)-[:HAS_OUTCOME {period, sample_size}]->(Outcome)
Provider-Specific Overrides¶
DiagnosticTest requirements can be overridden per provider: - Bumrungrad: accepts external MRI within 60 days (vs default 90) - Apollo Chennai: requires on-site cardiac clearance for diabetes patients - Acibadem: requires specific imaging protocol for TKR planning
Vector Store (Qdrant)¶
- Embeddings: Voyage AI (primary), OpenAI (configurable fallback)
- Role: Always-on Stage 0 in matching pipeline — semantic discovery before graph filtering
- Storage: Provider clinical profiles, procedure descriptions, patient case summaries
- Cross-lingual: Embeddings from English text, multilingual query via Voyage multilingual model
Qdrant Collection Schema¶
{
"collection": "provider_profiles",
"vector_size": 1024,
"distance": "Cosine",
"payload_fields": {
"provider_id": "keyword",
"tenant_id": "keyword",
"specialties": "keyword[]",
"country": "keyword",
"procedures": "keyword[]",
"text_content": "text"
}
}
Multilingual Strategy¶
- Storage: Canonical English with ICD/SNOMED codes as universal identifiers
- Translation: Edge-layer via LLM (patient-facing responses only)
- PostgreSQL:
name_translationsJSONB per locale on relevant tables - Neo4j:
display_namesmaps on nodes - Qdrant: Embeddings from English, cross-lingual queries via multilingual model
Multicurrency¶
- All monetary values: smallest currency unit (USD cents) + ISO 4217 code
- Display conversion: Frankfurter API (free, daily refresh via QStash cron)
- No floating-point currency math
Timezone Handling¶
- All timestamps: UTC in database, no exceptions
- User timezone: stored on profile (e.g.,
"Asia/Dubai") - Conversion: API response layer adds both UTC and localized time
- Critical for cross-border consultations spanning timezones
Clinical Ontology Layer¶
Coding Systems¶
| System | Purpose | MVP Status |
|---|---|---|
| ICD-10 | Diagnosis classification | Active — on FHIR Conditions |
| SNOMED CT | Clinical terminology (more granular) | Active — alongside ICD |
| CPT | Procedure coding (US billing) | Stubbed — on provider procedure_costs |
| LOINC | Lab observations | Active — in FHIR Observations |
| ATC | Medication classification | Deferred to post-MVP |
MVP Approach¶
Clinical Context Agent uses Claude Haiku to map entities to codes via few-shot prompting. Flexible (handles free-text, abbreviations, multilingual), not deterministic.
Post-MVP Enhancement¶
Dedicated Clinical Ontology Service: UMLS Metathesaurus validation + PyMedTermino + ICD-10 hierarchy traversal + cross-system mapping. Hybrid: LLM proposes, ontology confirms.