ADR-0018: Multi-Tenancy Platform Architecture

Status: Accepted, Phased Implementation
Date: 2026-04-10 (amended 2026-04-14: post-op recovery + facility capabilities)
Session: 35 (amended: 40)

Context

Curaway is evolving from a single-tenant patient-facing AI into a multi-tenant platform serving 7 actor types across the medical travel coordination lifecycle. The current tenant_id column exists on every table but there's only one tenant in practice, no RBAC, no cross-tenant data sharing, no workflow engine beyond the case orchestrator's if/else state machine, and no provider-side UI.

The platform must support: patients, facilitators (external agents acting on behalf of patients), coordinators (Curaway staff), MSO doctors (second opinion providers), hospital/clinic providers (each in their own tenant), platform administrators, and super administrators.

Decision

Tenant Model

| Tenant | Count | Isolation | RBAC | Managed by |
|---|---|---|---|---|
| Customer (patients) | 1 shared | Row-level by patient_id | No self-service. Platform/super admin manages patient config (A/B tests, groups). | Platform admin, Super admin |
| Facilitator | 1 shared | Row-level by facilitator_id + patient delegation | Yes: roles, commission, attribution, patient ratings | Facilitator admin + Platform admin |
| Coordinator | 1 shared | Role-based, assigned patients | Yes: roles, CSAT scores, incentives, performance dashboards | Curaway ops + Platform admin |
| MSO Doctor | 1 shared | Row-level by doctor_id + consultation reference | Yes: credentials, specialty, consultation requests | Platform admin |
| Provider | N (one each) | Full tenant isolation | Yes: staff, departments, roles, own workflows, own billing | Provider admin + Platform admin |
| Platform Admin | 1 | Config, tenant lifecycle | Role-based | Super admin |
| Super Admin | 1 | Full access, all tenants | | |

Cross-Tenant Data Sharing

| Actor | Sees patient data via | Data direction | Redaction |
|---|---|---|---|
| Provider | Copy (snapshot at forwarding time) | One-way push | Pseudonymized name, no email/phone/address/passport, age only (no DOB), price range only (no budget), no conversation history, no other provider quotes |
| MSO Doctor | Reference (live clinical data) | Read-only | Clinical only: no PII, no financial, consultation-scoped |
| Coordinator | Reference (live, broad access) | Read-write (assigned patients) | Full access: platform agreement covers it |
| Facilitator | Reference (live, patient-scoped) | Read-write (delegated patients) | Consent-gated: patient grants/revokes per facilitator |

Patient Identity to Provider

Never via platform. Providers receive pseudonymized case records (e.g., "Patient CRW-2026-00034"). Coordinator mediates all communication. Provider learns real identity only when patient physically arrives at hospital. This protects patient privacy and prevents providers from contacting patients directly outside the platform.
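The provider redaction rules above can be sketched as a pure function that builds the pseudonymized snapshot. This is a minimal illustration, not the Case Redaction Agent itself: the field names (`date_of_birth`, `budget`, `price_range`, and so on) and the function names are hypothetical, and it assumes the price range is already stored as its own field so that only the raw budget needs to be dropped.

```python
from datetime import date

# Fields the redaction policy says a provider must never receive.
# The field names are hypothetical; the policy itself comes from the table above.
BLOCKED_FIELDS = {
    "name", "email", "phone", "address", "passport_number",
    "date_of_birth", "budget", "conversation_history", "other_provider_quotes",
}


def age_on(dob: date, today: date) -> int:
    """Whole years between dob and today (providers get age, never DOB)."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def redact_for_provider(case: dict, case_number: str, today: date) -> dict:
    """Return a pseudonymized copy of the case; the input is not modified."""
    snapshot = {k: v for k, v in case.items() if k not in BLOCKED_FIELDS}
    snapshot["display_name"] = f"Patient {case_number}"
    snapshot["age"] = age_on(case["date_of_birth"], today)
    return snapshot
```

Because the function returns a new dictionary, the snapshot stays stable after forwarding even if the live patient record changes, which matches the "copy, one-way push" direction for providers.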

Facilitators vs Coordinators

Separate tenants, same RBAC engine. Facilitators are external (independent agents, commission-based, patient-rated). Coordinators are Curaway employees (salary + CSAT-based incentives, performance dashboards). Both actor types share the ratings and incentives tables, distinguished by an actor_type discriminator.

Operational state (2026-05-13 — §C 100% shipped): Facilitator referral attribution + commission accrual + per-case consent grant flow are live in main. Phase 1 of the go-live plan (PR #651) shipped via:

  • PR #189 (FE consent card) + PR #654 (BE emit on providers_selected) — patient sees a "Share your case with X?" card after match
  • PR #664 (E2E pytest integration, 15/15 covering all 6 steps + 4 failure-case rehearsals)
  • PR #670 (deactivation hardening — closes the P1 case_share leak surfaced by Phase 5.2 audit)
  • PR #669 (ADR-0024 A1 silent attribution drop on inactive facilitator)
  • PR #667 (case_shared field on /commissions API) + #192 (FE attribution-only badge per DP-4)

Live D5 flow steps and reachability:

  1. Facilitator generates link via /referrals: code wired, FF off (facilitator_referral_links_enabled)
  2. /api/v1/public/r/{slug} redirect with HMAC cookie: wired, gated on REFERRAL_COOKIE_SECRET env var (#627)
  3. Patient signup posts referred_by_facilitator_id: wired, reachable
  4. Case.referred_by_facilitator_id propagation: wired, reachable
  5. Patient grants facilitator consent via card: wired, reachable
  6. Commission accrual at payment_locked: wired, FF off (facilitator_commissions_enabled)
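As an illustration of step 2, an HMAC-protected referral cookie can be signed and verified with the Python standard library. This is a sketch under assumptions: the `facilitator_id.signature` cookie layout and both function names are hypothetical, and the only detail taken from the flow above is that the cookie is HMAC-protected with a secret supplied through REFERRAL_COOKIE_SECRET.

```python
import hashlib
import hmac
from typing import Optional


def sign_referral(facilitator_id: str, secret: bytes) -> str:
    """Build a cookie value of the form '<facilitator_id>.<hex signature>'."""
    mac = hmac.new(secret, facilitator_id.encode(), hashlib.sha256).hexdigest()
    return f"{facilitator_id}.{mac}"


def verify_referral(cookie_value: str, secret: bytes) -> Optional[str]:
    """Return the facilitator id if the signature checks out, else None."""
    facilitator_id, sep, mac = cookie_value.rpartition(".")
    if not sep or not facilitator_id:
        return None
    expected = hmac.new(secret, facilitator_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return facilitator_id
    return None
```

A tampered or unsigned cookie yields `None`, so signup would simply proceed without attribution rather than trusting a client-supplied facilitator id.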

Outstanding gates: deploy REFERRAL_COOKIE_SECRET (#627) → flip the 4 FFs per target tenant in sequence (Phase 3 of plan #651). DPIA + LIA both filed (#655, #665).

Companion ADR: ADR-0024 (facilitator lifecycle policy) — locks Option A (one-way soft delete) and A1 (silent attribution drop on inactive facilitator).

Workflow Engine

Hybrid approach:

  1. Python state machine library (python-statemachine) for synchronous case lifecycle transitions. Replaces the 24K-line if/else in case_orchestrator.py.
  2. Upstash Workflow for async multi-step pipelines: document processing, quote normalization, SLA timer callbacks, risk assessment workflow.
  3. Case state persisted in PostgreSQL workflow_state JSONB (same as today, but transitions validated by the state machine).

Case Lifecycle State Machine

Patient flow:
  intake → procedure_identified → records_collected → intake_complete
  → matching → providers_selected → consent_given

Platform flow:
  → risk_review_pending → risk_cleared
  → providers_notified → quoting

Provider flow (per provider tenant):
  received → reviewing → info_requested → quoted (or rejected)

Patient decision:
  → quotes_pooled → patient_reviewing → provider_selected
  → mso_offered → mso_complete (or mso_skipped)
  → payment_locked

Coordination flow:
  → coordinator_assigned → pre_op → travel_booked → admitted
  → procedure_complete → post_op → follow_up → case_complete
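To show what "transitions validated by state machine" means in practice, here is a minimal sketch of the patient-decision segment using a plain transition table. It deliberately does not use the python-statemachine API; the `status` key, the `InvalidTransition` exception, and the reading of `mso_skipped` as an alternative exit from `mso_offered` are assumptions made for illustration.

```python
# Allowed transitions for the patient-decision segment of the lifecycle.
# mso_skipped is read here as an alternative to mso_complete (an assumption).
ALLOWED = {
    "quotes_pooled": {"patient_reviewing"},
    "patient_reviewing": {"provider_selected"},
    "provider_selected": {"mso_offered"},
    "mso_offered": {"mso_complete", "mso_skipped"},
    "mso_complete": {"payment_locked"},
    "mso_skipped": {"payment_locked"},
}


class InvalidTransition(ValueError):
    """Raised when a requested state change is not in the transition table."""


def transition(workflow_state: dict, target: str) -> dict:
    """Return a new workflow_state with the status advanced to target."""
    current = workflow_state["status"]  # hypothetical key name
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return {**workflow_state, "status": target}
```

The point of the table is that an illegal jump, such as going from `patient_reviewing` straight to `payment_locked`, fails loudly instead of being silently accepted by a missed if/else branch.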

New Agentic Workflows (6)

  1. Risk Assessment Workflow — AI risk scoring + confidence + scoring vector → human review queue → approve/reject/request more info
  2. Quote Intake + Normalization Agent — parses provider quote submissions, normalizes to structured schema, converts currency, manages pool threshold
  3. MSO Doctor Matching Agent — reuses matching_engine with exclusion strategy (exclude doctors from selected provider). MSO includes teleconsultation (see below).
  4. Provider Communication Agent — AI drafts patient-facing info requests, routes through coordinator/facilitator
  5. Case Redaction Agent — applies redaction policy at forwarding time, creates pseudonymized snapshot
  6. Quote Comparison Agent — generates structured side-by-side comparison (digital version of Mediglobe comparative study PDF)

MSO Teleconsultation

MSO second opinions include teleconsultation as an essential part of the process. Two formats offered to the patient as a choice:

  1. Document review + chat (standard) — MSO doctor reviews case records, writes consultation document, communicates async via platform messaging. Lower cost.
  2. Video teleconsultation (premium) — live embedded video call between patient and MSO doctor. Higher cost. Requires scheduling.

Key decisions:

  • Embedded video: first preference is in-platform (lightweight integration: Daily.co or 100ms). External link (Zoom/Meet) as fallback if embedded is infeasible for MVP.
  • Scheduling: platform provides scheduling via coordinator or patient self-service (calendar availability, time zone handling, reminders).
  • No recording for now: avoids consent-to-record compliance complexity. Teleconsultation outcome is captured as a structured consultation document by the MSO doctor.
  • Pricing: video teleconsultation priced separately from document review. Configurable per MSO doctor or platform-standard.
  • Patient-side soft offer: surfaces during intake_triage via app/agents/mso_offer.py::_maybe_emit_mso_offer; emits an offer_mso_at_intake state-machine transition. Lifecycle owned by app/repositories/consultation_repository.py::create_proposed / record_reoffer / mark_declined (proposed → declined / scheduled / proposed-reoffer). See docs/specs/mso-patient-agent-integration-feature.md + closeout follow-up PR fix/mso-offer-closeout-review.

Teleconsultation data model:

teleconsultation_sessions (
  id, case_id, mso_consultation_id, doctor_id, patient_id,
  format: ENUM(video, chat),
  scheduled_at, started_at, ended_at, duration_minutes,
  status: scheduled | in_progress | completed | cancelled | no_show,
  video_provider: ENUM(daily, 100ms, external_link),
  video_room_id,
  cost_cents, cost_currency,
  created_at
)

Facility Capability Disclosure & Readiness Scoring

Status (2026-05-13): DESIGNED, ZERO IMPLEMENTATION — none of the tables (provider_capabilities, procedure_capability_requirements), Neo4j edges (HAS_CAPABILITY, REQUIRES_CAPABILITY), readiness scoring logic in the matching engine, or patient-facing 3-section panel exist. Facility readiness scoring + capability seeding deferred from Phase 1; PR #857 (procedure onboarding admin UI) is the Phase B unblocker for admin-side capability seeding. Phase B specs for provider onboarding UI, matching config UI, and Flagsmith UI are still pending.

Context: When provider matches are shown, patients see required tests (from Neo4j REQUIRES_TEST) but don't know whether the matched hospital can actually perform those tests on-site, or whether external arrangements are needed. For medical tourists traveling internationally, this information is critical — e.g., blood bank availability in India (where donor replacement is common), specialized imaging equipment, ICU capacity, or even visa letter support.

Scope: Full Facility Capability Disclosure

Three tiers of capability, all surfaced to the patient:

  1. Diagnostic capabilities — which required tests can be done on-site (MRI, blood work, HbA1c, PET scan, etc.)
  2. Operational support — blood bank, ICU beds, specialized anesthesiology, post-op monitoring, pharmacy, physiotherapy
  3. Logistical readiness — airport transfer, visa invitation letter, accommodation assistance, language support staff, dietary accommodation, international insurance acceptance

Data Model

provider_capabilities (new)
  id, provider_id, tenant_id,
  category: ENUM(diagnostic, operational, logistical),
  capability_code: VARCHAR(50),       -- e.g., "blood_bank", "mri", "icu", "airport_transfer"
  capability_name: VARCHAR(100),
  status: ENUM(available, limited, unavailable, external_arrangement),
  details: JSONB,                     -- capacity, conditions, notes
  -- e.g., {"capacity": "24/7", "note": "Donor replacement required for transfusions"}
  -- e.g., {"partner": "City Scan Center", "distance_km": 2.5, "transport_provided": true}
  verified_at: TIMESTAMP,
  verified_by: ENUM(provider_self_reported, curaway_verified, patient_reported),
  created_at, updated_at

procedure_capability_requirements (new)
  id, procedure_code, tenant_id,
  capability_code: VARCHAR(50),       -- links to provider_capabilities.capability_code
  criticality: ENUM(critical, recommended, nice_to_have),
  -- critical: gap lowers match score + patient warning
  -- recommended: gap noted in details, minor score penalty
  -- nice_to_have: informational only
  condition_note: TEXT,               -- e.g., "Required if patient is on anticoagulants"
  created_at

Neo4j additions:

(:Provider)-[:HAS_CAPABILITY {status, verified_at}]->(:Capability {code, name, category})
(:Procedure)-[:REQUIRES_CAPABILITY {criticality, condition_note}]->(:Capability)

This extends the existing REQUIRES_TEST pattern. REQUIRES_TEST covers diagnostic tests with validity windows and source acceptance. REQUIRES_CAPABILITY covers the broader operational and logistical needs.

Readiness Score

At match time, each provider gets a Facility Readiness Score computed from the gap analysis:

readiness_score = weighted average of:
  - diagnostic_coverage:  (available diagnostics / required diagnostics)  × 0.40
  - operational_coverage: (available operational / required operational)  × 0.35
  - logistical_coverage:  (available logistical / required logistical)   × 0.25

Critical capability gaps (e.g., no blood bank for a procedure that may require transfusion):

  • Score penalty: -0.15 per critical gap on the provider's overall match score
  • Explicit patient-facing note in match results: "This facility does not have an on-site blood bank. In India, donor replacement is typically required for transfusions. Your coordinator will help arrange this."
  • Flagged on coordinator dashboard if patient selects this provider

Match integration: Readiness score feeds into the existing matching engine as a modifier on the clinical_relevance weight, not a separate weight. Keeps the 7-weight model clean. A facility with all capabilities gets no penalty; a facility missing critical capabilities gets a visible score reduction.
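The readiness formula and the critical-gap penalty can be sketched as follows. This is an illustrative reading of the design, not the matching engine's code: it assumes only `status == "available"` counts toward coverage, that a category with no requirements counts as fully covered, and that the penalized match score is clamped at zero. The input shapes and function names are hypothetical.

```python
from typing import Dict, List, Tuple

CATEGORY_WEIGHTS = {"diagnostic": 0.40, "operational": 0.35, "logistical": 0.25}
CRITICAL_GAP_PENALTY = 0.15


def readiness(
    requirements: List[Tuple[str, str, str]],  # (capability_code, category, criticality)
    capabilities: Dict[str, str],              # capability_code -> status
) -> Tuple[float, List[str]]:
    """Return (readiness_score, critical capability codes that are not available)."""
    required = {c: 0 for c in CATEGORY_WEIGHTS}
    available = {c: 0 for c in CATEGORY_WEIGHTS}
    critical_gaps = []
    for code, category, criticality in requirements:
        required[category] += 1
        if capabilities.get(code) == "available":
            available[category] += 1
        elif criticality == "critical":
            critical_gaps.append(code)
    score = sum(
        weight * (available[c] / required[c] if required[c] else 1.0)
        for c, weight in CATEGORY_WEIGHTS.items()
    )
    return score, critical_gaps


def apply_critical_penalty(match_score: float, critical_gaps: List[str]) -> float:
    """Subtract 0.15 per critical gap from the overall match score (floored at 0)."""
    return max(0.0, match_score - CRITICAL_GAP_PENALTY * len(critical_gaps))
```

For example, a provider covering one of two required diagnostics, one of two operational capabilities, and the single logistical one scores 0.5 × 0.40 + 0.5 × 0.35 + 1.0 × 0.25 = 0.625, shown to the patient as roughly 63%.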

Patient-Facing Display

Headline: "Facility Readiness: 92%" (or color-coded: green >= 85%, amber 60-84%, red < 60%)

Expandable detail panel (3 sections):

| Section | Shows |
|---|---|
| Diagnostic | "Available on-site: MRI, blood work, ECG. External: PET scan (partner lab 2km, transport provided)" |
| Operational | "Blood bank: limited (donor replacement required). ICU: 12 beds, 24/7. Pharmacy: on-site." |
| Logistical | "Airport transfer: included. Visa letter: provided. Language: English, Hindi, Arabic staff." |

Items with status: external_arrangement show the partner facility name, distance, and whether transport is provided. Items with status: unavailable show as a gap with the coordinator note.

Capability Verification

Capabilities are initially provider self-reported during onboarding. Over time:

  • Curaway team verifies during provider visits → curaway_verified
  • Patient feedback can flag inaccuracies → patient_reported (triggers review)
  • verified_at timestamp enables staleness detection (re-verify annually)

Seed Data Approach

For MVP, seed capability data for the existing 42 providers based on:

  • Hospital accreditation level (JCI hospitals typically have full diagnostic + operational capabilities)
  • Country-specific patterns (Indian hospitals: flag blood bank donor policy; Thai hospitals: strong logistical readiness; Turkish hospitals: strong diagnostic but variable language support)
  • Procedure-capability requirements seeded for 12 existing procedures

Post-Op Recovery Matching & Tracking

Status (2026-05-13): 75% — BE 100% complete: tables (recovery_provider_profiles, hospital_recovery_partnerships, recovery_milestones, recovery_outcomes), Neo4j entities (RecoveryProvider, PARTNERS_WITH, RECOVERY_NEEDS), recovery_matcher.py, and recovery sub-flow in orchestrator all shipped via PRs #821-828 and #832 (merged 2026-05-12). FE recovery cards merged (#264). Deferred: Phase 2 facility check-in API (POST /api/v1/recovery/{case_id}/facility-update) + recovery_outcome_scoring_enabled flag activation (awaiting min-5-stay data).

Context: After surgery, patients recovering abroad need support — rehabilitation (physiotherapy, monitoring) and/or recovery accommodation (medical-grade hotels with nursing). This is an optional add-on service the patient can choose. It uses a lightweight matching process, not the full surgical matching engine.

Recovery Provider Model

Three-tier commercial model, phased by contract availability:

  1. Curaway-partnered — pre-negotiated deals with 1-2 recovery facilities per surgical provider city. Platform-curated quality. Launch tier.
  2. Hospital-recommended — surgical provider suggests their own recovery partners. Hospital controls the relationship. Displayed with "Recommended by {hospital}" trust badge.
  3. Open marketplace — future. Any recovery facility in the city can list. Patient browses all options.

Recovery providers are a provider_type on the existing providers table (not a separate entity). This inherits tenant isolation, storefront fields, and CRUD APIs.

Data model:

providers (existing — extended)
  provider_type: ENUM(surgical, recovery_rehab, recovery_accommodation, recovery_both)

recovery_provider_profiles (new)
  id, provider_id, tenant_id,
  facility_type: ENUM(rehab_center, recovery_hotel, hybrid),
  latitude, longitude,
  daily_rate_cents, daily_rate_currency,
  accommodation_tier: ENUM(basic, comfort, premium),
  dietary_options: JSONB,           -- ["halal", "kosher", "vegetarian", ...]
  staff_languages: JSONB,           -- ["en", "ar", "tr", ...]
  capabilities: JSONB,              -- ["physiotherapy", "cardiac_monitoring", "wound_care", ...]
  max_stay_days: INT,
  nursing_ratio: VARCHAR,           -- e.g., "1:4"
  transport_to_hospital: BOOL,
  created_at, updated_at

hospital_recovery_partnerships (new)
  id, surgical_provider_id, recovery_provider_id, tenant_id,
  partnership_type: ENUM(hospital_recommended, curaway_partnered, marketplace),
  status: ENUM(active, suspended, expired),
  commission_pct: DECIMAL,
  notes: TEXT,
  created_at, updated_at

Neo4j additions:

(:RecoveryProvider {id, name, facility_type, lat, lon, city})
(:Provider)-[:PARTNERS_WITH {type, status}]->(:RecoveryProvider)
(:Procedure)-[:RECOVERY_NEEDS {capabilities: [...], typical_days}]->(:RecoveryPhase)

RECOVERY_NEEDS extends the existing HAS_RECOVERY relationship with capability requirements per procedure.

Recovery Matching — Lightweight Scorer

Single function in recovery_matcher.py. Input: surgical provider ID + patient preferences + procedure code. Output: ranked list of 1-5 recovery facilities.

| Factor | Weight (MVP) | Weight (with outcomes) | How calculated |
|---|---|---|---|
| Proximity | 0.25 | 0.20 | Haversine distance from surgical hospital. <2km=1.0, <5km=0.8, <10km=0.5, >10km=0.2 |
| Cost fit | 0.20 | 0.15 | Patient budget tier vs. accommodation tier. Exact=1.0, one off=0.6, two=0.2 |
| Facility type match | 0.20 | 0.15 | Procedure's RECOVERY_NEEDS capabilities vs. facility capabilities (Jaccard overlap) |
| Language | 0.15 | 0.12 | Patient preferred language in staff_languages=1.0, English fallback=0.5, neither=0.0 |
| Patient preferences | 0.10 | 0.08 | Dietary + accommodation standard + amenity checklist overlap |
| Procedure capability | 0.10 | 0.10 | Hard requirements for the procedure. Missing=0.0 (disqualifying). Present=1.0 |
| Outcome score | 0.00 | 0.20 | Activated when facility has >=5 completed recovery stays |

Hospital affiliation is a trust badge, not a scoring weight. Hospital-recommended facilities display "Recommended by {hospital name}" and sort to top of equal-score ties.

Disqualification rules (hard filters before scoring):

  • Missing a required capability for the procedure
  • Facility suspended or expired
  • Facility max_stay_days < procedure's typical recovery days

Weights configurable via Flagsmith (recovery_matching_weights_v1) with hardcoded fallback.
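The scoring factors above translate into a small scorer. The sketch below uses the MVP weights from the table and is not the contents of recovery_matcher.py: the function signatures are hypothetical, a distance of exactly 10 km is treated as the lowest proximity band (the table leaves that boundary open), an empty needs set is treated as a full facility-type match, and the patient-preference factor is passed in precomputed.

```python
import math
from typing import Optional, Sequence, Set

MVP_WEIGHTS = {
    "proximity": 0.25, "cost_fit": 0.20, "facility_type_match": 0.20,
    "language": 0.15, "patient_preferences": 0.10,
    "procedure_capability": 0.10, "outcome": 0.00,
}
TIERS = ["basic", "comfort", "premium"]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def proximity_score(km: float) -> float:
    if km < 2:
        return 1.0
    if km < 5:
        return 0.8
    if km < 10:
        return 0.5
    return 0.2  # assumption: exactly 10 km falls in the lowest band


def cost_fit(budget_tier: str, accommodation_tier: str) -> float:
    gap = abs(TIERS.index(budget_tier) - TIERS.index(accommodation_tier))
    return {0: 1.0, 1: 0.6, 2: 0.2}[gap]


def jaccard(needed: Set[str], offered: Set[str]) -> float:
    if not needed:
        return 1.0  # assumption: no stated needs means a full match
    return len(needed & offered) / len(needed | offered)


def language_score(preferred: str, staff_languages: Sequence[str]) -> float:
    if preferred in staff_languages:
        return 1.0
    return 0.5 if "en" in staff_languages else 0.0


def score_facility(distance_km: float, budget_tier: str, accommodation_tier: str,
                   needed: Set[str], hard_required: Set[str], offered: Set[str],
                   preferred_language: str, staff_languages: Sequence[str],
                   preference_fit: float, weights=MVP_WEIGHTS) -> Optional[float]:
    """Return the weighted score, or None if a hard requirement is missing."""
    if not hard_required <= offered:
        return None  # hard filter: disqualified before scoring
    factors = {
        "proximity": proximity_score(distance_km),
        "cost_fit": cost_fit(budget_tier, accommodation_tier),
        "facility_type_match": jaccard(needed, offered),
        "language": language_score(preferred_language, staff_languages),
        "patient_preferences": preference_fit,
        "procedure_capability": 1.0,
        "outcome": 0.0,  # weight is 0.00 at MVP
    }
    return sum(weights[name] * factors[name] for name in weights)
```

Returning `None` for disqualified facilities keeps the hard filters separate from the weighted score, so a missing required capability can never be outweighed by good proximity or price.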

Case Lifecycle Integration — Progressive Surfacing

Recovery is a sub-flow in workflow_state.recovery, not new top-level case states. The main case flow is unaffected if the patient declines.

Touch 1 — Exploratory (at providers_selected): Agent mentions recovery options. If the patient expresses interest, recovery matching runs immediately. Results shown as a lightweight card (facility name, type, distance, daily rate, hospital badge). Patient can "Save for later" — shortlisted IDs saved to workflow_state.recovery.shortlisted_ids. No commitment required.

Touch 2 — Decision (at consent_given or payment_locked): After commitment, recovery options resurface with pre-loaded shortlist. Patient confirms booking.

Recovery sub-flow states (in workflow_state.recovery):
  recovery_offered → recovery_opted_in → recovery_matching
  → recovery_selected → recovery_booked → recovery_active
  → recovery_complete
  (or recovery_offered → recovery_declined)

// workflow_state example
{
  "intake_complete": true,
  "matching_complete": true,
  "recovery": {
    "status": "recovery_matching",
    "opted_in_at": "2026-04-14T...",
    "surgical_provider_id": "...",
    "shortlisted_ids": ["...", "..."],
    "facility_type_preference": "both",
    "budget_tier": "comfort"
  }
}

Recovery Tracking — Milestone Check-ins

MVP (Phase 1): AI + coordinator milestone check-ins.

Milestones auto-generated from procedure's RecoveryPhase nodes + actual surgery date. Patient checks in via chat at each milestone.

recovery_milestones (new)
  id, case_id, recovery_provider_id, tenant_id,
  phase_code,                          -- links to RecoveryPhase node
  phase_name,                          -- "Day 1 Post-Op", "Day 3", "Pre-Flight Clearance"
  expected_date: DATE,
  status: ENUM(upcoming, due, completed, overdue, skipped),
  checked_in_at: TIMESTAMP,
  checked_in_by: ENUM(patient, coordinator, facility),
  pain_level: INT,                     -- 1-10, optional
  mobility_note: TEXT,
  concerns: TEXT,
  photo_document_id: VARCHAR,          -- optional progress photo in R2
  escalated: BOOL DEFAULT FALSE,
  escalation_reason: TEXT,
  created_at, updated_at

Check-in flow:

  1. Milestones auto-generated when recovery_active is set
  2. Notification sent at each milestone date (in-app + email)
  3. Patient responds via chat; recovery_checkin phase context extracts pain level, concerns
  4. Pre-flight clearance is the final gate: coordinator confirms safe to travel home

Escalation rules (configurable in guardrails.yaml):

  • Pain level >= 7 → auto-escalate to coordinator
  • Missed check-in > 24h → coordinator notification
  • Patient mentions keywords (bleeding, fever, swelling, emergency) → immediate escalation
  • Any milestone overdue > 48h → flagged on coordinator dashboard
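The escalation rules can be expressed as one function that returns every rule a milestone currently trips. This is a sketch only: the thresholds are hardcoded here although the design reads them from guardrails.yaml, the reason labels and function name are hypothetical, and the milestone's expected time is modeled as a datetime for simplicity even though the table stores a DATE.

```python
from datetime import datetime, timedelta
from typing import List, Optional

ESCALATION_KEYWORDS = ("bleeding", "fever", "swelling", "emergency")
PAIN_THRESHOLD = 7
MISSED_AFTER = timedelta(hours=24)
OVERDUE_AFTER = timedelta(hours=48)


def escalation_reasons(
    pain_level: Optional[int],
    concerns: Optional[str],
    expected_at: datetime,
    checked_in_at: Optional[datetime],
    now: datetime,
) -> List[str]:
    """Return the labels of all escalation rules that apply to one milestone."""
    reasons = []
    if pain_level is not None and pain_level >= PAIN_THRESHOLD:
        reasons.append("pain_level")
    text = (concerns or "").lower()
    if any(keyword in text for keyword in ESCALATION_KEYWORDS):
        reasons.append("keyword")
    if checked_in_at is None:
        late_by = now - expected_at
        if late_by > MISSED_AFTER:
            reasons.append("missed_checkin_24h")
        if late_by > OVERDUE_AFTER:
            reasons.append("overdue_48h")
    return reasons
```

Returning a list rather than a single boolean lets the caller route each reason differently, for example an immediate page for a keyword match versus a dashboard flag for an overdue milestone.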

Phase 2 (future): Facility-reported structured updates.

Recovery facilities with API access submit updates via POST /api/v1/recovery/{case_id}/facility-update. Feeds recovery_outcomes directly. Patient check-ins remain as the patient-side complement.

Recovery Outcome Data — Reserved for Data Moat

Table exists from day one (empty at MVP). Outcome scoring weight reserved at 0.0, activated via recovery_outcome_scoring_enabled Flagsmith flag when data arrives.

recovery_outcomes (new — empty at MVP)
  id, recovery_provider_id, case_id, procedure_code, tenant_id,
  milestones_on_track: BOOL,
  actual_stay_days: INT,
  early_discharge: BOOL,
  readmission: BOOL,
  complication_during_recovery: BOOL,
  patient_rating: DECIMAL(2,1),      -- 1.0-5.0
  would_recommend: BOOL,
  mobility_score_discharge: DECIMAL, -- procedure-specific
  pain_level_discharge: INT,         -- 1-10
  reported_by: ENUM(patient, coordinator, facility),
  created_at, updated_at

Outcome score formula (when activated, min 5 completed stays):

outcome_score = (
  0.35 * avg_patient_rating / 5.0
  + 0.25 * (1 - readmission_rate)
  + 0.20 * milestones_on_track_rate
  + 0.20 * (1 - complication_rate)
)

Below minimum sample size, outcome weight redistributes to other factors (same weight_redistribution.py pattern as surgical matching).
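The outcome formula and its minimum-sample gate can be sketched as below. The record keys mirror the recovery_outcomes columns, but the function name is hypothetical and returning `None` below the threshold is an assumption; how the caller redistributes the weight in that case is left to the existing weight_redistribution.py pattern and is not reproduced here.

```python
from typing import List, Optional

MIN_COMPLETED_STAYS = 5


def outcome_score(stays: List[dict]) -> Optional[float]:
    """Compute the outcome score, or None when there is too little data."""
    n = len(stays)
    if n < MIN_COMPLETED_STAYS:
        return None  # caller falls back to the weights without outcomes
    avg_rating = sum(s["patient_rating"] for s in stays) / n
    readmission_rate = sum(1 for s in stays if s["readmission"]) / n
    on_track_rate = sum(1 for s in stays if s["milestones_on_track"]) / n
    complication_rate = sum(1 for s in stays if s["complication_during_recovery"]) / n
    return (
        0.35 * avg_rating / 5.0
        + 0.25 * (1 - readmission_rate)
        + 0.20 * on_track_rate
        + 0.20 * (1 - complication_rate)
    )
```

With five stays rated 4.5, one readmission, four of five on track, and no complications, the score is 0.35 × 0.9 + 0.25 × 0.8 + 0.20 × 0.8 + 0.20 × 1.0 = 0.875.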

Additional Decisions

  • Case number: Curaway case number (CRW-2026-XXXXX) travels with the case. Provider-assigned references are metadata attached via case_linkages table. Once patient is admitted, hospital HMS ID mapped to the case for processing updates.
  • Quote validity: 30 days by default. Both the validity window and the grace period are platform-configurable.
  • Post-procedure outcomes: Platform tracks surgery success, complications, patient satisfaction for provider quality scoring — the most important piece of the data moat. Recovery outcomes feed the same flywheel.
  • Provider onboarding: Storefront data set sufficient for now. Rest added later. Platform-assisted with AI tools. Self-service later.
  • Recovery provider onboarding: Same onboarding flow as surgical providers but with recovery_provider_profiles additional fields. Hospital-recommended facilities can be onboarded via the surgical provider's portal.
  • Facilitator/MSO onboarding: Vetting process with credential verification.
  • SLA: Per-provider (negotiated at onboarding), platform-standard target enforced through contract.

Consequences

Positive:

  • Each actor type has appropriate access boundaries
  • Patient identity protected until physical arrival
  • Provider gets stable snapshot for clinical decision-making
  • State machine makes case lifecycle explicit, validated, and extensible
  • Upstash Workflow adds durability to async pipelines
  • Foundation for payments, SLA monitoring, and quality scoring

Negative:

  • Significant architecture change: 14-18 weeks for full platform
  • case_orchestrator refactor is high-risk (most coupled file, 24K lines)
  • Cross-tenant data sharing adds complexity to every query
  • Multiple dashboards needed (patient, provider, coordinator, facilitator, admin)

Neutral:

  • Clerk multi-org integration is a natural fit for the tenant model
  • Existing matching engine reusable for MSO doctor matching
  • Existing prompt abstraction system supports per-tenant prompt variants
  • Recovery matching reuses existing provider infrastructure (same table, same tenant model)
  • Recovery outcome data feeds back into the provider quality flywheel (proprietary data moat)
  • Facility readiness scoring differentiates Curaway from competitors: patients see operational reality, not just marketing
  • Capability verification creates a trust layer that improves with every patient journey

Implementation Phases

Backend (API + Services) — 16-21 weeks

| Phase | Scope | Effort | Status (2026-05-13) |
|---|---|---|---|
| 0: Foundation | Tenants, RBAC, Clerk multi-org, case_shares, redaction, state machine refactor | 2-3 weeks | 100%: shipped; WS gate (#585) closed via #843 (2026-05-13) |
| 1: Provider Flow | Provider onboarding API, forwarding service, quoting endpoints, quote pooling, provider messaging API, facility capability seeding + readiness scoring | 3-4 weeks | 75%: facility readiness scoring + capability seeding deferred; PR #857 spec drafted for procedure onboarding admin UI (Phase B unblocker) |
| 2: Risk + MSO | Risk assessment workflow, MSO matching + consultation + teleconsultation (embedded video + scheduling), reviewer API | 3-4 weeks | 80%: outcome scoring not activated; Daily.co hot-fix #837 + #833 merged 2026-05-12; teleconsultation_sessions table + video integration deferred; #712 credential verification open |
| 3: Facilitators + Coordinators | Delegation consent API, coordinator assignment, ratings/CSAT endpoints, performance metrics API | 2-3 weeks | 100%: shipped |
| 4: Payments + SLA | Token fee, SLA monitoring, invoices, platform fees | 2-3 weeks | 60%: Stripe wired; Razorpay PR-A (#848) merged 2026-05-12; PR-B + PR-C paused; SLA monitoring deferred |
| 5: Coordination Lifecycle | Defined phases (pre-op → follow-up), coordinator workflow tools | 2 weeks | 100%: shipped |
| 6: Recovery Matching & Tracking | Recovery provider profiles, lightweight matcher, milestone check-ins, outcome schema (empty), recovery sub-flow in orchestrator | 2-3 weeks | 75%: BE 100% (#821-828, #832 merged 2026-05-12), FE cards merged (#264); Phase 2 facility check-in API + outcome scoring activation deferred |
| 7: Transportation | transportation_provider_profiles, hospital_transport_partnerships, transport_matcher.py, insurance pre-auth | TBD | DESIGNED: spec #858 drafted 2026-05-13; gated on SD review + Dr. Naidu's 7 questions |

Frontend Portals — 13-18 weeks

The backend phases above deliver APIs only. Each actor type needs a dedicated UI (or role-gated views in a shared app). These can be built in parallel with backend phases once APIs are stable.

| Portal | Actors | Key Screens | Effort | Depends on |
|---|---|---|---|---|
| Provider Portal | Hospital/clinic staff | Case inbox, patient record viewer (redacted), quote builder, quote history, messaging, analytics dashboard, onboarding wizard | 4-6 weeks | Phase 1 APIs |
| Coordinator Dashboard | Curaway ops staff | Assigned patient queue, case timeline, communication hub, task management, CSAT scores, performance metrics | 3-4 weeks | Phase 3 APIs |
| Admin Portal | Platform + Super admin | Tenant management, provider onboarding review, feature flags, system health, user management, audit logs | 2-3 weeks | Phase 0 APIs |
| Facilitator Portal | External agents | Patient delegation requests, case management, commission tracking, ratings, referral links | 2-3 weeks | Phase 3 APIs |
| MSO Doctor Portal | Second opinion providers | Consultation requests, document review interface, teleconsultation scheduling + video embed, consultation report builder | 2-3 weeks | Phase 2 APIs |

Architecture decision: shared app vs separate apps

Recommended: One Next.js/React app with role-based routing (not 5 separate apps). Reasons:

  • Shared component library (tables, cards, messaging, file viewers)
  • Single deployment + single auth flow (Clerk multi-org handles role switching)
  • Provider and MSO portals share 60%+ of components (case viewer, document viewer, messaging)
  • Admin portal is a superset of coordinator views
  • Coordinator and facilitator portals share patient management patterns

Route structure: /provider/*, /coordinator/*, /admin/*, /facilitator/*, /mso/*. Clerk org role determines which routes are accessible.

The existing patient-facing app (curaway-health-navigator) remains separate — it's a conversational UI, architecturally distinct from the dashboard-style portals.

Total Platform Estimate — 29-39 weeks

| Layer | Effort |
|---|---|
| Backend (Phases 0-6) | 16-21 weeks |
| Frontend portals | 13-18 weeks |
| Total | 29-39 weeks |

Note: Frontend phases can overlap with backend phases (start portal UI once APIs are 70%+ complete per phase). With parallelization, critical path is ~22-26 weeks. Recovery (Phase 6) can start after Phase 5 or in parallel if coordinator tooling is ready.

Amendment: Admin Console Operational Screens (2026-04-23, Session 61)

Status (2026-05-13): 1 of 4 screens specced — PR #857 drafted spec for procedure onboarding screen (Phase B unblocker). Provider onboarding UI, matching config UI, and Flagsmith admin UI specs are still pending.

The admin portal (Phase B in execution path) needs four operational screens that are central to day-to-day platform operations. These are distinct from the tenant/user CRUD screens already scoped — they're the screens the ops team will live in.

1. Procedure Onboarding Screen

Problem today: Adding a new procedure (e.g., "Robotic Knee Replacement") requires manual changes in 5+ places: the procedures table, Neo4j :Procedure nodes, Qdrant embeddings, procedure_capability_requirements, and YAML config updates. No UI, no validation, no audit trail.

What it needs:

  • Procedure creation wizard: name, ICD-10/CPT codes, category, description, typical duration/recovery
  • Capability requirements builder: which diagnostic/operational/logistical capabilities does this procedure need? (links to procedure_capability_requirements table)
  • Neo4j auto-sync: creating a procedure should auto-create the graph node + REQUIRES_CAPABILITY edges
  • Qdrant auto-embed: procedure description auto-embedded for semantic search
  • Procedure activation/deactivation toggle (soft delete, not hard)
  • Audit log of all changes

Open questions:

  • Do we need procedure versioning? (e.g., requirements change over time)
  • Should procedures be tenant-scoped or a global catalog?
  • Who can create procedures: only super admin, or also platform admin?

2. Provider Onboarding Screen

Problem today: Provider onboarding is manual — seed scripts, direct DB, and email-based data collection. No structured workflow, no self-service, no progress tracking.

What it needs:

  • Provider registration form: legal name, country, city, accreditations (JCI, NABH, etc.), contact
  • Procedure mapping: which procedures does this provider offer? Links to procedure catalog.
  • Capability self-declaration: provider fills in their diagnostic/operational/logistical capabilities per the 3-tier model
  • Document upload: licenses, accreditation certificates, insurance
  • Approval workflow: submitted → under review → approved / rejected (with notes)
  • Neo4j auto-sync: approved provider creates :Provider node + :OFFERS_PROCEDURE + :HAS_CAPABILITY edges
  • Qdrant auto-embed: provider profile embedded for semantic matching
  • Status dashboard: onboarding progress per provider (what's filled, what's pending)

Open questions:

  • Self-service vs platform-assisted? ADR says "platform-assisted with AI tools" for MVP, self-service later.
  • Do we need a provider-facing onboarding wizard (in provider portal), or is it admin-only for MVP?
  • How does Clerk org creation tie into provider approval? Auto-create org on approval?

3. Matching Algorithm Configuration Screen

Problem today: Matching weights are in WeightedScoringV1 with Flagsmith override via matching_weights_v1 flag. Changing weights requires either code deploy or Flagsmith dashboard access. No visibility into how weight changes affect match quality.

Current weights: clinical_fit(0.40) + outcomes(0.20) + cost(0.15) + travel(0.15) + preferences(0.10)

What it needs:

  • Weight sliders with real-time preview: adjust each dimension, see how top-5 matches change for a sample case
  • Strategy selector: which scoring strategy is active (WeightedScoringV1, future Graph+Semantic v3, etc.)
  • Per-procedure weight overrides: cardiac surgery might weight outcomes higher, dental might weight cost higher
  • A/B test setup: run two weight configurations simultaneously, split by percentage or patient cohort
  • Audit log: who changed what, when, with before/after snapshots
  • Rollback: one-click revert to previous configuration
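A weight-editing screen needs a validation step before any configuration is saved. The sketch below shows one possible shape for validation plus per-procedure overrides. It is illustrative only: `validate_weights`, `weights_for`, and `score` are hypothetical helpers, not the `WeightedScoringV1` API, and the rule that weights must sum to 1.0 is an assumption (it holds for the current values but is not stated as a rule in this ADR).

```python
import math

# Default weights as listed in this ADR.
DEFAULT_WEIGHTS = {
    "clinical_fit": 0.40,
    "outcomes": 0.20,
    "cost": 0.15,
    "travel": 0.15,
    "preferences": 0.10,
}


def validate_weights(weights):
    """Reject configs with wrong dimensions, negative values, or a sum != 1.0."""
    if set(weights) != set(DEFAULT_WEIGHTS):
        raise ValueError("weights must cover exactly the five scoring dimensions")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    # Assumption: a valid configuration sums to 1.0.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return weights


def weights_for(procedure_code, overrides):
    """Use the full override for a procedure if one exists, else the defaults."""
    return validate_weights(overrides.get(procedure_code, DEFAULT_WEIGHTS))


def score(dimension_scores, weights):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    return sum(weights[d] * dimension_scores[d] for d in weights)
```

Treating an override as a complete weight set (rather than a partial patch) keeps the sum-to-one check trivial and makes before/after snapshots self-contained.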

Flagsmith integration question:

  • Option A: UI writes to Flagsmith API. Admin console calls Flagsmith REST API to update matching_weights_v1 flag value. Flagsmith remains the runtime source of truth. Simple, no new backend.
  • Option B: Own config table + Flagsmith sync. Admin console writes to a matching_config Postgres table. A sync job pushes to Flagsmith. More complex but adds audit trail and DB backup.
  • Option C: Bypass Flagsmith for matching. Store weights in DB, read at runtime. Flagsmith only for feature gates, not config values.
  • Recommendation: Option A for MVP. The Flagsmith API is well-documented, weights are already read from Flagsmith at runtime, and we avoid a new config table. Add audit logging in the admin backend (log every Flagsmith API call with before/after).
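The before/after audit logging recommended for Option A could look roughly like the wrapper below. It is a sketch under stated assumptions: `AuditedFlagWriter` is a hypothetical class, and `read_flag` / `write_flag` are injected callables standing in for whatever client the admin backend uses to reach Flagsmith (no real Flagsmith calls are shown). The audit log is an in-memory list here; a real backend would persist it.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class AuditedFlagWriter:
    """Wraps a flag store and records a before/after snapshot per change."""

    read_flag: Callable[[str], Any]          # hypothetical: fetch current value
    write_flag: Callable[[str, Any], None]   # hypothetical: push new value
    audit_log: list = field(default_factory=list)

    def update(self, flag_name, new_value, actor):
        before = self.read_flag(flag_name)
        self.write_flag(flag_name, new_value)
        self.audit_log.append({
            "flag": flag_name,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
            # Serialize snapshots so later mutation cannot alter the record.
            "before": json.dumps(before, sort_keys=True),
            "after": json.dumps(new_value, sort_keys=True),
        })

    def rollback_last(self, actor):
        """Revert by re-applying the 'before' snapshot of the latest change."""
        last = self.audit_log[-1]
        self.update(last["flag"], json.loads(last["before"]), actor)
```

Because a rollback is itself recorded as an update, the log stays append-only and the revert shows up with its own actor and timestamp.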

4. Flagsmith Integration for Admin Console

Problem today: Feature flags are managed via Flagsmith dashboard (external UI). Platform admins need to toggle flags without leaving the admin console. Some flags are operational (e.g., classifier_enabled, gates_v2) and need quick access.

Options:

  • Option A: Embed Flagsmith UI. Flagsmith offers an embeddable widget. Minimal dev effort but limited customization.
  • Option B: Flagsmith API integration. Build custom flag management UI using Flagsmith REST API (/api/v1/environments/{key}/featureflags/). Full control over UX, can group flags by domain (intake, matching, billing, etc.), add descriptions, show usage stats.
  • Option C: Hybrid. Critical operational flags get custom UI with guardrails (confirmation dialogs, impact previews). Long-tail flags use embedded Flagsmith.
  • Recommendation: Option B for operational flags (10-15 flags the ops team touches regularly), with Flagsmith dashboard as fallback for everything else. Custom UI adds guardrails that Flagsmith's generic UI can't provide, e.g., "This flag affects 3,200 active patients. Are you sure?"

Flag categories for admin UI:

| Category | Flags | Risk |
|----------|-------|------|
| Intake flow | classifier_enabled, gates_v2, output_validator_strict | Medium (affects conversation quality) |
| Matching | matching_weights_v1, procedure_reqs_from_graph, DOCTORS_IN_MATCHING | High (affects match results) |
| Feature gates | mso_video_embedded, recovery_outcome_scoring_enabled | Low (feature on/off) |
| Infrastructure | batch_processing_enabled, redis_caching_enabled | High (affects system behavior) |
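One way the risk column could drive the guardrails is a lookup that decides whether a toggle needs explicit confirmation. The sketch below is hypothetical: `FLAG_RISK` copies the risk levels from the table, while `toggle_guardrail` and its return values are invented for illustration. It assumes only high-risk flags require confirmation and that flags outside the list fall back to the Flagsmith dashboard.

```python
# Risk levels copied from the flag category table; the lookup is hypothetical.
FLAG_RISK = {
    "classifier_enabled": "medium",
    "gates_v2": "medium",
    "output_validator_strict": "medium",
    "matching_weights_v1": "high",
    "procedure_reqs_from_graph": "high",
    "DOCTORS_IN_MATCHING": "high",
    "mso_video_embedded": "low",
    "recovery_outcome_scoring_enabled": "low",
    "batch_processing_enabled": "high",
    "redis_caching_enabled": "high",
}


def toggle_guardrail(flag_name, confirmed=False):
    """Decide whether a toggle may proceed or needs an explicit confirmation."""
    risk = FLAG_RISK.get(flag_name)
    if risk is None:
        # Long-tail flags are not managed by the custom UI.
        return "use_flagsmith_dashboard"
    # Assumption: only high-risk flags need a confirmation dialog.
    if risk == "high" and not confirmed:
        return "confirmation_required"
    return "allowed"
```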

Brainstorm Summary

| Screen | Backend exists? | Frontend exists? | Effort | Dependencies |
|--------|-----------------|------------------|--------|--------------|
| Procedure onboarding | Partial (models + Neo4j service) | No | 2-3 weeks | Admin portal auth (Phase A) |
| Provider onboarding | Partial (provider_portal.py router) | No | 3-4 weeks | Admin portal auth, Clerk org creation |
| Matching config | Yes (WeightedScoringV1 + Flagsmith) | No | 1-2 weeks | Flagsmith API key in admin env |
| Flagsmith admin UI | No (Flagsmith API is external) | No | 1-2 weeks | Flagsmith API key |

Total: 7-11 weeks — fits into Phase B (Admin Portal) of the execution path.

Next step: Design spec for each screen. Matching config + Flagsmith UI are smallest and most immediately useful — consider doing those first within the admin portal work.

Amendment: Patient Tenant Separation (2026-04-30, completed 2026-05-03)

Status: All three phases shipped to production. Tracked under #524 (umbrella, supersedes #267).

  • Phase 1 (additive provisioning): 2026-04-30.
  • Phase 2 (data cutover): executed live 2026-05-03T06:10:56Z. 41 patients / 402 cases / 3,420 FHIR / 476 docs / 200 consents / 30 matches / 4,810 audit_logs / 7,002 patient-classified events / 8,298 llm_usage / 1 org_mapping flipped. Backup at /tmp/curaway-cutover-pre-20260503-113752.dump.
  • Phase 3 (fallback removal): merged via PR #598 after 24h soak. _ORG_TENANT_MAP_FALLBACK patient-org row deleted; default_tenant_id flipped to tenant-curaway-patients (PR #600); Vercel VITE_TENANT_ID updated.

Problem

Patients shared tenant-apollo-001 with the provider Apollo Hospitals. This violated the 7-actor isolation invariant in the original tenant model table above (patients and providers are separate actors, not co-tenants) and blocked the sequenced flag flips tracked in #426.

Audit (2026-04-30) showed the shared tenant held mixed ownership: 39 patients / 359 cases / 3,155 FHIR resources on the patient-owned side; 42 providers / 91 doctors / 40 facilities on the provider-owned side; plus 16,003 events / 4,337 audit_logs / 6,213 LLM-usage rows that need per-row provenance routing in Phase 2.

Decision

Separated via three phases, all now shipped.

Phase 1 (additive, 2026-04-30): Provisioned the destination tenant tenant-curaway-patients (slug curaway-patients, IND/ap-south-1/INR/en mirroring apollo so cutover is a tenant-id swap, not a settings diff). Added to PROTECTED_TENANT_IDS. Zero functional change at this phase.

Phase 2 (cutover, executed 2026-05-03T06:10:56Z): Used the announced ~10-minute maintenance window. Cascade-updated tenant_id for the 7 patient-owned tables, routed mixed tables (events/audit_logs/llm_usage) by row provenance, swapped the tenant_org_mappings row for the patient Clerk org, flipped Vercel VITE_TENANT_ID and backend default_tenant_id in lockstep (PR #600). Two bugs caught by ROLLBACK discipline before commit: (a) missing triage_agent.failed classifier (patched in #599); (b) jsonb_build_object text params needed ::text casts. Cutover report at /tmp/curaway-cutover/cutover_report_20260503T061048Z.json (commit_status: committed). Patient-app redeployed; smoke verified (9/12 set-A pass; 3 failures filed as smoke-script bug #601, no data impact).

Phase 3 (cleanup, merged 2026-05-04 post-soak): Removed the _ORG_TENANT_MAP_FALLBACK patient-org row in rbac_middleware.py via PR #598. Apollo rename (tenant-apollo-001 → tenant-provider-apollo-001) executed separately under the per-provider cutover (see Amendment: Per-Provider Tenant Cutover).

Migration strategy: brief-downtime cutover

Zero-downtime cutover via dual-routing through tenant_org_mappings was considered, but the schema's UNIQUE(clerk_org_id) constraint forbids two concurrent rows for the same Clerk org. Brief downtime for 39 patients is acceptable. Re-evaluate if patient population exceeds 1,000 before Phase 2 executes.
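The constraint argument can be reproduced in a few lines. The sketch below uses an in-memory SQLite table as a stand-in for the production Postgres schema: the two-column table shape and the `org_patient` id are simplifications assumed for illustration, while the table name, the UNIQUE constraint, and the tenant ids come from this ADR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table; only the UNIQUE constraint matters here.
conn.execute(
    "CREATE TABLE tenant_org_mappings ("
    " clerk_org_id TEXT NOT NULL UNIQUE,"
    " tenant_id TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO tenant_org_mappings VALUES ('org_patient', 'tenant-apollo-001')"
)

# Dual routing would need a second row for the same Clerk org: rejected.
try:
    conn.execute(
        "INSERT INTO tenant_org_mappings"
        " VALUES ('org_patient', 'tenant-curaway-patients')"
    )
    dual_routing_possible = True
except sqlite3.IntegrityError:
    dual_routing_possible = False

# The cutover is therefore an in-place swap of the single row.
conn.execute(
    "UPDATE tenant_org_mappings SET tenant_id = 'tenant-curaway-patients'"
    " WHERE clerk_org_id = 'org_patient'"
)
mapped = conn.execute(
    "SELECT tenant_id FROM tenant_org_mappings WHERE clerk_org_id = 'org_patient'"
).fetchone()[0]
```

Since the mapping can only point at one tenant at a time, data rows and the mapping row have to move together, which is what the maintenance window buys.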

Consequences

  • Adds one tenant row + matching tenant_settings (idempotent).
  • Adds one entry to PROTECTED_TENANT_IDS. The protection guarantee predates any data on the tenant — defensive.
  • Phase 1 is reversible via Alembic downgrade with a guard that aborts if any patient-data row already references the new tenant (defends against rollback after Phase 2).
  • Live spec: docs/specs/patient-tenant-separation-feature.md.

Amendment: Per-Resource Authorization Gate (2026-05-03, #545)

Context. Tenant isolation prevents cross-tenant reads but does not prevent within-tenant patient-A from reading patient-B's case data via direct URL guess. Audit on 2026-05-01 confirmed the leak on production case 678208a7-3f31-4076-a181-de79fc90139f (tenant tenant-apollo-001).

Decision. Every per-case ({case_id}) and per-patient ({patient_id}) HTTP route MUST gate on ownership in addition to tenant. Two FastAPI dependencies in app/middleware/case_access.py own the predicate logic:

  • require_case_access — patient owner / assigned coordinator / active case_shares row (with actor_type = "coordinator") / super_admin / platform_admin.
  • require_patient_access — patient self-access (matched by external_auth_id) / super_admin / platform_admin.

Mismatch returns 404, not 403 — don't leak resource existence.

Bypass roles. Only super_admin and platform_admin. Adding new bypass roles is an ADR-level change.
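The ownership predicate and the 404-on-mismatch rule can be illustrated without FastAPI. The sketch below is a simplified stand-in, not the code in app/middleware/case_access.py: `Actor`, `Case`, `NotFound`, and `check_case_access` are hypothetical, the `shared_with` set collapses the active case_shares lookup into a set of user ids, and it assumes bypass roles skip the ownership check entirely.

```python
from dataclasses import dataclass
from typing import Optional

BYPASS_ROLES = {"super_admin", "platform_admin"}


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""

    status_code = 404


@dataclass
class Actor:
    user_id: str
    tenant_id: str
    role: str


@dataclass
class Case:
    case_id: str
    tenant_id: str
    patient_user_id: str
    coordinator_user_id: Optional[str] = None
    # Simplification: user ids that hold an active case_shares row.
    shared_with: frozenset = frozenset()


def check_case_access(actor, case):
    """Return the case if the actor may see it; otherwise raise NotFound."""
    if case is None:
        raise NotFound()
    # Assumption: bypass roles skip the ownership check.
    if actor.role in BYPASS_ROLES:
        return case
    allowed = actor.tenant_id == case.tenant_id and (
        actor.user_id == case.patient_user_id
        or actor.user_id == case.coordinator_user_id
        or actor.user_id in case.shared_with
    )
    if not allowed:
        # Same 404 as a missing case, so callers cannot probe for existence.
        raise NotFound()
    return case
```

Raising the same exception for "missing" and "not yours" is what keeps a URL-guessing patient from learning which case ids exist.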

Coverage as of 2026-05-13:

  • 23 {case_id} HTTP routes gated by require_case_access (post-#587: 4 coordinator-portal + 2 MSO routes added on top of the original 17)
  • 21 {patient_id} HTTP routes gated by require_patient_access
  • 0 {case_id} WebSocket exempts — #585 closed via #843 (merged 2026-05-13); §0 Foundation 100% complete
  • 0 SSE exempts (all SSE retired in Session 90; documents stream migrated to WebSocket with require_patient_access enforcement)

Open exemptions (with tracking issues):

  • Risk-review routes (/api/v1/risk/{case_id}/{assessment,decision,history}) — intentionally exempt (no tracking issue): risk reviewers operate on the queue, not on per-case assignments. Tenant isolation + role permission case:risk_review:read is the authoritative gate. Re-evaluate if a reviewer-assignment column is ever added.

Resolved exemptions (closed; gate now enforced):

  • WebSocket case stream /api/v1/cases/{case_id}/ws — #585 closed via #843 (2026-05-13). JWT-via-query + cross-tenant fallback replaced with full require_case_access enforcement.
  • SSE streams (/messages/stream, /chat/stream, /documents/stream) — #586 closed in Session 90 (2026-05-03). The first two were dead handlers (deleted); /documents/stream migrated to WebSocket at /api/v1/patients/{patient_id}/documents/ws with full require_patient_access enforcement.
  • Coordinator-portal / MSO routes — #587 closed in Session 90 (PR #594). 6 routes (4 coordinator + 2 MSO) now require BOTH @require_permission AND Depends(require_case_access). Without per-case ownership a coordinator with case:read:assigned could previously read any case in the tenant.

Regression guard. tests/test_route_access_scanner.py runs in CI. It walks every FastAPI route, finds {case_id} / {patient_id} paths, and fails the build if a route is missing the gate AND not in the exemption set. New routes without ownership gates cannot merge.
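The core of such a scanner is small. The sketch below is not the contents of tests/test_route_access_scanner.py: it takes routes as plain `(path, dependency_names)` pairs instead of walking a FastAPI app, and `find_ungated_routes` is a hypothetical helper. It also assumes a path carrying both parameters would need both gates.

```python
import re

OWNERSHIP_PARAM = re.compile(r"\{(case_id|patient_id)\}")

# Gate names are from this ADR; the mapping structure is hypothetical.
REQUIRED_GATE = {
    "case_id": "require_case_access",
    "patient_id": "require_patient_access",
}


def find_ungated_routes(routes, exemptions):
    """Return (path, missing_gates) for routes that lack an ownership gate.

    routes: iterable of (path, set_of_dependency_names) pairs.
    exemptions: set of paths that are intentionally exempt.
    """
    failures = []
    for path, deps in routes:
        if path in exemptions:
            continue
        params = set(OWNERSHIP_PARAM.findall(path))
        missing = {REQUIRED_GATE[p] for p in params} - set(deps)
        if missing:
            failures.append((path, sorted(missing)))
    return failures
```

A CI test would assert that the returned list is empty, so any new `{case_id}` or `{patient_id}` route fails the build until it is gated or explicitly added to the exemption set.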

Frontend implications. All patient-app calls under /api/v1/patients/{patient_id}/* and /api/v1/cases/{case_id}/* must include the Clerk session token; the patient app must pass its own tenant header (cross-patient calls within the same tenant will 404).

Amendment: Per-Provider Tenant Cutover (2026-05-03, Session 91)

Context. The tenant model table at the top of this ADR specifies "Provider | N (one each) | Full tenant isolation" but in production, only tenant-apollo-001 existed. All 42 seeded providers, 91 doctors, and 40 facilities co-tenanted on Apollo's tenant_id, violating the per-provider isolation invariant and blocking self-service provider onboarding (every new provider required a code deploy across 2 repos to update _ORG_TENANT_MAP_FALLBACK).

Decision. Cut over to per-provider tenants. Each provider gets tenant-provider-{slug} (Apollo: tenant-provider-apollo-001).
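The naming rule can be captured in a small helper. This is a sketch, not the cutover script: `provider_tenant_id` and the slug pattern are hypothetical, and the 36-character limit reflects the tenants.id varchar(36) cap listed among the PR #608 bug fixes. How the real script handles over-long ids is not covered by this ADR; the sketch simply rejects them.

```python
import re

TENANT_ID_MAX_LEN = 36  # tenants.id is varchar(36)
# Assumed slug shape: lowercase alphanumerics separated by single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def provider_tenant_id(slug):
    """Build tenant-provider-{slug}, rejecting malformed or over-long ids."""
    if not SLUG_PATTERN.match(slug):
        raise ValueError(f"invalid provider slug: {slug!r}")
    tenant_id = f"tenant-provider-{slug}"
    if len(tenant_id) > TENANT_ID_MAX_LEN:
        raise ValueError(f"{tenant_id!r} exceeds {TENANT_ID_MAX_LEN} characters")
    return tenant_id
```

With the 16-character prefix, a slug can be at most 20 characters before the column cap is hit, which is worth validating at onboarding time rather than at insert time.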

Cutover executed 2026-05-03T09:26:17Z:

  • 42 new tenants tenant-provider-{slug} created
  • 91 doctors + 40 facilities migrated to their respective provider tenants
  • Old tenant-apollo-001 deleted (correctness gate enforced)
  • 4 user_roles + 2 case_shares.target + 1 case_shares.source + 10,624 events + 1 audit_log moved to renamed Apollo tenant
  • Apollo Clerk org (pk_live + pk_test) resolves via public tenant-map endpoint to tenant-provider-apollo-001
  • 0 patient/case data touched (already separated under Amendment: Patient Tenant Separation)

Verification post-cutover:

  • 42 distinct provider tenants in providers table
  • 0 doctor tenant_id mismatches; 0 facility tenant_id mismatches
  • Old tenant-apollo-001 row gone
  • All 7 actors on own tenants per the table at top of this ADR — fully realized at the data layer
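The mismatch checks above amount to comparing each child row's tenant_id with its parent provider's tenant. A minimal sketch, assuming rows are available as dicts with `id`, `provider_id`, and `tenant_id` keys (the helper name and row shape are hypothetical, not the verification queries actually run):

```python
def tenant_mismatches(children, provider_tenants):
    """Return ids of child rows whose tenant differs from their provider's.

    children: iterable of dicts with 'id', 'provider_id', 'tenant_id'.
    provider_tenants: mapping of provider_id -> tenant_id.
    """
    return [
        row["id"]
        for row in children
        if provider_tenants.get(row["provider_id"]) != row["tenant_id"]
    ]
```

An empty result for both doctors and facilities corresponds to the "0 mismatches" lines; a child pointing at an unknown provider is also reported, since the lookup returns None.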

Shipped via:

  • PR #604 — spec (442 lines) with full audit + reviewer findings addressed
  • PR #605 (B1) — doctor_service accepts Optional tenant_id for cross-tenant reads (matching engine fix)
  • PR #606 (B2) — _TTL_15MIN cache TTL + POST /api/v1/admin/system/bust-storefront-cache
  • PR #607 (B3) — single-call POST /api/v1/admin/providers onboarding API; drops provider_clerk_integration flag
  • PR #608 (B4) — cutover script + 5 in-flight bug fixes caught by ROLLBACK discipline (NOT NULL on tenants.contact_email, tenant_settings.data_residency_region/video_provider; tenants.id varchar(36) cap; case_shares.target_provider_id absent; conn.rollback() wiping inserts)
  • PR #609 (B5) — _ORG_TENANT_MAP_FALLBACK Apollo entry → per-provider tenant

Pending follow-ups:

  • F1 — frontend tenant-map fallback seed update (packages/shared-core/src/tenant-map.ts)
  • F3 — admin-app "Providers" page (2-day frontend effort; backend onboarding API #607 ships today)
  • CI grep guard banning literal tenant-apollo-001 in production code
  • Sweep of low-priority hardcoded refs in seed scripts/docstrings/OpenAPI examples

Backups: /tmp/curaway-cutover-providers-pre-20260503-145215.dump (7.3 MB), /tmp/curaway-cutover/provider_cutover_report_20260503T092606Z.json.

Amendment: Transportation Provider Tier — Acknowledged Gap, Phase 7 Reservation (2026-05-06)

Problem

ADR-0018 today does not have a standalone transportation / ambulance provider tier. Transport is currently modeled as:

  1. Hospital capability: airport_transfer is one of the capability_code values on provider_capabilities (§J). The hospital declares whether it provides airport pickup; if not, the patient is on their own.
  2. Recovery-facility attribute: recovery_provider_profiles.transport_to_hospital: BOOL (§K) flags whether the recovery facility shuttles the patient to/from the surgical hospital.
  3. Coordinator-managed task: the coordinator role holds the case:transport:manage permission and arranges any unmet transport need ad hoc per case.

This works for ~80% of MVP cases (patient flies in, hospital sends a car, returns home post-recovery). It does NOT cover:

  • Medical evacuation (air ambulance for emergency transfers across borders)
  • Ground ambulance with medical staffing (non-ambulatory patients needing supervised inter-facility / airport transfer)
  • Insurance-integrated medical transport (many patients have separately-reimbursed transport coverage)
  • Quality control — coordinator-arranged transport has no vetting bar; patient may be in an unstaffed taxi when they need an ICU-level transfer

Decision

Acknowledge the gap, defer the build. Curaway's MVP launches with the current capability-flag + coordinator model. A standalone transportation-provider sub-type is reserved for Phase 7 of ADR-0018 (post-MVP, post-Series-A timeframe). Tracking issue: #713.

When Phase 7 ships, the design will mirror the recovery-provider pattern from §K — transport is a new provider_type enum value on the existing providers table, not a new top-level actor type in the 7-actor model. Each transport-provider org is its own tenant via standard multi-tenancy; the actor-type infrastructure (auth, storefront, ratings, onboarding) is reused unchanged. The Phase 7 build adds:

  • New provider_type enum value transportation on the existing providers table (mirrors recovery_rehab / recovery_accommodation from §K)
  • Tables transportation_provider_profiles + hospital_transport_partnerships
  • Three-tier model: hospital-recommended → Curaway-partnered → marketplace
  • Lightweight scorer (transport_matcher.py) ranking by location, medical staffing required, ETA, cost
  • Insurance pre-auth integration (carrier ID, claim format)
  • Patient-app surface: post-match transport booking option
  • Admin-app oversight + quality vetting (driver licensing, vehicle inspection)

Rationale for deferral

  • Capability flags handle the common case. When a hospital includes airport transfer, the patient never sees the absence of a transport tier.
  • Marketplace investment is premature pre-launch. A transport-provider marketplace requires onboarding 5+ providers per geography to be useful; that's 4-8 weeks of provider-relations work that pulls focus from clinical matching quality.
  • Insurance integration depends on payer relationships Curaway doesn't yet have. Adding it pre-launch would build infrastructure with no consumers.
  • Recovery (Phase 6) is the higher-value adjacent build. Patients ask for "where do I sleep after surgery" before "who drives me to the airport."

Mapping during MVP and after Phase 7

| Concept | MVP (today) | Phase 7 |
|---------|-------------|---------|
| Airport pickup | Hospital capability flag (airport_transfer) | Decoupled: hospital recommends OR patient picks from transport-provider list |
| Recovery → hospital shuttle | recovery_provider_profiles.transport_to_hospital: BOOL | Replaced by hospital_transport_partnerships rows |
| Emergency medical transfer | Ad hoc via coordinator's case:transport:manage | Matched + booked through transport_matcher.py |
| Quality vetting | None (informal) | Driver licensing + vehicle inspection records, surfaced in admin app |
| Insurance | Patient pays out-of-pocket, claims separately | Pre-auth flow with carrier integration |

Consequences

  • Operational disclaimer: while the MVP is live, Curaway's coordinators must not represent transport as a vetted Curaway service. Patient-facing language should say "your hospital can arrange airport pickup" or "we'll connect you with a local provider", never "Curaway-vetted ambulance."
  • Capability disclosure spec (Phase 2b, provider_capabilities table) should include airport_transfer as a first-class capability code with sub-types (ambulance_ground, ambulance_air, private_car_medical, airport_meet_assist) so the data shape is forward-compatible with the Phase 7 transport tier. This is a small additive decision in Phase 2b: it costs nothing now and prevents a schema migration when Phase 7 lands.
  • case:transport:manage permission stays as the operational stand-in until Phase 7. Coordinators continue to handle exceptional cases (medical evacuation, non-ambulatory patients) manually.

Cross-references

  • Tracking issue: #713 — Phase 7 transportation provider tier
  • §J Facility Capability Disclosure (current capability-flag model)
  • §K Post-Op Recovery Matching (pattern Phase 7 will mirror)
  • ADR-0018 audit 2026-05-05 — coverage gap that surfaced this amendment

Amendment: MSO Teleconsultation video_provider column dropped (2026-05-06)

Context

ADR-0018 §G originally specified video_provider ENUM(daily, 100ms, external_link) on the teleconsultation_sessions table. ADR-0025 (2026-05-05) locked Daily.co as the sole MSO video provider.

Decision

Phase 2a.1 implementation dropped the video_provider column from teleconsultation_sessions. Sessions are Daily.co-only by design. If a future ADR re-introduces a second video provider, the column will be re-added via Alembic migration.

Consequences

  • Schema simpler; no enum to maintain
  • A future provider switch requires a migration + amendment to ADR-0025
  • The PII boundary filter at app/services/video_room_pii_filter.py is Daily-specific; a second provider would need its own boundary filter

Cross-references

  • ADR-0025 — Daily.co provider lock
  • app/models/teleconsultation.py — current schema
  • migrations/b1c2d3e4f5a7…_create_teleconsultation_sessions.py — original Daily-only migration

References

  • Microservices readiness: docs/architecture/20-microservices-readiness.md
  • DAO layer: ADR-0016
  • Multicurrency: ADR-0017
  • Upstash Workflow: ADR-0014
  • Mediglobe comparative study PDFs: ~/Code/Curaway/*.pdf (reference for quote format)
  • Phase 0 spec: docs/specs/ai-steer/multi-tenancy-foundation-steer.md (to be written)
  • Patient tenant separation spec: docs/specs/patient-tenant-separation-feature.md (#267)