Matching Engine¶
Overview¶
The Matching Engine is Curaway's core differentiator -- the system that connects patients with the right providers and doctors based on clinical needs, preferences, outcomes data, and logistics. It uses a pluggable strategy pattern that allows multiple matching algorithms to coexist, be tested in shadow mode, and be swapped at runtime via feature flags.
ADR-0026 — Store contract (2026-05-07)
Postgres is the canonical source of truth. Neo4j is a read-only projection. All admin CRUD writes go to Postgres first; a QStash-driven projection worker upserts the corresponding Neo4j subgraph asynchronously (≤30s latency). The on-demand POST /api/v1/admin/graph/rebuild endpoint rebuilds any subgraph from current Postgres state without SSH access. See docs/adr/0026-matching-framework-architecture.md.
Parameter registry — Phase 1 PR-A (this PR)
ADR-0026 introduces a 147-parameter registry at config/matching/parameters/<domain>.yaml. Phase 1 PR-A lands the registry, loader, and CI guard (12 YAML files, 147 entries, 14 currently active). The registry-driven scoring engine itself is gated behind the matching_engine_v2 Flagsmith flag and lands in PR-B. With the flag off, the legacy GraphEnhancedWeightedV1 path remains canonical — the 7 scoring dimensions below reflect that production-default behavior. See docs/reference/matching-parameters.md for the live registry view, docs/runbook/matching-engine.md for rollout, and app/services/matching/registry.py for the loader.
Strategy Pattern¶
Architecture¶
graph TD
API["POST /api/v1/cases/{id}/match"] --> Router[Strategy Router]
Router --> Flag{Flagsmith: matching_strategy_version}
Flag -->|"v2.1"| GSW[Graph+Semantic Weighted v2.1]
Flag -->|"v1.0"| WR[Weighted Rules v1.0]
Flag -->|"v1.5"| AE[Agent-Enhanced v1.5]
Flag -->|"v2.0"| ML[ML Ranking v2.0]
Flag -->|"v3.0"| HY[Hybrid v3.0]
style Router fill:#008B8B,color:#fff
style GSW fill:#FF7F50,color:#fff
style AE fill:#4A90D9,color:#fff
Strategy Interface¶
Every matching strategy implements a common interface:
from abc import ABC, abstractmethod

# Case, ClinicalData, PatientPreferences, and MatchResult are project models.
class MatchingStrategy(ABC):
    """Base class for all matching strategies."""

    @abstractmethod
    async def match(
        self,
        case: Case,
        clinical_data: ClinicalData,
        preferences: PatientPreferences,
        tenant_id: str,
    ) -> MatchResult:
        """Execute matching and return scored results."""
        ...

    @abstractmethod
    def get_version(self) -> str:
        """Return strategy version identifier."""
        ...

    @abstractmethod
    def get_scoring_dimensions(self) -> list[str]:
        """Return list of scoring dimensions used."""
        ...
Available Strategies¶
| Strategy | Version | Status | Description |
|---|---|---|---|
| Graph+Semantic Weighted | v2.1 | Active (default) | Neo4j traversal + Qdrant semantic + weighted scoring |
| Weighted Rules | v1.0 | Legacy | Pure rule-based scoring without graph or semantic components |
| Agent-Enhanced | v1.5 | Feature-flagged | LLM-enhanced reranking on top of v2.1 results |
| ML Ranking | v2.0 | Stub | Future: learning-to-rank model trained on historical outcomes |
| Hybrid | v3.0 | Stub | Future: ensemble of v2.1 + ML + agent signals |
Strategy Selection
The active strategy is selected per-tenant via Flagsmith flag matching_strategy_version. This allows different tenants to run different strategies, and enables gradual rollout of new strategies.
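A minimal sketch of how a per-tenant flag value might be resolved to a strategy version. The function name `resolve_strategy_version`, the set of implemented versions, and the fall-back-to-default behaviour are assumptions made for illustration; the real router and its Flagsmith client calls are not shown in this document.

```python
from typing import Optional

# Default taken from the Available Strategies table (v2.1 is "Active (default)").
DEFAULT_VERSION = "v2.1"

# Assumption: stub strategies (v2.0, v3.0) are never routed to.
IMPLEMENTED_VERSIONS = {"v1.0", "v1.5", "v2.1"}


def resolve_strategy_version(flag_value: Optional[str]) -> str:
    """Return the version to run, falling back to the default
    when the flag is unset, unknown, or points at a stub."""
    if flag_value in IMPLEMENTED_VERSIONS:
        return flag_value
    return DEFAULT_VERSION


print(resolve_strategy_version("v1.5"))  # tenant opted into Agent-Enhanced
print(resolve_strategy_version("v3.0"))  # stub, so the default is used
```

Falling back to the default rather than raising keeps a misconfigured flag from breaking matching for a tenant; whether production behaves this way is not stated here.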
Active Strategy: Graph+Semantic Weighted v2.1¶
Scoring Dimensions¶
The active strategy scores providers across 7 weighted dimensions:
| Dimension | Weight | Source | Description |
|---|---|---|---|
| clinical_relevance | 0.25 | Neo4j + FHIR | How well the provider's offerings match the patient's clinical needs |
| outcome_score | 0.20 | Neo4j OFFERS metadata | Success rate, complication rate, volume-based confidence |
| cost_score | 0.15 | Neo4j OFFERS metadata | Cost relative to patient budget and market average |
| semantic_match | 0.10 | Qdrant cosine similarity | Semantic similarity between patient needs and provider profile |
| travel_logistics | 0.10 | Computed | Visa requirements, flight connections, time zone difference |
| accreditation | 0.10 | Neo4j HAS_ACCREDITATION | JCI, NABH, and other accreditation presence |
| patient_preferences | 0.10 | Patient profile | Language match, country preference, dietary accommodation |
STRATEGY_WEIGHTS = {
    "clinical_relevance": 0.25,
    "outcome_score": 0.20,
    "cost_score": 0.15,
    "semantic_match": 0.10,
    "travel_logistics": 0.10,
    "accreditation": 0.10,
    "patient_preferences": 0.10,
}
Weights Must Sum to 1.0
The system validates that all weights sum to 1.0 at startup. If weight redistribution is applied (due to missing data), the redistributed weights are also validated.
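The startup check described above could look roughly like the following. This is a sketch only: `validate_weights` and its tolerance are hypothetical, and the production validator may differ. A tolerance is used because decimal weights such as 0.1 are not exactly representable as floats.

```python
import math


def validate_weights(weights: dict[str, float], tolerance: float = 1e-9) -> None:
    """Raise ValueError unless all weights are non-negative and sum to 1.0."""
    negative = [name for name, value in weights.items() if value < 0]
    if negative:
        raise ValueError(f"Negative weights: {negative}")
    total = math.fsum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"Weights sum to {total}, expected 1.0")


validate_weights({
    "clinical_relevance": 0.25, "outcome_score": 0.20, "cost_score": 0.15,
    "semantic_match": 0.10, "travel_logistics": 0.10,
    "accreditation": 0.10, "patient_preferences": 0.10,
})  # passes silently
```

The same function can be applied to a redistributed weight dict before it is used for scoring.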
Execution Flow¶
sequenceDiagram
participant API
participant Router as Strategy Router
participant Neo as Neo4j
participant QD as Qdrant
participant Scorer as Scoring Engine
participant DB as PostgreSQL
API->>Router: match(case, clinical_data, preferences)
Router->>Neo: Cypher: Find providers offering required procedures
Neo-->>Router: Candidate providers with OFFERS metadata
Router->>QD: Semantic search: patient needs vs. provider vectors
QD-->>Router: Cosine similarity scores
Router->>Scorer: Score candidates across 7 dimensions
Scorer->>Scorer: Apply weights, normalize scores
Scorer->>Scorer: Weight redistribution (if missing data)
Scorer-->>Router: Ranked results with per-dimension scores
Router->>DB: Store match_results
Router-->>API: MatchResult with explanations
Graph-Enhanced Flow Detail¶
Step 1: Neo4j Traversal
MATCH (c:Condition {code: $condition_code})-[:REQUIRES]->(proc:Procedure)
      <-[offers:OFFERS]-(prov:Provider)-[:LOCATED_IN]->(loc:Location)
WHERE prov.tenant_id = $tenant_id
  AND prov.is_active = true
OPTIONAL MATCH (prov)-[:HAS_ACCREDITATION]->(acc:Accreditation)
RETURN prov, proc, offers, loc, COLLECT(DISTINCT acc) AS accreditations
Step 2: OFFERS Metadata Extraction
The OFFERS relationship carries rich metadata that feeds directly into scoring:
offers_data = {
    "cost_usd": relationship["cost_usd"],
    "annual_volume": relationship["annual_volume"],
    "success_rate": relationship["success_rate"],
    "average_los_days": relationship["average_los_days"],
    "wait_time_weeks": relationship["wait_time_weeks"],
    "package_includes": relationship["package_includes"],
}
Step 3: Scoring with Graph Data
Each dimension scorer receives the full context:
async def score_outcome(
    provider: Provider,
    offers_data: dict,
    procedure_code: str,
) -> float:
    """Score provider outcomes for the specific procedure (0.0 - 1.0)."""
    # Treat missing or null metadata as 0 so sparse OFFERS edges do not raise.
    success_rate = offers_data.get("success_rate") or 0  # percentage, 0-100
    volume = offers_data.get("annual_volume") or 0  # procedures per year

    # Volume-based confidence: more procedures = more reliable data
    volume_confidence = min(volume / 200, 1.0)  # Saturates at 200/year

    # Weighted combination: 70% success rate, 30% volume confidence
    raw_score = (success_rate / 100) * 0.7 + volume_confidence * 0.3
    return round(raw_score, 4)
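To make the arithmetic concrete, here is a synchronous restatement of the same formula with a few sample inputs. The helper name `outcome_score` and the sample figures are illustrative, not taken from production data.

```python
def outcome_score(success_rate_pct: float, annual_volume: int) -> float:
    """Same formula as score_outcome: 70% success rate, 30% volume confidence."""
    volume_confidence = min(annual_volume / 200, 1.0)
    return round((success_rate_pct / 100) * 0.7 + volume_confidence * 0.3, 4)


# Same success rate, different volumes: volume confidence saturates at 200/year.
print(outcome_score(92, 150))  # 0.644 + 0.225
print(outcome_score(92, 400))  # 0.644 + 0.300
# A higher success rate backed by few procedures scores lower than either.
print(outcome_score(98, 20))   # 0.686 + 0.030
```

The third case shows the intent of the volume term: a 98% success rate over 20 procedures a year is treated as weaker evidence than 92% over 150.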
Doctor-Level Scoring (Session 26)¶
DOCTORS_IN_MATCHING Flag¶
When the DOCTORS_IN_MATCHING Flagsmith flag is enabled, the matching engine extends scoring to individual doctors within each matched provider.
graph TD
A[Provider-Level Match] --> B{DOCTORS_IN_MATCHING?}
B -->|Yes| C[Fetch Affiliated Doctors]
C --> D[Score Each Doctor]
D --> E[Language Concordance]
D --> F[Procedure-Specific Metrics]
D --> G[Data Completeness Factor]
E --> H[Doctor-Enriched Results]
F --> H
G --> H
B -->|No| I[Provider-Only Results]
style B fill:#008B8B,color:#fff
style H fill:#FF7F50,color:#fff
Language Concordance Scoring¶
Language concordance between patient and doctor is scored across 6 tiers:
| Tier | Score | Definition | Example |
|---|---|---|---|
| Native | 1.00 | Doctor speaks patient's native language natively | Hindi patient, Hindi-native doctor |
| Fluent | 0.85 | Doctor is fluent in patient's language | English patient, English-fluent doctor in India |
| Professional | 0.70 | Professional working proficiency | Arabic patient, doctor with Arabic professional cert |
| Conversational | 0.50 | Basic conversational ability | Turkish patient, doctor with basic Turkish |
| Interpreter Available | 0.30 | Provider offers interpretation services | Thai patient, hospital has Thai medical interpreter |
| None | 0.00 | No language overlap, no interpretation | No common language |
PROFICIENCY_SCORES = {
    "native": 1.00,
    "fluent": 0.85,
    "professional": 0.70,
    "conversational": 0.50,
}

def score_language_concordance(
    patient_language: str,
    doctor_languages: list[dict],
    provider_language_services: dict,
) -> float:
    """Score language match between patient and doctor."""
    wanted = patient_language.lower()
    for lang in doctor_languages:
        if lang["language"].lower() == wanted:
            # Unknown proficiency labels fall back to the conversational score.
            return PROFICIENCY_SCORES.get(lang["proficiency"], 0.50)

    # Check provider-level interpretation services (case-insensitive, like the doctor check)
    interpreters = provider_language_services.get("medical_interpreters", [])
    if wanted in (name.lower() for name in interpreters):
        return 0.30
    return 0.00
Data Completeness Confidence Factor¶
Doctor scores are adjusted by a confidence factor derived from data completeness:
def apply_completeness_factor(
    raw_score: float,
    data_completeness: dict,
) -> float:
    """Adjust score based on data completeness confidence."""
    overall_completeness = data_completeness.get("overall", 0.5)

    # Minimum floor of 0.5 to avoid penalizing new doctors too heavily
    confidence = max(overall_completeness, 0.5)

    # Blend raw score toward 0.5 (neutral) based on missing data
    adjusted = raw_score * confidence + 0.5 * (1 - confidence)
    return round(adjusted, 4)
Why Blend Toward 0.5?
When data is incomplete, we don't want to assume the doctor is either great or terrible. Blending toward 0.5 (neutral) means incomplete profiles are ranked in the middle, not at the top or bottom.
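A short worked example of the blend, using a simplified signature that takes the overall completeness directly (an assumption for brevity; the function above reads it from a dict).

```python
def blend_toward_neutral(raw_score: float, overall_completeness: float) -> float:
    """Blend raw_score toward 0.5 using the same 0.5 confidence floor as above."""
    confidence = max(overall_completeness, 0.5)
    return round(raw_score * confidence + 0.5 * (1 - confidence), 4)


for completeness in (1.0, 0.8, 0.5, 0.2):
    print(completeness, blend_toward_neutral(0.9, completeness))

# A weak raw score is pulled up toward neutral, not down.
print(blend_toward_neutral(0.2, 0.5))
```

With a raw score of 0.9 the adjusted values are 0.9, 0.82, 0.7, and 0.7; the last two are equal because completeness below 0.5 is floored. A raw score of 0.2 at 50% completeness becomes 0.35, so the adjustment moves incomplete profiles toward the middle in both directions.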
Template-Based Match Reasoning¶
Doctor match results include human-readable reasoning generated from templates:
REASONING_TEMPLATES = {
    "high_volume_specialist": (
        "Dr. {name} performs {volume} {procedure} procedures annually "
        "with a {success_rate}% success rate, placing them in the top tier "
        "of specialists at {provider_name}."
    ),
    "language_concordance": (
        "Dr. {name} speaks {language} at {proficiency} level, enabling "
        "direct communication without interpretation."
    ),
    "technique_match": (
        "Dr. {name} specializes in {technique} {procedure}, which is the "
        "recommended approach for your specific condition profile."
    ),
}
Weight Redistribution¶
When a scoring dimension has insufficient data for a provider, its weight is redistributed proportionally to the other dimensions:
def redistribute_weights(
    base_weights: dict[str, float],
    available_dimensions: set[str],
) -> dict[str, float]:
    """Redistribute weights from unavailable dimensions."""
    # Ignore names that are not scoring dimensions to avoid a KeyError.
    available = base_weights.keys() & available_dimensions
    unavailable = base_weights.keys() - available
    if not unavailable:
        return dict(base_weights)  # Copy so callers cannot mutate the base weights

    total_unavailable_weight = sum(base_weights[d] for d in unavailable)
    total_available_weight = sum(base_weights[d] for d in available)
    if total_available_weight == 0:
        raise ValueError("No scoring dimension has data; cannot redistribute weights")

    redistributed = {}
    for dim in available:
        original = base_weights[dim]
        share = original / total_available_weight
        redistributed[dim] = original + (total_unavailable_weight * share)
    return redistributed
Example: If travel_logistics data is missing (weight 0.10):
| Dimension | Original Weight | Redistributed Weight |
|---|---|---|
| clinical_relevance | 0.25 | 0.278 |
| outcome_score | 0.20 | 0.222 |
| cost_score | 0.15 | 0.167 |
| semantic_match | 0.10 | 0.111 |
| travel_logistics | 0.10 | -- (unavailable) |
| accreditation | 0.10 | 0.111 |
| patient_preferences | 0.10 | 0.111 |
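The table values can be reproduced with a compact restatement of the redistribution. When the base weights sum to 1.0, adding a proportional share of the missing weight is the same as dividing each remaining weight by the total remaining weight; the helper name `renormalize` is illustrative.

```python
BASE_WEIGHTS = {
    "clinical_relevance": 0.25, "outcome_score": 0.20, "cost_score": 0.15,
    "semantic_match": 0.10, "travel_logistics": 0.10,
    "accreditation": 0.10, "patient_preferences": 0.10,
}


def renormalize(base: dict[str, float], missing: set[str]) -> dict[str, float]:
    """Drop missing dimensions and rescale the rest so they sum to 1.0."""
    kept = {name: w for name, w in base.items() if name not in missing}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


result = renormalize(BASE_WEIGHTS, {"travel_logistics"})
for name, weight in result.items():
    print(f"{name}: {weight:.3f}")
```

Each remaining weight is divided by 0.90, which gives 0.278, 0.222, 0.167, and 0.111 for the 0.10 dimensions, matching the table.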
Shadow Mode and A/B Testing¶
Shadow Mode¶
New strategies can run in shadow mode alongside the active strategy. Shadow mode executes the new strategy on every match request but discards the results -- only logging them for comparison.
import logging

logger = logging.getLogger(__name__)

async def match_with_shadow(
    case: Case,
    clinical_data: ClinicalData,
    preferences: PatientPreferences,
    tenant_id: str,
) -> MatchResult:
    """Run active strategy + optional shadow strategy."""
    # Active strategy (returned to patient)
    active_result = await active_strategy.match(case, clinical_data, preferences, tenant_id)

    # Shadow strategy (logged, not returned)
    shadow_flag = await flagsmith.get_flag("matching_shadow_strategy", tenant_id)
    if shadow_flag:
        try:
            shadow_strategy = get_strategy(shadow_flag)
            shadow_result = await shadow_strategy.match(case, clinical_data, preferences, tenant_id)
            await log_shadow_comparison(active_result, shadow_result, case.id)
        except Exception:
            # A failing shadow run must never affect the result returned to the patient.
            logger.exception("Shadow strategy %s failed for case %s", shadow_flag, case.id)

    return active_result
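What `log_shadow_comparison` records is not specified here. One simple signal it could log is how much the two strategies agree on the top of the ranking; the sketch below is a hypothetical metric, not the documented implementation.

```python
def top_k_overlap(active_ids: list[str], shadow_ids: list[str], k: int = 5) -> float:
    """Fraction of the active strategy's top-k providers that also
    appear in the shadow strategy's top-k (order is ignored)."""
    active_top = active_ids[:k]
    if not active_top:
        return 0.0
    shadow_top = set(shadow_ids[:k])
    shared = sum(1 for provider_id in active_top if provider_id in shadow_top)
    return shared / len(active_top)


# Two of the three active top-3 providers also appear in the shadow top-3.
print(top_k_overlap(["p1", "p2", "p3"], ["p2", "p1", "p9"], k=3))
```

A persistently low overlap would suggest that switching strategies changes what patients see substantially, which is worth knowing before an A/B test.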
A/B Testing via Flagsmith + PostHog¶
graph LR
A[Patient Request] --> B{Flagsmith A/B Split}
B -->|Group A: 80%| C[Strategy v2.1]
B -->|Group B: 20%| D[Strategy v1.5]
C --> E[PostHog: Track Outcomes]
D --> E
E --> F[Analyze: Conversion, Satisfaction, Time-to-Match]
style B fill:#008B8B,color:#fff
style E fill:#FF7F50,color:#fff
A/B tests track these metrics in PostHog:
| Metric | Description | Target |
|---|---|---|
| match_click_through | % of patients who click on a matched provider | > 60% |
| consultation_booked | % of matches that result in a consultation booking | > 25% |
| time_to_decision | Time from match presentation to patient action | < 48 hours |
| patient_satisfaction | Post-match survey score (1-5) | > 4.0 |
Procedure Requirements API¶
Overview¶
Procedures have specific document requirements (e.g., hip replacement requires recent X-rays, blood work, cardiac clearance). The Procedure Requirements API manages these requirements with support for provider-specific overrides.
Base Requirements¶
from uuid import UUID

from pydantic import BaseModel, Field

class ProcedureRequirement(BaseModel):
    """A document requirement for a procedure."""
    id: UUID
    tenant_id: UUID
    procedure_code: str    # CPT code
    requirement_type: str  # "diagnostic", "lab", "imaging", "clearance"
    name: str              # "Complete Blood Count"
    description: str       # Detailed description for matching
    max_age_days: int      # Maximum acceptable age of the document
    is_mandatory: bool = True
    alternatives: list[str] = Field(default_factory=list)  # Alternative acceptable documents
Provider-Specific Overrides¶
Providers can override base requirements (e.g., requiring additional tests or accepting older results):
from typing import Optional
from uuid import UUID

from pydantic import BaseModel, Field

class ProviderRequirementOverride(BaseModel):
    """Provider-specific override to a base requirement."""
    id: UUID
    tenant_id: UUID
    provider_id: UUID
    procedure_requirement_id: UUID
    # None means "no override": the base requirement's value applies.
    max_age_days_override: Optional[int] = None    # Provider accepts older docs
    is_mandatory_override: Optional[bool] = None   # Provider makes it optional
    additional_notes: Optional[str] = None         # Provider-specific instructions
    additional_requirements: list[dict] = Field(default_factory=list)  # Extra tests this provider needs
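How an override is merged onto a base requirement is not shown in this document. The sketch below assumes that a null override field means "inherit the base value"; it uses plain dataclasses and hypothetical names (`Requirement`, `Override`, `effective_requirement`) so that it runs without Pydantic.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    name: str
    max_age_days: int
    is_mandatory: bool = True


@dataclass(frozen=True)
class Override:
    max_age_days_override: Optional[int] = None
    is_mandatory_override: Optional[bool] = None


def effective_requirement(base: Requirement, override: Optional[Override]) -> Requirement:
    """Return the base requirement with any non-null override fields applied."""
    if override is None:
        return base
    changes = {}
    # Compare against None explicitly: False and 0 are legitimate override values.
    if override.max_age_days_override is not None:
        changes["max_age_days"] = override.max_age_days_override
    if override.is_mandatory_override is not None:
        changes["is_mandatory"] = override.is_mandatory_override
    return replace(base, **changes)


base = Requirement(name="Complete Blood Count", max_age_days=30)
merged = effective_requirement(base, Override(max_age_days_override=60, is_mandatory_override=False))
print(merged)
```

The explicit `is not None` checks matter: a truthiness test would silently ignore an override that sets `is_mandatory` to `False`.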
API Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/procedures/{code}/requirements | GET | List base requirements for a procedure |
| /api/v1/providers/{id}/procedures/{code}/requirements | GET | List requirements with provider overrides |
| /api/v1/cases/{id}/requirements/status | GET | Check which requirements are fulfilled for a case |
Match Result Schema¶
from datetime import datetime
from typing import Optional
from uuid import UUID

from pydantic import BaseModel

# ProviderMatch and DoctorMatch are defined first because MatchResult references them.
class ProviderMatch(BaseModel):
    """Individual provider match with scores."""
    provider_id: UUID
    rank: int
    overall_score: float                # 0.0 - 1.0
    dimension_scores: dict[str, float]  # Per-dimension scores
    weights_used: dict[str, float]      # Actual weights (after redistribution)
    reasoning: str                      # Human-readable explanation
    strengths: list[str]
    considerations: list[str]

class DoctorMatch(BaseModel):
    """Individual doctor match within a provider."""
    doctor_id: UUID
    provider_id: UUID
    rank: int
    overall_score: float
    language_concordance: float
    procedure_metrics: dict
    data_completeness_score: float
    reasoning: str

class MatchResult(BaseModel):
    """Complete match result for a case."""
    id: UUID
    case_id: UUID
    tenant_id: UUID
    strategy_version: str
    strategy_weights: dict[str, float]
    providers: list[ProviderMatch]
    doctors: Optional[list[DoctorMatch]] = None  # Only if DOCTORS_IN_MATCHING
    executed_at: datetime
    execution_time_ms: int
    shadow_strategy_version: Optional[str] = None
Scoring Triple: PFS / HSS / FMS (Wave 1)¶
The matching engine's raw 0-1 score is one input into a three-score system that drives matching gates, provider ranking, and conversion prediction. All three scores are pure deterministic functions -- no LLM calls, no DB queries at scoring time.
How the Three Scores Relate¶
graph LR
subgraph "Patient Side"
L1[Layer 1: Intent] --> PFS
L2[Layer 2: Medical] --> PFS
L3[Layer 3: Travel] --> PFS
L4[Layer 4: Logistics] --> PFS
L5[Layer 5: Financial] --> PFS
RISK[Risk Assessor] --> PFS
end
subgraph "Hospital Side"
ME["Matching Engine<br/>(7 weighted dimensions → 0-1)"] --> HSS
COST[Patient Pricing Band] --> HSS
FAC[Facility Readiness] --> HSS
end
PFS --> FMS
HSS --> FMS
PREF[Preference Alignment] --> FMS
HIST[Historical Conversion] --> FMS
FMS --> BAND{"high_conversion<br/>moderate<br/>low<br/>unlikely"}
style PFS fill:#008B8B,color:#fff
style HSS fill:#FF7F50,color:#fff
style FMS fill:#004D4D,color:#fff
| Score | Question it answers | Range | Source |
|---|---|---|---|
| PFS (Patient Fitness Score) | Is the patient ready for medical travel? | 0-100 → band | Intake layers + risk assessor |
| HSS (Hospital Suitability Score) | Is this hospital right for this patient? | 0-100 per provider | Matching engine + cost gate + facility penalty |
| FMS (Final Matching Score) | Will this match convert? | 0-1 (sigmoid) → band | PFS + HSS + preferences + historical data |
PFS: Patient Fitness Score¶
PFS is computed from four weighted components, each 0-100:
# config/scoring.yaml
pfs:
  weights:
    medical: 0.45          # Risk factors + missing mandatory docs
    travel: 0.25           # Transport tier (T1-T4)
    logistics: 0.15        # Passport, visa, timeline feasibility
    intent_modifier: 0.15  # Decision stage + emotional readiness
Medical component starts at 100 and deducts per risk item:
| Risk factor | Penalty |
|---|---|
| Blocking (e.g., INR > 1.5, HbA1c >= 9%) | -25 |
| High severity (e.g., CKD, heart failure) | -10 |
| Moderate severity | -5 |
| Missing mandatory document | -5 each |
| Max total penalty | -75 |
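The deduction rule can be sketched as follows. The category keys and the function name `medical_component` are illustrative; only the penalty values and the 75-point cap come from the table.

```python
# Penalty points per risk item, from the table above.
PENALTIES = {
    "blocking": 25,
    "high": 10,
    "moderate": 5,
    "missing_document": 5,
}
MAX_TOTAL_PENALTY = 75


def medical_component(risk_items: list[str]) -> int:
    """Start at 100, deduct per item, and cap the total deduction at 75."""
    total_penalty = sum(PENALTIES[item] for item in risk_items)
    return 100 - min(total_penalty, MAX_TOTAL_PENALTY)


print(medical_component(["blocking", "high", "missing_document"]))  # 100 - 40
print(medical_component(["blocking"] * 4))                          # capped at 100 - 75
```

Because of the cap, the medical component never drops below 25 under these rules, however many risk items accumulate.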
Travel component maps transport tier to readiness:
| Tier | Score | Definition |
|---|---|---|
| T1 | 100 | Standard -- mobile, no dependencies |
| T2 | 75 | Assisted -- wheelchair or mild impairment |
| T3 | 50 | Medical -- oxygen, recent hospitalization |
| T4 | 25 | Critical -- bedridden, ICU-level |
Auto-overrides apply regardless of calculated score: continuous oxygen forces T3+, bedridden or hospitalized within 7 days forces T4.
PFS bands determine what happens next:
| Band | Range | Action |
|---|---|---|
| ready | 80-100 | Proceed to matching |
| conditionally_ready | 60-79 | Matching + flag gaps for coordinator |
| needs_attention | 40-59 | Matching blocked, remediation needed |
| not_ready | 0-39 | Honest redirect, no matching |
Code: app/services/pfs_scorer.py
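A sketch of how the four components might combine into a PFS and a band, using the documented weights. Function names are hypothetical, and treating fractional scores with `>=` thresholds (so 79.5 is `conditionally_ready`) is an assumption, since the band table lists integer ranges only.

```python
PFS_WEIGHTS = {"medical": 0.45, "travel": 0.25, "logistics": 0.15, "intent_modifier": 0.15}


def compute_pfs(components: dict[str, float]) -> float:
    """Weighted sum of the four components, each on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in PFS_WEIGHTS.items())


def pfs_band(pfs: float) -> str:
    """Map a PFS value to its band (lower bounds are inclusive)."""
    if pfs >= 80:
        return "ready"
    if pfs >= 60:
        return "conditionally_ready"
    if pfs >= 40:
        return "needs_attention"
    return "not_ready"


# Example: one blocking + one high risk + one missing doc (medical = 60), tier T2 travel.
pfs = compute_pfs({"medical": 60, "travel": 75, "logistics": 100, "intent_modifier": 80})
print(round(pfs, 2), pfs_band(pfs))
```

Here the weighted sum is 27 + 18.75 + 15 + 12 = 72.75, so the patient proceeds to matching with gaps flagged for the coordinator.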
HSS: Hospital Suitability Score¶
HSS is a post-matching reframing layer. It does not replace the matching engine -- it wraps the raw weighted score with two adjustments that score cannot capture on its own: a cost gate and a facility readiness penalty.
Why HSS Exists¶
The raw matching engine treats cost as one of 7 dimensions (15% weight). Because cost contributes at most 0.15 to the overall score, a provider 200% over the patient's budget can still rank near the top: even with a cost score of zero, strong scores on the other dimensions allow an overall score of up to 0.85. HSS fixes this by making cost a gate, not a weight.
Computation¶
# Step 1: Scale match_score from 0-1 to 0-100
base_hss = match_score * 100.0

# Step 2: Apply cost multiplier based on Patient Pricing Band (PV) alignment
if cost_within_pv:
    cost_multiplier = 1.0   # Full score
elif cost_in_soft_ceiling:  # Within 120% of PV max
    cost_multiplier = 0.5   # Partial score
else:
    cost_multiplier = 0.0   # HARD GATE: outside PV

# Step 3: Apply facility readiness penalty
facility_penalty = 0.15 if critical_facility_gap else 0.0

# Step 4: Combine, then clamp to the 0-100 range
hss = (base_hss * cost_multiplier) - (facility_penalty * 100)
hss = max(0.0, min(100.0, hss))
HSS in Action¶
| Scenario | Match score | HSS | Why |
|---|---|---|---|
| Great fit, within budget | 0.85 | 85 | 85 x 1.0 = 85 |
| Great fit, 110% of budget | 0.90 | 45 | 90 x 0.5 = 45 (soft ceiling) |
| Great fit, way over budget | 0.92 | 0 | 92 x 0.0 = 0 (hard gate) |
| Good fit, missing ICU | 0.80 | 65 | 80 - 15 = 65 (facility penalty) |
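The four scenarios can be reproduced with a small function. This is a sketch under stated assumptions: the Patient Pricing Band is reduced to a single `pv_max` ceiling, and the signature is hypothetical; the constants are the documented defaults.

```python
def compute_hss(
    match_score: float,
    cost: float,
    pv_max: float,
    critical_facility_gap: bool = False,
    soft_ceiling_multiplier: float = 1.2,
    soft_ceiling_score: float = 0.5,
    critical_gap_penalty: float = 0.15,
) -> float:
    """Scale the 0-1 match score to 0-100, apply the cost gate and facility penalty."""
    if cost <= pv_max:
        cost_multiplier = 1.0                 # within the pricing band
    elif cost <= pv_max * soft_ceiling_multiplier:
        cost_multiplier = soft_ceiling_score  # soft ceiling: partial score
    else:
        cost_multiplier = 0.0                 # hard gate
    penalty = critical_gap_penalty * 100 if critical_facility_gap else 0.0
    hss = match_score * 100.0 * cost_multiplier - penalty
    return max(0.0, min(100.0, hss))


budget = 10_000
print(compute_hss(0.85, 9_000, budget))                              # within budget
print(compute_hss(0.90, 11_000, budget))                             # 110% of budget
print(compute_hss(0.92, 25_000, budget))                             # hard gate
print(compute_hss(0.80, 9_000, budget, critical_facility_gap=True))  # missing ICU
```

The clamp matters in the hard-gate case: a gated provider with a facility gap would otherwise end up at -15.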
HSS Configuration¶
# config/scoring.yaml
hss:
  cost_alignment:
    soft_ceiling_multiplier: 1.2  # 120% of PV max
    soft_ceiling_score: 0.5       # Partial score for soft-ceiling match
    outside_range_score: 0.0      # Fully outside PV = gated out
  facility_readiness:
    critical_gap_penalty: 0.15    # Subtract 15 points from HSS
Code: app/services/hss_scorer.py
FMS: Final Matching Score¶
FMS combines PFS (patient readiness) and HSS (hospital suitability) via a sigmoid function to produce a conversion probability. This is the score that drives coordinator prioritization and follow-up cadence.
Computation¶
# Normalize inputs to 0-1
pfs_n = pfs / 100.0
hss_n = hss / 100.0

# Weighted linear combination
linear = (
    0.30 * pfs_n                    # Patient readiness
    + 0.35 * hss_n                  # Hospital fit
    + 0.15 * preference_alignment   # Cultural/dietary/language match (0-1)
    + 0.20 * historical_conversion  # Learned signal from feedback_records (0-1)
    + bias                          # -0.5 (shifts sigmoid center)
)

# Sigmoid produces 0-1 probability
FMS = sigmoid(linear * 4)
Data moat: The historical_conversion component defaults to 0.5 (neutral) until enough feedback_records accumulate. As the platform processes more cases, this signal becomes learned from actual conversion outcomes -- a compounding advantage.
FMS Bands¶
| Band | Threshold | Action |
|---|---|---|
| high_conversion | >= 0.7 | Priority follow-up |
| moderate | >= 0.4 | Standard cadence |
| low | >= 0.1 | Nurture |
| unlikely | < 0.1 | Review for mismatch |
Code: app/services/fms_scorer.py
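A runnable restatement of the FMS formula and band mapping, using the weights, bias, and scale factor given above. Function names and the default of 0.5 for `historical_conversion` follow this document's description; everything else is a sketch and may differ from `fms_scorer.py`.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def compute_fms(
    pfs: float,                          # 0-100
    hss: float,                          # 0-100
    preference_alignment: float,         # 0-1
    historical_conversion: float = 0.5,  # neutral until feedback accumulates
    bias: float = -0.5,
    scale: float = 4.0,
) -> float:
    linear = (
        0.30 * (pfs / 100.0)
        + 0.35 * (hss / 100.0)
        + 0.15 * preference_alignment
        + 0.20 * historical_conversion
        + bias
    )
    return sigmoid(linear * scale)


def fms_band(fms: float) -> str:
    if fms >= 0.7:
        return "high_conversion"
    if fms >= 0.4:
        return "moderate"
    if fms >= 0.1:
        return "low"
    return "unlikely"


fms = compute_fms(pfs=80, hss=85, preference_alignment=0.7)
print(round(fms, 3), fms_band(fms))
print(round(compute_fms(0, 0, 0, 0), 3), round(compute_fms(100, 100, 1, 1), 3))
```

For the sample inputs the linear term is 0.2425, giving an FMS of about 0.725 (`high_conversion`). With these exact constants and all inputs inside their documented ranges, the output stays between roughly 0.12 and 0.88, so the `unlikely` band (below 0.1) would not be reachable from this formula alone; the production constants or inputs may differ.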
Where the Scores Are Computed in the Pipeline¶
Patient intake (Triage Agent, 5 layers)
↓
PFS computed (per turn, from layer state + risk items)
↓
PFS band gates matching:
    not_ready → no matching
    needs_attention → matching blocked
    conditionally_ready / ready → proceed
↓
Matching Engine (GraphEnhancedWeightedV1 → 0-1 per provider)
↓
HSS computed (per provider: cost gate + facility penalty → 0-100)
↓
FMS computed (weighted sum of PFS, HSS, preferences, history → sigmoid → 0-1)
↓
Results ranked by FMS, displayed with HSS breakdown
Configuration and A/B Testing¶
All weights and thresholds live in config/scoring.yaml with Flagsmith overrides per top-level key (scoring_pfs, scoring_hss, scoring_fms). This enables runtime A/B testing of scoring parameters without deploys.
Code: app/services/scoring_config.py
Sub-Parameters Feeding the Scoring Triple¶
The 7 matching engine dimensions draw from ~145 sub-parameters across four data sources:
| Source | Parameter count | Examples |
|---|---|---|
| FHIR resources | ~40 | ICD-10 conditions, CPT procedures, LOINC observations, ATC medications, allergies |
| Provider capabilities | ~35 | Procedures offered (cost, volume, success rate), specialties, accreditations, languages |
| Patient attributes | ~50 | Demographics, preferences (doctor gender, dietary, religious, language), travel fitness, risk factors (30+ rules), comorbidities |
| Facility properties | ~20 | ICU, surgical suites, infection control, equipment, staffing, recovery accommodations |
These parameters flow through three stages: matching engine dimensions (7 weighted scores), HSS gates (cost + facility), and FMS combination (sigmoid → conversion probability).