Skip to content

Security Audit & Remediation Plan

Audit Date: 2026-04-01 Audited By: Automated agent team (3 parallel audits) Scope: FastAPI routes, LangGraph agent pipeline, test coverage


Executive Summary

Area Risk Level Critical High Medium
Endpoint Security HIGH 3 2 5
Agent Pipeline HIGH 2 3 5
Test Coverage MEDIUM 5 gaps 6 gaps 9 gaps

Sprint 1: Security Hardening (Immediate)

1.1 SQL Injection in Public API (CRITICAL)

Location: app/routers/public.py lines 311, 315, 319, 544, 547, 1066, 1087, 1106

Problem: User input directly interpolated into ILIKE queries via f-strings.

# VULNERABLE
Provider.name.ilike(f"%{query_text}%")

# FIX
from sqlalchemy import literal
Provider.name.ilike(literal(f"%{query_text}%"))
# Or use bindparam

Affected endpoints: All 9 public endpoints with filtering/search.

Effort: 2h


1.2 Prompt Injection via System Prompts (CRITICAL)

Location: app/agents/llm_conversation.py lines 164-167

Problem: patient_context (user-derived data) format-interpolated into system prompt.

# VULNERABLE
system = CURAWAY_SYSTEM_PROMPT.format(
    phase_context=phase_context,
    patient_context=patient_context,  # User data in system prompt
)

# FIX: Use XML delimiters to separate system instructions from user data
system = CURAWAY_SYSTEM_PROMPT + f"\n\n<patient_data>\n{patient_context}\n</patient_data>"

Also affected: case_orchestrator.py line 162 (procedure_name in prompt).

Effort: 3h


1.3 Guardrail Classifier Fails Open (HIGH)

Location: app/services/message_classifier.py lines 57-84

Problem: When Anthropic API is down, all messages pass through as "on_topic".

# CURRENT (fails open)
except Exception:
    return {"category": "on_topic", "action": "pass_through"}

# FIX (fail closed)
except Exception:
    return {"category": "unknown", "action": "redirect", "response_key": "service_unavailable"}

Effort: 1h


1.4 Health Endpoint Information Disclosure (HIGH)

Location: app/main.py lines 894-904

Problem: /health and /health.json expose database stats, Neo4j info, LLM costs, Qdrant collections — all unauthenticated.

Fix: Require X-Tenant-ID or X-Internal-Secret on /health and /health.json. Keep /ready public (returns only status).

Effort: 1h


1.5 CORS Configuration (HIGH)

Location: app/main.py lines 167-181

Problems: - allow_methods=["*"] allows all HTTP methods (TRACE, CONNECT, etc.) - allow_origin_regex=r"https://.*\.vercel\.app" too broad - Localhost origins in production

Fix:

allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
allow_origin_regex=r"https://curaway-health-navigator.*\.vercel\.app",
# Remove localhost in production

Effort: 1h


1.6 Rate Limiting (HIGH)

Location: All routers — no rate limiting exists.

Fix: Add slowapi middleware:

from slowapi import Limiter
limiter = Limiter(key_func=get_remote_address, storage_uri=settings.upstash_redis_url)

@router.get("/search")
@limiter.limit("60/minute")
async def search(...):

Priority endpoints: /api/v1/public/search, all POST endpoints.

Effort: 4h


Sprint 2: Reliability & Privacy

2.1 Workflow State Race Condition (HIGH)

Location: app/agents/case_orchestrator.py lines 39-407

Problem: Two concurrent requests to same case can interleave state updates. No locking.

Fix: Add SELECT FOR UPDATE when reading case for orchestrator:

case = await db.execute(
    select(Case).where(Case.id == case_id).with_for_update()
)

Effort: 3h


2.2 PII in LLM Prompts (HIGH)

Location: app/agents/llm_conversation.py lines 50, 164

Problem: Patient demographics (name, DOB, country) included in patient_context sent to Claude.

Fix: Redact PII before building patient_context. Use anonymized identifiers:

# Instead of: "Patient: John Smith, DOB 1985-03-15"
# Use: "Patient: [REDACTED], Age: 39"

Effort: 3h


2.3 PyMuPDF Timeout Protection (MEDIUM-HIGH)

Location: app/integrations/pymupdf_client.py lines 16-45

Problem: No timeout on fitz.open() — malicious PDF could hang indefinitely.

Fix:

import signal
def extract_text(file_bytes: bytes, timeout_seconds: int = 5) -> str:
    # Use asyncio.wait_for or threading timeout

Effort: 2h


2.4 LLM Call Budget Per Case (MEDIUM)

Location: app/agents/clinical_context.py lines 110-222

Problem: Each document = 3 LLM calls. 10 documents = 30 calls. No limit.

Fix: Add case-level LLM call counter in metadata:

MAX_LLM_CALLS_PER_CASE = 20
if case_meta.get("llm_call_count", 0) >= MAX_LLM_CALLS_PER_CASE:
    return _fallback_response("LLM budget exceeded")

Effort: 2h


2.5 Auto-Reviewer Deduplication (MEDIUM)

Location: app/services/auto_reviewer.py lines 19-153

Problem: Running auto_review_all_cases multiple times creates duplicate feedback records.

Fix: Add last_auto_reviewed_at to cases, skip recently reviewed:

.where(
    Case.ehr_snapshot.isnot(None),
    or_(Case.last_auto_reviewed_at.is_(None),
        Case.last_auto_reviewed_at < func.now() - text("INTERVAL '24 hours'"))
)

Effort: 2h


2.6 JSON Parsing Retry Logic (MEDIUM)

Location: app/agents/llm_conversation.py lines 194-209

Problem: When LLM returns malformed JSON, data is silently lost. No retry.

Fix: Add single retry with simpler prompt. Log all parse failures to Langfuse.

Effort: 2h


2.7 Error Message Sanitization (MEDIUM)

Location: app/routers/match.py lines 67, 78; app/routers/patients.py line 279

Problem: str(e) passed directly to error response — could leak internal details.

Fix:

# Instead of: message=str(e)
# Use: message="An internal error occurred. Please try again."
# Log the actual error server-side only

Effort: 1h


Sprint 3: Test Coverage

3.1 Critical Untested Services

Service Tests Needed Effort
lab_analyzer.py Rule-based detection accuracy, edge cases (borderline values, missing data) 4h
auto_reviewer.py LLM vs rules comparison, feedback record creation, dedup 3h
eval_runner.py Extraction accuracy metrics, match quality NDCG, question relevance 4h
weight_optimizer.py Weight adjustment with mock data, normalization, cap enforcement 3h
message_classifier.py All 8 categories, edge cases, fail-open behavior 2h

3.2 Untested Routers

Router Tests Needed Effort
internal.py QStash callback auth, OCR processing, eval task endpoints 4h
feedback.py All 6 endpoints (decisions, provider-feedback, satisfaction, eval, pending, applied) 3h
procedures.py Requirements lookup, inheritance, provider overrides 2h

3.3 Critical Path E2E Tests

Path Status Tests Needed
File upload → OCR → extraction → EHR → matching 60% covered Full pipeline E2E with mock LLM
Consent → GDPR deletion → cascade cleanup 85% covered Edge cases (partial deletion, concurrent)
Search (PostgreSQL + Qdrant parallel) 60% covered Semantic search, grouped results, caching
Rate limiting 0% covered Per-IP limits, 429 response, header verification
Concurrent workflow state updates 0% covered Two simultaneous chat messages to same case

3.4 Test Quality Improvements

  • Add pytest-timeout to prevent hanging tests
  • Add pytest-randomly to detect order-dependent tests
  • Mock all external services (Neo4j, Qdrant, Claude) consistently
  • Add CI/CD gate: tests must pass before merge

What's Already Strong

Area Coverage Notes
Patient/Provider/Doctor CRUD 90%+ Comprehensive API integration tests
Consent + GDPR 85% Grant, revoke, cascade delete
FHIR resources 85% Create, validate, store, retrieve
Middleware (idempotency, correlation ID) 95% Excellent coverage
Feature flags 90% Flagsmith + YAML fallback tested
Encryption 90% PII encrypt/decrypt roundtrips
EHR deduplication 100% Conditions, observations, comorbidities
Doctor data completeness 100% 7-category scoring, badge tiers
Language concordance 100% 6-tier scoring with native detection
Public storefront API 95% All 9 endpoints with edge cases

Sprint Effort Summary

Sprint Focus Effort Items
Sprint 1 Security hardening ~12h 6 items (SQL injection, prompt injection, guardrails, health auth, CORS, rate limiting)
Sprint 2 Reliability + privacy ~17h 7 items (race conditions, PII redaction, timeouts, budgets, dedup, retry, error sanitization)
Sprint 3 Test coverage ~36h 5 services + 3 routers + 5 E2E paths + quality improvements
Total ~65h