Security Audit & Remediation Plan¶
Audit Date: 2026-04-01 Audited By: Automated agent team (3 parallel audits) Scope: FastAPI routes, LangGraph agent pipeline, test coverage
Executive Summary¶
| Area | Risk Level | Critical | High | Medium |
|---|---|---|---|---|
| Endpoint Security | HIGH | 3 | 2 | 5 |
| Agent Pipeline | HIGH | 2 | 3 | 5 |
| Test Coverage | MEDIUM | 5 gaps | 6 gaps | 9 gaps |
Sprint 1: Security Hardening (Immediate)¶
1.1 SQL Injection in Public API (CRITICAL)¶
Location: app/routers/public.py lines 311, 315, 319, 544, 547, 1066, 1087, 1106
Problem: User input directly interpolated into ILIKE queries via f-strings.
# VULNERABLE
Provider.name.ilike(f"%{query_text}%")
# FIX
from sqlalchemy import literal
Provider.name.ilike(literal(f"%{query_text}%"))
# Or use bindparam
Affected endpoints: All 9 public endpoints with filtering/search.
Effort: 2h
1.2 Prompt Injection via System Prompts (CRITICAL)¶
Location: app/agents/llm_conversation.py lines 164-167
Problem: patient_context (user-derived data) format-interpolated into system prompt.
# VULNERABLE
system = CURAWAY_SYSTEM_PROMPT.format(
phase_context=phase_context,
patient_context=patient_context, # User data in system prompt
)
# FIX: Use XML delimiters to separate system instructions from user data
system = CURAWAY_SYSTEM_PROMPT + f"\n\n<patient_data>\n{patient_context}\n</patient_data>"
Also affected: case_orchestrator.py line 162 (procedure_name in prompt).
Effort: 3h
1.3 Guardrail Classifier Fails Open (HIGH)¶
Location: app/services/message_classifier.py lines 57-84
Problem: When Anthropic API is down, all messages pass through as "on_topic".
# CURRENT (fails open)
except Exception:
return {"category": "on_topic", "action": "pass_through"}
# FIX (fail closed)
except Exception:
return {"category": "unknown", "action": "redirect", "response_key": "service_unavailable"}
Effort: 1h
1.4 Health Endpoint Information Disclosure (HIGH)¶
Location: app/main.py lines 894-904
Problem: /health and /health.json expose database stats, Neo4j info, LLM costs, Qdrant collections — all unauthenticated.
Fix: Require X-Tenant-ID or X-Internal-Secret on /health and /health.json. Keep /ready public (returns only status).
Effort: 1h
1.5 CORS Configuration (HIGH)¶
Location: app/main.py lines 167-181
Problems:
- allow_methods=["*"] allows all HTTP methods (TRACE, CONNECT, etc.)
- allow_origin_regex=r"https://.*\.vercel\.app" too broad
- Localhost origins in production
Fix:
allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
allow_origin_regex=r"https://curaway-health-navigator.*\.vercel\.app",
# Remove localhost in production
Effort: 1h
1.6 Rate Limiting (HIGH)¶
Location: All routers — no rate limiting exists.
Fix: Add slowapi middleware:
from slowapi import Limiter
limiter = Limiter(key_func=get_remote_address, storage_uri=settings.upstash_redis_url)
@router.get("/search")
@limiter.limit("60/minute")
async def search(...):
Priority endpoints: /api/v1/public/search, all POST endpoints.
Effort: 4h
Sprint 2: Reliability & Privacy¶
2.1 Workflow State Race Condition (HIGH)¶
Location: app/agents/case_orchestrator.py lines 39-407
Problem: Two concurrent requests to same case can interleave state updates. No locking.
Fix: Add SELECT FOR UPDATE when reading case for orchestrator:
Effort: 3h
2.2 PII in LLM Prompts (HIGH)¶
Location: app/agents/llm_conversation.py lines 50, 164
Problem: Patient demographics (name, DOB, country) included in patient_context sent to Claude.
Fix: Redact PII before building patient_context. Use anonymized identifiers:
Effort: 3h
2.3 PyMuPDF Timeout Protection (MEDIUM-HIGH)¶
Location: app/integrations/pymupdf_client.py lines 16-45
Problem: No timeout on fitz.open() — malicious PDF could hang indefinitely.
Fix:
import signal
def extract_text(file_bytes: bytes, timeout_seconds: int = 5) -> str:
# Use asyncio.wait_for or threading timeout
Effort: 2h
2.4 LLM Call Budget Per Case (MEDIUM)¶
Location: app/agents/clinical_context.py lines 110-222
Problem: Each document = 3 LLM calls. 10 documents = 30 calls. No limit.
Fix: Add case-level LLM call counter in metadata:
MAX_LLM_CALLS_PER_CASE = 20
if case_meta.get("llm_call_count", 0) >= MAX_LLM_CALLS_PER_CASE:
return _fallback_response("LLM budget exceeded")
Effort: 2h
2.5 Auto-Reviewer Deduplication (MEDIUM)¶
Location: app/services/auto_reviewer.py lines 19-153
Problem: Running auto_review_all_cases multiple times creates duplicate feedback records.
Fix: Add last_auto_reviewed_at to cases, skip recently reviewed:
.where(
Case.ehr_snapshot.isnot(None),
or_(Case.last_auto_reviewed_at.is_(None),
Case.last_auto_reviewed_at < func.now() - text("INTERVAL '24 hours'"))
)
Effort: 2h
2.6 JSON Parsing Retry Logic (MEDIUM)¶
Location: app/agents/llm_conversation.py lines 194-209
Problem: When LLM returns malformed JSON, data is silently lost. No retry.
Fix: Add single retry with simpler prompt. Log all parse failures to Langfuse.
Effort: 2h
2.7 Error Message Sanitization (MEDIUM)¶
Location: app/routers/match.py lines 67, 78; app/routers/patients.py line 279
Problem: str(e) passed directly to error response — could leak internal details.
Fix:
# Instead of: message=str(e)
# Use: message="An internal error occurred. Please try again."
# Log the actual error server-side only
Effort: 1h
Sprint 3: Test Coverage¶
3.1 Critical Untested Services¶
| Service | Tests Needed | Effort |
|---|---|---|
lab_analyzer.py |
Rule-based detection accuracy, edge cases (borderline values, missing data) | 4h |
auto_reviewer.py |
LLM vs rules comparison, feedback record creation, dedup | 3h |
eval_runner.py |
Extraction accuracy metrics, match quality NDCG, question relevance | 4h |
weight_optimizer.py |
Weight adjustment with mock data, normalization, cap enforcement | 3h |
message_classifier.py |
All 8 categories, edge cases, fail-open behavior | 2h |
3.2 Untested Routers¶
| Router | Tests Needed | Effort |
|---|---|---|
internal.py |
QStash callback auth, OCR processing, eval task endpoints | 4h |
feedback.py |
All 6 endpoints (decisions, provider-feedback, satisfaction, eval, pending, applied) | 3h |
procedures.py |
Requirements lookup, inheritance, provider overrides | 2h |
3.3 Critical Path E2E Tests¶
| Path | Status | Tests Needed |
|---|---|---|
| File upload → OCR → extraction → EHR → matching | 60% covered | Full pipeline E2E with mock LLM |
| Consent → GDPR deletion → cascade cleanup | 85% covered | Edge cases (partial deletion, concurrent) |
| Search (PostgreSQL + Qdrant parallel) | 60% covered | Semantic search, grouped results, caching |
| Rate limiting | 0% covered | Per-IP limits, 429 response, header verification |
| Concurrent workflow state updates | 0% covered | Two simultaneous chat messages to same case |
3.4 Test Quality Improvements¶
- Add
pytest-timeoutto prevent hanging tests - Add
pytest-randomlyto detect order-dependent tests - Mock all external services (Neo4j, Qdrant, Claude) consistently
- Add CI/CD gate: tests must pass before merge
What's Already Strong¶
| Area | Coverage | Notes |
|---|---|---|
| Patient/Provider/Doctor CRUD | 90%+ | Comprehensive API integration tests |
| Consent + GDPR | 85% | Grant, revoke, cascade delete |
| FHIR resources | 85% | Create, validate, store, retrieve |
| Middleware (idempotency, correlation ID) | 95% | Excellent coverage |
| Feature flags | 90% | Flagsmith + YAML fallback tested |
| Encryption | 90% | PII encrypt/decrypt roundtrips |
| EHR deduplication | 100% | Conditions, observations, comorbidities |
| Doctor data completeness | 100% | 7-category scoring, badge tiers |
| Language concordance | 100% | 6-tier scoring with native detection |
| Public storefront API | 95% | All 9 endpoints with edge cases |
Sprint Effort Summary¶
| Sprint | Focus | Effort | Items |
|---|---|---|---|
| Sprint 1 | Security hardening | ~12h | 6 items (SQL injection, prompt injection, guardrails, health auth, CORS, rate limiting) |
| Sprint 2 | Reliability + privacy | ~17h | 7 items (race conditions, PII redaction, timeouts, budgets, dedup, retry, error sanitization) |
| Sprint 3 | Test coverage | ~36h | 5 services + 3 routers + 5 E2E paths + quality improvements |
| Total | ~65h |