Security Audit & Remediation Plan¶

Audit Date: 2026-04-01 Audited By: Automated agent team (3 parallel audits) Scope: FastAPI routes, LangGraph agent pipeline, test coverage

Executive Summary¶

Area	Risk Level	Critical	High	Medium
Endpoint Security	HIGH	3	2	5
Agent Pipeline	HIGH	2	3	5
Test Coverage	MEDIUM	5 gaps	6 gaps	9 gaps

Sprint 1: Security Hardening (Immediate)¶

1.1 SQL Injection in Public API (CRITICAL)¶

Location: app/routers/public.py lines 311, 315, 319, 544, 547, 1066, 1087, 1106

Problem: User input directly interpolated into ILIKE queries via f-strings.

# VULNERABLE
Provider.name.ilike(f"%{query_text}%")

# FIX
from sqlalchemy import literal
Provider.name.ilike(literal(f"%{query_text}%"))
# Or use bindparam

Affected endpoints: All 9 public endpoints with filtering/search.

Effort: 2h

1.2 Prompt Injection via System Prompts (CRITICAL)¶

Location: app/agents/llm_conversation.py lines 164-167

Problem: patient_context (user-derived data) format-interpolated into system prompt.

# VULNERABLE
system = CURAWAY_SYSTEM_PROMPT.format(
    phase_context=phase_context,
    patient_context=patient_context,  # User data in system prompt
)

# FIX: Use XML delimiters to separate system instructions from user data
system = CURAWAY_SYSTEM_PROMPT + f"\n\n<patient_data>\n{patient_context}\n</patient_data>"

Also affected: case_orchestrator.py line 162 (procedure_name in prompt).

Effort: 3h

1.3 Guardrail Classifier Fails Open (HIGH)¶

Location: app/services/message_classifier.py lines 57-84

Problem: When Anthropic API is down, all messages pass through as "on_topic".

# CURRENT (fails open)
except Exception:
    return {"category": "on_topic", "action": "pass_through"}

# FIX (fail closed)
except Exception:
    return {"category": "unknown", "action": "redirect", "response_key": "service_unavailable"}

Effort: 1h

1.4 Health Endpoint Information Disclosure (HIGH)¶

Location: app/main.py lines 894-904

Problem: /health and /health.json expose database stats, Neo4j info, LLM costs, Qdrant collections — all unauthenticated.

Fix: Require X-Tenant-ID or X-Internal-Secret on /health and /health.json. Keep /ready public (returns only status).

Effort: 1h

1.5 CORS Configuration (HIGH)¶

Location: app/main.py lines 167-181

Problems: - allow_methods=["*"] allows all HTTP methods (TRACE, CONNECT, etc.) - allow_origin_regex=r"https://.*\.vercel\.app" too broad - Localhost origins in production

Fix:

allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
allow_origin_regex=r"https://curaway-health-navigator.*\.vercel\.app",
# Remove localhost in production

Effort: 1h

1.6 Rate Limiting (HIGH)¶

Location: All routers — no rate limiting exists.

Fix: Add slowapi middleware:

from slowapi import Limiter
limiter = Limiter(key_func=get_remote_address, storage_uri=settings.upstash_redis_url)

@router.get("/search")
@limiter.limit("60/minute")
async def search(...):

Priority endpoints: /api/v1/public/search, all POST endpoints.

Effort: 4h

Sprint 2: Reliability & Privacy¶

2.1 Workflow State Race Condition (HIGH)¶

Location: app/agents/case_orchestrator.py lines 39-407

Problem: Two concurrent requests to same case can interleave state updates. No locking.

Fix: Add SELECT FOR UPDATE when reading case for orchestrator:

case = await db.execute(
    select(Case).where(Case.id == case_id).with_for_update()
)

Effort: 3h

2.2 PII in LLM Prompts (HIGH)¶

Location: app/agents/llm_conversation.py lines 50, 164

Problem: Patient demographics (name, DOB, country) included in patient_context sent to Claude.

Fix: Redact PII before building patient_context. Use anonymized identifiers:

# Instead of: "Patient: John Smith, DOB 1985-03-15"
# Use: "Patient: [REDACTED], Age: 39"

Effort: 3h

2.3 PyMuPDF Timeout Protection (MEDIUM-HIGH)¶

Location: app/integrations/pymupdf_client.py lines 16-45

Problem: No timeout on fitz.open() — malicious PDF could hang indefinitely.

Fix:

import signal
def extract_text(file_bytes: bytes, timeout_seconds: int = 5) -> str:
    # Use asyncio.wait_for or threading timeout

Effort: 2h

2.4 LLM Call Budget Per Case (MEDIUM)¶

Location: app/agents/clinical_context.py lines 110-222

Problem: Each document = 3 LLM calls. 10 documents = 30 calls. No limit.

Fix: Add case-level LLM call counter in metadata:

MAX_LLM_CALLS_PER_CASE = 20
if case_meta.get("llm_call_count", 0) >= MAX_LLM_CALLS_PER_CASE:
    return _fallback_response("LLM budget exceeded")

Effort: 2h

2.5 Auto-Reviewer Deduplication (MEDIUM)¶

Location: app/services/auto_reviewer.py lines 19-153

Problem: Running auto_review_all_cases multiple times creates duplicate feedback records.

Fix: Add last_auto_reviewed_at to cases, skip recently reviewed:

.where(
    Case.ehr_snapshot.isnot(None),
    or_(Case.last_auto_reviewed_at.is_(None),
        Case.last_auto_reviewed_at < func.now() - text("INTERVAL '24 hours'"))
)

Effort: 2h

2.6 JSON Parsing Retry Logic (MEDIUM)¶

Location: app/agents/llm_conversation.py lines 194-209

Problem: When LLM returns malformed JSON, data is silently lost. No retry.

Fix: Add single retry with simpler prompt. Log all parse failures to Langfuse.

Effort: 2h

2.7 Error Message Sanitization (MEDIUM)¶

Location: app/routers/match.py lines 67, 78; app/routers/patients.py line 279

Problem: str(e) passed directly to error response — could leak internal details.

Fix:

# Instead of: message=str(e)
# Use: message="An internal error occurred. Please try again."
# Log the actual error server-side only

Effort: 1h

Sprint 3: Test Coverage¶

3.1 Critical Untested Services¶

Service	Tests Needed	Effort
`lab_analyzer.py`	Rule-based detection accuracy, edge cases (borderline values, missing data)	4h
`auto_reviewer.py`	LLM vs rules comparison, feedback record creation, dedup	3h
`eval_runner.py`	Extraction accuracy metrics, match quality NDCG, question relevance	4h
`weight_optimizer.py`	Weight adjustment with mock data, normalization, cap enforcement	3h
`message_classifier.py`	All 8 categories, edge cases, fail-open behavior	2h

3.2 Untested Routers¶

Router	Tests Needed	Effort
`internal.py`	QStash callback auth, OCR processing, eval task endpoints	4h
`feedback.py`	All 6 endpoints (decisions, provider-feedback, satisfaction, eval, pending, applied)	3h
`procedures.py`	Requirements lookup, inheritance, provider overrides	2h

3.3 Critical Path E2E Tests¶

Path	Status	Tests Needed
File upload → OCR → extraction → EHR → matching	60% covered	Full pipeline E2E with mock LLM
Consent → GDPR deletion → cascade cleanup	85% covered	Edge cases (partial deletion, concurrent)
Search (PostgreSQL + Qdrant parallel)	60% covered	Semantic search, grouped results, caching
Rate limiting	0% covered	Per-IP limits, 429 response, header verification
Concurrent workflow state updates	0% covered	Two simultaneous chat messages to same case

3.4 Test Quality Improvements¶

Add pytest-timeout to prevent hanging tests
Add pytest-randomly to detect order-dependent tests
Mock all external services (Neo4j, Qdrant, Claude) consistently
Add CI/CD gate: tests must pass before merge

What's Already Strong¶

Area	Coverage	Notes
Patient/Provider/Doctor CRUD	90%+	Comprehensive API integration tests
Consent + GDPR	85%	Grant, revoke, cascade delete
FHIR resources	85%	Create, validate, store, retrieve
Middleware (idempotency, correlation ID)	95%	Excellent coverage
Feature flags	90%	Flagsmith + YAML fallback tested
Encryption	90%	PII encrypt/decrypt roundtrips
EHR deduplication	100%	Conditions, observations, comorbidities
Doctor data completeness	100%	7-category scoring, badge tiers
Language concordance	100%	6-tier scoring with native detection
Public storefront API	95%	All 9 endpoints with edge cases

Sprint Effort Summary¶

Sprint	Focus	Effort	Items
Sprint 1	Security hardening	~12h	6 items (SQL injection, prompt injection, guardrails, health auth, CORS, rate limiting)
Sprint 2	Reliability + privacy	~17h	7 items (race conditions, PII redaction, timeouts, budgets, dedup, retry, error sanitization)
Sprint 3	Test coverage	~36h	5 services + 3 routers + 5 E2E paths + quality improvements
Total		~65h