
ADR-0019: GDPR Article 17 — Right to Erasure Cascade

Status: Accepted, implementation required
Date: 2026-04-13
Session: 40

Context

Curaway processes sensitive patient data (PII + PHI) across multiple storage layers: PostgreSQL, Cloudflare R2, Upstash Redis, and potentially Neo4j/Qdrant in future phases. GDPR Article 17 ("right to erasure") requires that, upon a valid data subject request, all personal data be deleted or anonymized without undue delay and across every store. Article 12(3) caps the response time at one month from receipt of the request, which this ADR treats as a 30-day window.

The current implementation (data_subject_handler.py) covers 9 of 16 patient-linked data sources. An audit found 6 PostgreSQL tables and 1 cache layer that hold patient data but are not included in the cascade. This gap blocks compliance.

What's at stake

  • Regulatory: GDPR fines of up to €20M or 4% of total worldwide annual turnover, whichever is higher, for incomplete erasure
  • Trust: Patients are sharing medical records, conversation history, and treatment preferences. Incomplete deletion erodes the core trust proposition.
  • Cross-border: Curaway operates across EU/UK/UAE jurisdictions — all with data protection laws that mirror or exceed GDPR requirements

Decision

Complete Erasure Cascade

The deletion handler will cascade across all patient-linked data in this order (respecting foreign key dependencies):

| Step | Table / Store | Strategy | Justification |
|------|---------------|----------|---------------|
| 1 | messages (via conversation_id) | Hard delete | Contains patient's natural language medical disclosures, intake answers, treatment preferences. Most sensitive text data. |
| 2 | conversations (via patient_id) | Hard delete | Metadata links to deleted messages. No value without messages. |
| 3 | feedback_records (via case_id) | Hard delete | Contains clinical corrections with ground truth medical data. |
| 4 | match_results (via patient_id) | Hard delete | Reveals patient's medical conditions, preferences, provider selections. |
| 5 | device_registrations (via user_id) | Hard delete | Push tokens and device fingerprints are PII under GDPR. |
| 6 | cases (via patient_id) | Anonymize | Retain shell for audit trail (case_number, created_at, status). Wipe: ehr_snapshot, workflow_state, selected_providers, comorbidities, extra_metadata, procedure_name. Set patient_id = 'DELETED'. |
| 7 | fhir_resources (via patient_id) | Hard delete | Medical data: no retention basis after erasure. (existing) |
| 8 | document_references (via patient_id) | Mark deleted + queue R2 cleanup | Retain shell for audit. Wipe: ocr_text, extracted_data. (existing) |
| 9 | consent_records (via patient_id) | Anonymize | Retain for compliance proof (purpose, version, granted_at). Wipe: ip_address, user_agent. (existing) |
| 10 | data_forwarding_audit (via patient_id) | Anonymize | Retain audit trail. Replace patient_id with 'DELETED'. (existing) |
| 11 | consultations (via patient_id) | Anonymize | Wipe notes. (existing) |
| 12 | notifications (via patient_id) | Hard delete | No retention value. (existing) |
| 13 | events (via patient_id) | Anonymize | Replace patient_id/actor_id with 'DELETED'. Retain event_type for analytics. (existing) |
| 14 | patients | PII wipe (soft delete) | Retain shell (id, tenant_id, is_deleted). Wipe all PII fields. (existing) |
| 15 | Cloudflare R2 | Hard delete files | Binary medical documents (PDFs, images, X-rays). (existing) |
| 16 | Upstash Redis | Flush patient cache keys | Transient but must be invalidated immediately on deletion. Pattern: chat:state:{tid}:{cid}, chat:conv:{tid}:{cid}, chat:fhir:{tid}:{cid}, chat:checklist:{tid}:{cid} |
| 17 | audit_log | Write-only (never deleted) | Append deletion certificate. Legal basis: legitimate interest in compliance record-keeping. (existing, by design) |
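One way to keep the handler and this table from drifting apart is to hold the cascade order as data. The following is a minimal sketch under that assumption, not the current handler's structure: the names Strategy, CascadeStep, CASCADE_PLAN, validate_plan, and runs_before are hypothetical, as are the identifiers chosen for the R2 and Redis steps.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Strategy(Enum):
    HARD_DELETE = "hard_delete"
    ANONYMIZE = "anonymize"
    MARK_DELETED = "mark_deleted"
    PII_WIPE = "pii_wipe"
    FLUSH_CACHE = "flush_cache"
    APPEND_ONLY = "append_only"


@dataclass(frozen=True)
class CascadeStep:
    order: int
    target: str
    strategy: Strategy


# Mirrors the 17 rows of the cascade table, in the same order.
CASCADE_PLAN = (
    CascadeStep(1, "messages", Strategy.HARD_DELETE),
    CascadeStep(2, "conversations", Strategy.HARD_DELETE),
    CascadeStep(3, "feedback_records", Strategy.HARD_DELETE),
    CascadeStep(4, "match_results", Strategy.HARD_DELETE),
    CascadeStep(5, "device_registrations", Strategy.HARD_DELETE),
    CascadeStep(6, "cases", Strategy.ANONYMIZE),
    CascadeStep(7, "fhir_resources", Strategy.HARD_DELETE),
    CascadeStep(8, "document_references", Strategy.MARK_DELETED),
    CascadeStep(9, "consent_records", Strategy.ANONYMIZE),
    CascadeStep(10, "data_forwarding_audit", Strategy.ANONYMIZE),
    CascadeStep(11, "consultations", Strategy.ANONYMIZE),
    CascadeStep(12, "notifications", Strategy.HARD_DELETE),
    CascadeStep(13, "events", Strategy.ANONYMIZE),
    CascadeStep(14, "patients", Strategy.PII_WIPE),
    CascadeStep(15, "cloudflare_r2", Strategy.HARD_DELETE),   # hypothetical identifier
    CascadeStep(16, "upstash_redis", Strategy.FLUSH_CACHE),   # hypothetical identifier
    CascadeStep(17, "audit_log", Strategy.APPEND_ONLY),
)


def validate_plan(plan: Sequence[CascadeStep]) -> None:
    """Raise ValueError unless steps are numbered 1..N and targets are unique."""
    if [step.order for step in plan] != list(range(1, len(plan) + 1)):
        raise ValueError("step numbers must be contiguous and start at 1")
    targets = [step.target for step in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("each target may appear only once")


def runs_before(plan: Sequence[CascadeStep], first: str, second: str) -> bool:
    """Return True if `first` is processed earlier than `second`."""
    position = {step.target: step.order for step in plan}
    return position[first] < position[second]
```

A unit test could then assert ordering constraints such as runs_before(CASCADE_PLAN, "messages", "conversations"), so that a reordering that breaks a foreign key dependency fails in CI rather than in production.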

Deletion Strategy Rationale

Hard delete when:

  • Data has no compliance retention purpose (messages, match results, feedback, devices, FHIR)
  • Anonymization would leave meaningless shells

Anonymize when:

  • Compliance requires proof of process (consent records, forwarding audits, events)
  • The audit trail must survive deletion (cases: case_number is needed for provider-side reconciliation)

Soft delete + PII wipe when:

  • Foreign keys from other tables reference the record (patients table)

Case Anonymization Detail

Cases require special handling because:

  1. Provider-side snapshots reference the Curaway case number (CRW-2026-XXXXX)
  2. Financial records may reference case_id
  3. SLA monitoring references case lifecycle timestamps

Retained: id, tenant_id, case_number, status, created_at, updated_at

Wiped: patient_id → 'DELETED', ehr_snapshot → {}, workflow_state → {}, selected_providers → [], comorbidities → [], extra_metadata → {}, procedure_name → NULL, procedure_code → NULL
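The retained/wiped split can be illustrated on a plain dictionary. This is a sketch only: it assumes a case row is available as a dict, and the names WIPED_VALUES and anonymize_case are hypothetical. Columns that appear in neither list are left untouched here, which is an assumption the real handler would need to confirm.

```python
from copy import deepcopy
from typing import Any, Dict

# Replacement values for every wiped field, taken from the list above.
WIPED_VALUES: Dict[str, Any] = {
    "patient_id": "DELETED",
    "ehr_snapshot": {},
    "workflow_state": {},
    "selected_providers": [],
    "comorbidities": [],
    "extra_metadata": {},
    "procedure_name": None,
    "procedure_code": None,
}


def anonymize_case(case: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `case` with wiped fields overwritten.

    Retained fields (id, tenant_id, case_number, status, created_at,
    updated_at) pass through unchanged because they are never overwritten.
    """
    anonymized = dict(case)
    for field, blank in WIPED_VALUES.items():
        # deepcopy so callers never share the module-level {} and [] objects
        anonymized[field] = deepcopy(blank)
    return anonymized
```

Because the function only overwrites fields with fixed values, applying it to an already anonymized case returns an identical result.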

Redis Cache Invalidation

Redis keys use deterministic patterns based on tenant_id + case_id. The handler will:

  1. Query all case_id values for the patient before anonymizing cases (once patient_id is set to 'DELETED', the cases can no longer be looked up by patient)
  2. Issue DEL for each key pattern per case
  3. Treat the flush as best-effort: cache entries have a 60-300s TTL anyway, but immediate invalidation prevents serving stale data during the deletion window
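The key construction and best-effort flush could look roughly like the sketch below. It assumes a client object exposing delete(*keys) that returns the number of keys removed, as redis-py's Redis.delete does; the function names and the CacheClient protocol are hypothetical.

```python
import logging
from typing import Iterable, List, Protocol

logger = logging.getLogger(__name__)

# The four per-case key patterns listed in the cascade table.
KEY_PATTERNS = (
    "chat:state:{tid}:{cid}",
    "chat:conv:{tid}:{cid}",
    "chat:fhir:{tid}:{cid}",
    "chat:checklist:{tid}:{cid}",
)


class CacheClient(Protocol):
    def delete(self, *keys: str) -> int: ...


def build_patient_cache_keys(tenant_id: str, case_ids: Iterable[str]) -> List[str]:
    """Expand every key pattern for every case belonging to the patient."""
    return [
        pattern.format(tid=tenant_id, cid=case_id)
        for case_id in case_ids
        for pattern in KEY_PATTERNS
    ]


def flush_patient_cache(client: CacheClient, tenant_id: str, case_ids: Iterable[str]) -> int:
    """Delete the patient's cache keys; log and return 0 if the cache is unreachable."""
    keys = build_patient_cache_keys(tenant_id, case_ids)
    if not keys:
        return 0
    try:
        return client.delete(*keys)
    except Exception:
        # Best-effort: a cache failure must not abort the erasure.
        logger.warning("Redis flush failed for tenant %s", tenant_id, exc_info=True)
        return 0
```

For a patient with three cases this expands to 3 × 4 = 12 keys, so at most 12 keys can be reported as flushed.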

R2 File Cleanup

Current behavior (retained):

  • R2 deletion happens after the DB flush succeeds
  • If the DB flush fails, files stay (correct, since the DB is the source of truth)
  • Failed R2 deletes are logged in the certificate for manual follow-up
  • Runs via asyncio.to_thread to avoid blocking the event loop
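A sketch of the cleanup loop, to show how failures can be collected rather than raised. It assumes the blocking delete call is passed in as a plain callable taking an object key (for example a thin wrapper around an S3-compatible client); cleanup_r2_files and its signature are hypothetical and not taken from the existing handler.

```python
import asyncio
from typing import Callable, Iterable, List, Tuple


async def cleanup_r2_files(
    delete_object: Callable[[str], None],
    keys: Iterable[str],
) -> Tuple[int, List[str]]:
    """Delete each key off the event loop; return (deleted_count, failed_keys)."""
    deleted = 0
    failed: List[str] = []
    for key in keys:
        try:
            # The storage client is blocking, so run it in a worker thread.
            await asyncio.to_thread(delete_object, key)
            deleted += 1
        except Exception:
            # Record the key for the certificate instead of aborting the erasure.
            failed.append(key)
    return deleted, failed
```

The two return values map onto r2_files_deleted and r2_keys_failed_cleanup in the deletion certificate. This function should only be awaited after the database flush has succeeded.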

External Stores — Future Phases

| Store | Patient Data? | Action | When |
|-------|---------------|--------|------|
| Neo4j | Not currently (provider/procedure graph only) | Add cascade when patient nodes introduced | Phase 0 (multi-tenancy) |
| Qdrant | Not currently (provider embeddings only) | Add cascade if patient-specific vectors added | If/when needed |
| Langfuse | Trace data with patient context | API delete or retention policy | Phase 3 |
| PostHog | Analytics events with patient_id | Delete user API call | Phase 3 |

Deletion Certificate

The certificate (returned to caller and stored in audit_log) will include counts for all 16 data categories:

{
  "deletion_id": "uuid",
  "patient_id": "patient-xxx",
  "tenant_id": "tenant-xxx",
  "requested_by": "patient",
  "requested_at": "2026-04-13T...",
  "completed_at": "2026-04-13T...",
  "status": "completed",
  "records_affected": {
    "messages_deleted": 142,
    "conversations_deleted": 3,
    "feedback_records_deleted": 5,
    "match_results_deleted": 2,
    "device_registrations_deleted": 1,
    "cases_anonymized": 3,
    "patient_pii_wiped": 1,
    "fhir_resources_deleted": 28,
    "fhir_resources_invalidated": 4,
    "documents_marked_deleted": 12,
    "consent_records_anonymized": 6,
    "forwarding_audits_anonymized": 2,
    "consultations_anonymized": 1,
    "notifications_deleted": 15,
    "r2_files_deleted": 12,
    "redis_keys_flushed": 12
  },
  "r2_keys_failed_cleanup": []
}

Idempotency

Running the handler twice for the same patient must be safe:

  • Patient already has is_deleted = True → skip the PII wipe, continue the cascade (to catch newly created records)
  • Hard deletes are naturally idempotent (DELETE WHERE ... returns 0 rows)
  • Anonymize operations are idempotent (values already set to 'DELETED' stay 'DELETED')
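The idempotency property can be demonstrated on a toy schema. The sketch below uses an in-memory SQLite database purely for illustration (production uses PostgreSQL), covers only three of the tables, and invents the column names full_name and body; make_demo_db and erase_patient are hypothetical helpers.

```python
import sqlite3
from typing import Dict


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory schema with one patient, one conversation, two messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (id TEXT PRIMARY KEY, full_name TEXT, is_deleted INTEGER DEFAULT 0);
        CREATE TABLE conversations (id TEXT PRIMARY KEY, patient_id TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, conversation_id TEXT, body TEXT);
        INSERT INTO patients VALUES ('p1', 'Jane Doe', 0);
        INSERT INTO conversations VALUES ('c1', 'p1');
        INSERT INTO messages (conversation_id, body) VALUES ('c1', 'hello');
        INSERT INTO messages (conversation_id, body) VALUES ('c1', 'again');
        """
    )
    return conn


def erase_patient(conn: sqlite3.Connection, patient_id: str) -> Dict[str, int]:
    """Run a reduced cascade; safe to call repeatedly for the same patient."""
    with conn:  # commits on success, rolls back on error
        messages = conn.execute(
            "DELETE FROM messages WHERE conversation_id IN "
            "(SELECT id FROM conversations WHERE patient_id = ?)",
            (patient_id,),
        ).rowcount
        conversations = conn.execute(
            "DELETE FROM conversations WHERE patient_id = ?", (patient_id,)
        ).rowcount
        # The is_deleted guard makes a repeat run skip the PII wipe.
        wiped = conn.execute(
            "UPDATE patients SET full_name = NULL, is_deleted = 1 "
            "WHERE id = ? AND is_deleted = 0",
            (patient_id,),
        ).rowcount
    return {
        "messages_deleted": messages,
        "conversations_deleted": conversations,
        "patient_pii_wiped": wiped,
    }
```

Calling erase_patient twice on the demo database returns non-zero counts the first time and all zeros the second time, and the patient shell row remains in place with is_deleted set.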

API Endpoint

Existing: DELETE /api/v1/patients/{patient_id}/data

No change to the API contract. The response body gains additional keys in records_affected, which is backwards-compatible for clients that ignore unknown keys.

Access Control

  • Patient: Can request own erasure (authenticated via Clerk JWT, patient_id from token)
  • Platform Admin / Super Admin: Can execute erasure on behalf of patient (audit_log records requested_by)
  • No other actor type can trigger erasure
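These three rules reduce to a small predicate. The sketch below assumes the caller's role and patient_id have already been extracted from a verified Clerk JWT; the Actor type and the role strings "patient", "platform_admin", and "super_admin" are hypothetical stand-ins for whatever the auth layer actually provides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role identifiers for Platform Admin and Super Admin.
ADMIN_ROLES = frozenset({"platform_admin", "super_admin"})


@dataclass(frozen=True)
class Actor:
    role: str
    patient_id: Optional[str] = None  # set only for patient actors


def can_request_erasure(actor: Actor, target_patient_id: str) -> bool:
    """Patients may erase only their own data; admins may erase on a patient's behalf."""
    if actor.role == "patient":
        return actor.patient_id is not None and actor.patient_id == target_patient_id
    # Any role outside the admin set is denied by default.
    return actor.role in ADMIN_ROLES
```

The check is deny-by-default: a new actor type gains no erasure rights until it is explicitly added.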

Timing

GDPR requires completion without undue delay and at most one month after the request, treated here as 30 days. Our target: immediate (synchronous within the HTTP request). If R2 cleanup fails, the certificate records failures for manual follow-up within the 30-day window.

Consequences

Positive:

  • Full GDPR Article 17 compliance across all patient data stores
  • Deletion certificate provides auditable proof of compliance
  • Idempotent design prevents partial-deletion edge cases
  • Clear retention rationale per table (hard delete vs anonymize)

Negative:

  • Cascade touches 16 data sources in a single transaction, which is long-running for patients with extensive history
  • Case anonymization means provider-side snapshots reference a case with no patient data (by design: provider snapshots are copies, not references)
  • Redis flush is best-effort: failure is non-blocking but logged

Risks:

  • If new tables with patient data are added without updating the handler, they become compliance gaps. Mitigation: the compliance-reviewer agent audits for patient_id columns not covered by the handler.
  • R2 eventual consistency means deleted files may be briefly accessible after deletion. Mitigation: is_deleted = True in DB prevents application-layer access immediately.
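The coverage audit in the first mitigation can be approximated with a schema scan. The sketch below runs against SQLite so that it is self-contained; against PostgreSQL the same idea would query information_schema.columns instead. The function name and the covered set are hypothetical, and how the compliance-reviewer agent actually performs this check is not specified here.

```python
import sqlite3
from typing import Iterable, List


def uncovered_patient_tables(conn: sqlite3.Connection, covered: Iterable[str]) -> List[str]:
    """List tables that have a patient_id column but are missing from `covered`."""
    covered_set = set(covered)
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    gaps = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if "patient_id" in columns and table not in covered_set:
            gaps.append(table)
    return sorted(gaps)
```

A scan keyed on patient_id alone would miss tables linked indirectly, such as messages (via conversation_id), feedback_records (via case_id), and device_registrations (via user_id), so a complete audit would also need to follow those foreign keys.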

Implementation

Files to modify

  • app/services/data_subject_handler.py — add steps 1-6 and 16
  • app/models/case.py — ensure all JSONB fields are nullable for anonymization
  • tests/test_gdpr_deletion.py — add tests for new cascade steps

Estimated effort

  • 1-2 days implementation + testing

References

  • GDPR Article 17: Right to erasure ("right to be forgotten")
  • GDPR Article 7(3): Withdrawal of consent
  • Current handler: app/services/data_subject_handler.py
  • Auth/Privacy architecture: docs/architecture/09-auth-privacy.md
  • Gap report item #4: docs/specs/gap-report.md
  • Multi-tenancy ADR: ADR-0018