ADR-0019: GDPR Article 17 - Right to Erasure Cascade
Status: Accepted (Implementation Required)
Date: 2026-04-13
Session: 40
Context
Curaway processes sensitive patient data (PII + PHI) across multiple storage layers: PostgreSQL, Cloudflare R2, Upstash Redis, and potentially Neo4j/Qdrant in future phases. GDPR Article 17 ("right to erasure") requires that upon a valid data subject request, all personal data is deleted or anonymized within 30 days, across every store.
The current implementation (data_subject_handler.py) covers 9 of the 16 patient-linked data sources. An audit found 6 PostgreSQL tables and 1 cache layer that hold patient data but are not included in the cascade. This is a compliance-blocking gap.
What's at stake
- Regulatory: GDPR fines of up to €20M or 4% of total worldwide annual turnover, whichever is higher, for incomplete erasure
- Trust: Patients are sharing medical records, conversation history, and treatment preferences. Incomplete deletion erodes the core trust proposition.
- Cross-border: Curaway operates across EU/UK/UAE jurisdictions — all with data protection laws that mirror or exceed GDPR requirements
Decision
Complete Erasure Cascade
The deletion handler will cascade across all patient-linked data in this order (respecting foreign key dependencies):
| Step | Table / Store | Strategy | Justification |
|---|---|---|---|
| 1 | messages (via conversation_id) | Hard delete | Contains patient's natural language medical disclosures, intake answers, treatment preferences. Most sensitive text data. |
| 2 | conversations (via patient_id) | Hard delete | Metadata links to deleted messages. No value without messages. |
| 3 | feedback_records (via case_id) | Hard delete | Contains clinical corrections with ground truth medical data. |
| 4 | match_results (via patient_id) | Hard delete | Reveals patient's medical conditions, preferences, provider selections. |
| 5 | device_registrations (via user_id) | Hard delete | Push tokens and device fingerprints are PII under GDPR. |
| 6 | cases (via patient_id) | Anonymize | Retain shell for audit trail (case_number, created_at, status). Wipe: ehr_snapshot, workflow_state, selected_providers, comorbidities, extra_metadata, procedure_name. Set patient_id = 'DELETED'. |
| 7 | fhir_resources (via patient_id) | Hard delete | Medical data; no retention basis after erasure. (existing) |
| 8 | document_references (via patient_id) | Mark deleted + queue R2 cleanup | Retain shell for audit. Wipe: ocr_text, extracted_data. (existing) |
| 9 | consent_records (via patient_id) | Anonymize | Retain for compliance proof (purpose, version, granted_at). Wipe: ip_address, user_agent. (existing) |
| 10 | data_forwarding_audit (via patient_id) | Anonymize | Retain audit trail. Replace patient_id with 'DELETED'. (existing) |
| 11 | consultations (via patient_id) | Anonymize | Wipe notes. (existing) |
| 12 | notifications (via patient_id) | Hard delete | No retention value. (existing) |
| 13 | events (via patient_id) | Anonymize | Replace patient_id/actor_id with 'DELETED'. Retain event_type for analytics. (existing) |
| 14 | patients | PII wipe (soft delete) | Retain shell (id, tenant_id, is_deleted). Wipe all PII fields. (existing) |
| 15 | Cloudflare R2 | Hard delete files | Binary medical documents (PDFs, images, X-rays). (existing) |
| 16 | Upstash Redis | Flush patient cache keys | Transient but must be invalidated immediately on deletion. Pattern: chat:state:{tid}:{cid}, chat:conv:{tid}:{cid}, chat:fhir:{tid}:{cid}, chat:checklist:{tid}:{cid} |
| 17 | audit_log | Write-only (never deleted) | Append deletion certificate. Legal basis: legitimate interest in compliance record-keeping. (existing, by design) |
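
The ordering in the table can be held as data so that the handler and its tests share one definition. The following is a minimal sketch under that assumption: `Strategy`, `CascadeStep`, `CASCADE_PLAN`, and the helper functions are hypothetical names, and the store identifiers `cloudflare_r2` and `upstash_redis` are illustrative labels rather than anything defined in the handler.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    HARD_DELETE = "hard_delete"
    ANONYMIZE = "anonymize"
    MARK_DELETED = "mark_deleted"  # document_references: mark + queue R2 cleanup
    PII_WIPE = "pii_wipe"          # patients: soft delete, wipe PII fields
    FLUSH_CACHE = "flush_cache"    # Redis key invalidation
    APPEND_ONLY = "append_only"    # audit_log: never deleted


@dataclass(frozen=True)
class CascadeStep:
    order: int
    target: str
    strategy: Strategy


# Hypothetical encoding of the 17 rows in the table above.
CASCADE_PLAN = (
    CascadeStep(1, "messages", Strategy.HARD_DELETE),
    CascadeStep(2, "conversations", Strategy.HARD_DELETE),
    CascadeStep(3, "feedback_records", Strategy.HARD_DELETE),
    CascadeStep(4, "match_results", Strategy.HARD_DELETE),
    CascadeStep(5, "device_registrations", Strategy.HARD_DELETE),
    CascadeStep(6, "cases", Strategy.ANONYMIZE),
    CascadeStep(7, "fhir_resources", Strategy.HARD_DELETE),
    CascadeStep(8, "document_references", Strategy.MARK_DELETED),
    CascadeStep(9, "consent_records", Strategy.ANONYMIZE),
    CascadeStep(10, "data_forwarding_audit", Strategy.ANONYMIZE),
    CascadeStep(11, "consultations", Strategy.ANONYMIZE),
    CascadeStep(12, "notifications", Strategy.HARD_DELETE),
    CascadeStep(13, "events", Strategy.ANONYMIZE),
    CascadeStep(14, "patients", Strategy.PII_WIPE),
    CascadeStep(15, "cloudflare_r2", Strategy.HARD_DELETE),
    CascadeStep(16, "upstash_redis", Strategy.FLUSH_CACHE),
    CascadeStep(17, "audit_log", Strategy.APPEND_ONLY),
)


def step_order(target: str) -> int:
    """Return the position of a table or store in the cascade."""
    for step in CASCADE_PLAN:
        if step.target == target:
            return step.order
    raise KeyError(target)


def runs_before(first: str, second: str) -> bool:
    """True if `first` is processed before `second`."""
    return step_order(first) < step_order(second)
```

A test can then assert the foreign-key constraints directly, for example that `runs_before("messages", "conversations")` and `runs_before("cases", "patients")` both hold.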
Deletion Strategy Rationale
Hard delete when:
- Data has no compliance retention purpose (messages, match results, feedback, devices, FHIR)
- Anonymization would leave meaningless shells

Anonymize when:
- Compliance requires proof of process (consent records, forwarding audits, events)
- Audit trail must survive deletion (cases: case_number is needed for provider-side reconciliation)

Soft delete + PII wipe when:
- Foreign keys from other tables reference the record (patients table)
Case Anonymization Detail
Cases require special handling because:
1. Provider-side snapshots reference the Curaway case number (CRW-2026-XXXXX)
2. Financial records may reference case_id
3. SLA monitoring references case lifecycle timestamps
Retained: id, tenant_id, case_number, status, created_at, updated_at
Wiped: patient_id → 'DELETED', ehr_snapshot → {}, workflow_state → {}, selected_providers → [], comorbidities → [], extra_metadata → {}, procedure_name → NULL, procedure_code → NULL
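
As an illustration of the retained/wiped split, the sketch below applies the wipe values to a case row represented as a plain dict, mirroring what an `UPDATE cases SET ...` would do. `WIPE_VALUES` and `anonymize_case` are hypothetical names, and the dict representation is an assumption; the real handler presumably works against the ORM model in app/models/case.py.

```python
import copy

# Replacement values taken from the "Wiped" list above.
WIPE_VALUES = {
    "patient_id": "DELETED",
    "ehr_snapshot": {},
    "workflow_state": {},
    "selected_providers": [],
    "comorbidities": [],
    "extra_metadata": {},
    "procedure_name": None,
    "procedure_code": None,
}


def anonymize_case(case: dict) -> dict:
    """Return an anonymized copy of a case row; the input is not mutated."""
    anonymized = dict(case)
    # deepcopy so each row gets its own empty dict/list instead of sharing one
    anonymized.update(copy.deepcopy(WIPE_VALUES))
    return anonymized
```

Fields not listed in `WIPE_VALUES` (id, tenant_id, case_number, status, created_at, updated_at) pass through unchanged, and applying the function to an already anonymized row returns an equal row.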
Redis Cache Invalidation
Redis keys use deterministic patterns based on tenant_id + case_id. The handler will:
1. Query all case_id values for the patient before anonymizing cases
2. Issue DEL for each key pattern per case
3. Treat the flush as best-effort: cached entries expire within their 60-300s TTL anyway, but immediate invalidation prevents serving stale data during the deletion window
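
The steps above can be sketched as follows. This is an illustration, not the handler's code: `build_cache_keys` and `flush_patient_cache` are hypothetical names, and the client is assumed to expose a `delete(*names)` method that returns the number of keys removed, as redis-py's `Redis.delete` does.

```python
import logging
from typing import Iterable, Protocol

logger = logging.getLogger(__name__)

# Key patterns from the cascade table; {cid} is the case_id.
KEY_TEMPLATES = (
    "chat:state:{tid}:{cid}",
    "chat:conv:{tid}:{cid}",
    "chat:fhir:{tid}:{cid}",
    "chat:checklist:{tid}:{cid}",
)


class SupportsDelete(Protocol):
    def delete(self, *names: str) -> int: ...


def build_cache_keys(tenant_id: str, case_ids: Iterable[str]) -> list[str]:
    """Expand every key pattern for every case of the patient."""
    return [
        template.format(tid=tenant_id, cid=case_id)
        for case_id in case_ids
        for template in KEY_TEMPLATES
    ]


def flush_patient_cache(
    client: SupportsDelete, tenant_id: str, case_ids: Iterable[str]
) -> int:
    """Best-effort flush: returns keys removed, or 0 if Redis is unavailable."""
    keys = build_cache_keys(tenant_id, case_ids)
    if not keys:
        return 0
    try:
        return client.delete(*keys)
    except Exception:
        # Non-blocking by design: log and let the rest of the cascade continue.
        logger.exception("Redis flush failed for tenant %s", tenant_id)
        return 0
```

The case IDs must be collected before step 6 of the cascade, because anonymized cases no longer carry the patient_id needed to find them.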
R2 File Cleanup
Current behavior (retained):
- R2 deletion happens after DB flush succeeds
- If DB flush fails, files stay (correct — DB is source of truth)
- Failed R2 deletes are logged in the certificate for manual follow-up
- Runs via asyncio.to_thread to avoid blocking the event loop
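
A minimal sketch of this behavior, assuming a blocking `delete_object(key)` callable supplied by the caller (a hypothetical stand-in for whatever R2 client call the handler uses). The function name `cleanup_r2_files` is also illustrative.

```python
import asyncio
from typing import Callable, Iterable


async def cleanup_r2_files(
    keys: Iterable[str],
    delete_object: Callable[[str], None],
) -> tuple[int, list[str]]:
    """Delete each key off the event loop; return (deleted_count, failed_keys)."""
    deleted = 0
    failed: list[str] = []
    for key in keys:
        try:
            # delete_object is blocking, so run it in a worker thread.
            await asyncio.to_thread(delete_object, key)
            deleted += 1
        except Exception:
            # Record the key for manual follow-up instead of aborting.
            failed.append(key)
    return deleted, failed
```

The caller would invoke this only after the database flush has succeeded, then copy the two return values into `r2_files_deleted` and `r2_keys_failed_cleanup` in the certificate.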
External Stores: Future Phases
| Store | Patient Data? | Action | When |
|---|---|---|---|
| Neo4j | Not currently (provider/procedure graph only) | Add cascade when patient nodes introduced | Phase 0 (multi-tenancy) |
| Qdrant | Not currently (provider embeddings only) | Add cascade if patient-specific vectors added | If/when needed |
| Langfuse | Trace data with patient context | API delete or retention policy | Phase 3 |
| PostHog | Analytics events with patient_id | Delete user API call | Phase 3 |
Deletion Certificate
The certificate (returned to caller and stored in audit_log) will include counts for all 16 data categories:
```json
{
  "deletion_id": "uuid",
  "patient_id": "patient-xxx",
  "tenant_id": "tenant-xxx",
  "requested_by": "patient",
  "requested_at": "2026-04-13T...",
  "completed_at": "2026-04-13T...",
  "status": "completed",
  "records_affected": {
    "messages_deleted": 142,
    "conversations_deleted": 3,
    "feedback_records_deleted": 5,
    "match_results_deleted": 2,
    "device_registrations_deleted": 1,
    "cases_anonymized": 3,
    "patient_pii_wiped": 1,
    "fhir_resources_deleted": 28,
    "fhir_resources_invalidated": 4,
    "documents_marked_deleted": 12,
    "consent_records_anonymized": 6,
    "forwarding_audits_anonymized": 2,
    "consultations_anonymized": 1,
    "notifications_deleted": 15,
    "r2_files_deleted": 12,
    "redis_keys_flushed": 12
  },
  "r2_keys_failed_cleanup": []
}
```
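
One way the certificate could be assembled once the cascade has produced its counts is sketched below. `build_certificate` is a hypothetical helper, and setting `status` to "completed" unconditionally is an assumption, since this ADR does not define other status values.

```python
import uuid
from datetime import datetime, timezone


def build_certificate(
    patient_id: str,
    tenant_id: str,
    requested_by: str,
    requested_at: datetime,
    records_affected: dict[str, int],
    r2_keys_failed_cleanup: list[str],
) -> dict:
    """Assemble a dict with the same top-level keys as the example above."""
    return {
        "deletion_id": str(uuid.uuid4()),
        "patient_id": patient_id,
        "tenant_id": tenant_id,
        "requested_by": requested_by,
        "requested_at": requested_at.isoformat(),
        "completed_at": datetime.now(timezone.utc).isoformat(),
        # Assumption: no other status values are defined in this ADR.
        "status": "completed",
        "records_affected": dict(records_affected),
        "r2_keys_failed_cleanup": list(r2_keys_failed_cleanup),
    }
```

Every value in the result is a string, integer, list, or dict, so the certificate can be serialized with `json.dumps` for the HTTP response and for the audit_log entry.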
Idempotency
Running the handler twice for the same patient must be safe:
- Patient already is_deleted = True → skip PII wipe, continue cascade (catch newly-created records)
- Hard deletes are naturally idempotent (DELETE WHERE ... returns 0 rows)
- Anonymize operations are idempotent (already 'DELETED' values stay 'DELETED')
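
The idempotency of both operation types can be demonstrated with an in-memory SQLite database. This is a sketch only: the two-table schema is a simplified stand-in for the real PostgreSQL tables, and `erase` is a hypothetical function, not the handler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notifications (id INTEGER PRIMARY KEY, patient_id TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, patient_id TEXT, event_type TEXT);
    INSERT INTO notifications (patient_id) VALUES ('p1'), ('p1'), ('p2');
    INSERT INTO events (patient_id, event_type) VALUES ('p1', 'login'), ('p2', 'login');
    """
)


def erase(conn: sqlite3.Connection, patient_id: str) -> tuple[int, int]:
    """Hard-delete notifications and anonymize events; return affected row counts."""
    deleted = conn.execute(
        "DELETE FROM notifications WHERE patient_id = ?", (patient_id,)
    ).rowcount
    anonymized = conn.execute(
        "UPDATE events SET patient_id = 'DELETED' WHERE patient_id = ?",
        (patient_id,),
    ).rowcount
    conn.commit()
    return deleted, anonymized


first = erase(conn, "p1")   # rows exist, so both counts are non-zero
second = erase(conn, "p1")  # nothing left to match, so both counts are zero
```

The second run matches no rows because the first run removed or rewrote every row carrying the original patient_id, which is why a repeat invocation is safe and still picks up any records created in between.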
API Endpoint
Existing: DELETE /api/v1/patients/{patient_id}/data
No change to the API contract. The response body gains additional keys in records_affected but this is backwards-compatible.
Access Control
- Patient: Can request own erasure (authenticated via Clerk JWT, patient_id from token)
- Platform Admin / Super Admin: Can execute erasure on behalf of patient (audit_log records requested_by)
- No other actor type can trigger erasure
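
The access rules above reduce to a small predicate. In this sketch the role strings (`patient`, `platform_admin`, `super_admin`) and the function name are assumptions chosen for illustration; the actual role identifiers come from the auth layer and are not specified in this ADR.

```python
from typing import Optional

# Hypothetical role identifiers for the two admin actor types.
ADMIN_ROLES = frozenset({"platform_admin", "super_admin"})


def can_trigger_erasure(
    actor_role: str,
    actor_patient_id: Optional[str],
    target_patient_id: str,
) -> bool:
    """Patients may erase only their own data; admins may erase any patient's."""
    if actor_role == "patient":
        # The patient_id must come from the verified token, never the URL alone.
        return actor_patient_id is not None and actor_patient_id == target_patient_id
    return actor_role in ADMIN_ROLES
```

Any role outside these three falls through to `False`, which matches the rule that no other actor type can trigger erasure.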
Timing
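GDPR Article 12(3) requires erasure requests to be actioned without undue delay and in any event within one month of receipt; this ADR treats that as a 30-day window. Our target: immediate (synchronous within the HTTP request). If R2 cleanup fails, the certificate records the failures for manual follow-up within the 30-day window.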
Consequences
Positive:
- Full GDPR Article 17 compliance across all patient data stores
- Deletion certificate provides auditable proof of compliance
- Idempotent design prevents partial-deletion edge cases
- Clear retention rationale per table (hard delete vs anonymize)

Negative:
- Cascade touches 16 data sources in a single transaction, which can be long-running for patients with extensive history
- Case anonymization means provider-side snapshots reference a case with no patient data (by design: provider snapshots are copies, not references)
- Redis flush is best-effort: failure is non-blocking but logged

Risks:
- If new tables with patient data are added without updating the handler, they become compliance gaps. Mitigation: the compliance-reviewer agent audits for patient_id columns not covered by the handler.
- R2 eventual consistency means deleted files may be briefly accessible after deletion. Mitigation: is_deleted = True in DB prevents application-layer access immediately.
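
The first mitigation could be backed by an automated check that compares the schema against the handler's coverage. The sketch below uses SQLite's `PRAGMA table_info` so it runs standalone; against PostgreSQL the equivalent lookup would query `information_schema.columns`. The function names are hypothetical, and the check only finds tables with a literal patient_id column, so tables linked via case_id, conversation_id, or user_id would need a separate rule.

```python
import sqlite3


def tables_with_patient_id(conn: sqlite3.Connection) -> set[str]:
    """Return the names of all tables that have a patient_id column."""
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    found = set()
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if "patient_id" in columns:
            found.add(table)
    return found


def uncovered_tables(conn: sqlite3.Connection, covered: set[str]) -> set[str]:
    """Tables holding patient_id that the erasure handler does not list."""
    return tables_with_patient_id(conn) - covered
```

Run in CI, a non-empty result from `uncovered_tables` would fail the build until the new table is added to the cascade or explicitly exempted.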
Implementation
Files to modify
- app/services/data_subject_handler.py: add steps 1-6 and 16
- app/models/case.py: ensure all JSONB fields are nullable for anonymization
- tests/test_gdpr_deletion.py: add tests for new cascade steps
Estimated effort
- 1-2 days implementation + testing
References
- GDPR Article 17: Right to erasure ("right to be forgotten")
- GDPR Article 7(3): Withdrawal of consent
- Current handler: app/services/data_subject_handler.py
- Auth/Privacy architecture: docs/architecture/09-auth-privacy.md
- Gap report item #4: docs/specs/gap-report.md
- Multi-tenancy ADR: ADR-0018