Sequence: Auth + Tenant Resolution
Every authenticated request to the Curaway API passes through RBACMiddleware. This middleware verifies the Clerk JWT, resolves the tenant context, populates request.state with roles + permissions, and (for portal users) JIT-syncs Clerk org-role claims into the user_roles table.
This is the hottest path in the system: it runs on every request. It is also the most critical path for the migration, since GCP Cloud Run cold starts and IAM round-trips will surface here first.
Audience: Engineering team + security review. Companion to ADR-0018 (multi-tenancy) and ADR-0021 (Clerk org-role mapping).
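To make the tenant-resolution step concrete, here is a minimal sketch of the branch between portal users (tenant derived from the Clerk org) and patients (tenant taken from the X-Tenant-ID header). It is not the code in RBACMiddleware: the function name `resolve_tenant_id`, the `org_lookup` callable, and the `TenantResolutionError` type are hypothetical. The mismatch handling assumes the "log a warning, keep the org-derived value" behaviour described under the migration risks.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("rbac_sketch")


class TenantResolutionError(Exception):
    """Hypothetical error: no tenant could be determined for the request."""


def resolve_tenant_id(
    org_id: Optional[str],
    header_tenant_id: Optional[str],
    org_lookup: Callable[[str], Optional[str]],
) -> str:
    """Return the tenant for a request (illustrative only).

    org_lookup stands in for the cached tenant_org_mappings lookup.
    """
    if org_id:
        # Portal user: the Clerk org is the source of truth for the tenant.
        tenant_id = org_lookup(org_id)
        if tenant_id is None:
            raise TenantResolutionError(f"no tenant mapped to org {org_id}")
        if header_tenant_id and header_tenant_id != tenant_id:
            # Spoofing guard: the header never overrides the org-derived value.
            logger.warning(
                "X-Tenant-ID %r does not match org-derived tenant %r",
                header_tenant_id,
                tenant_id,
            )
        return tenant_id

    # Patient: no org context, so the header is the only tenant signal.
    if not header_tenant_id:
        raise TenantResolutionError("no org_id and no X-Tenant-ID header")
    return header_tenant_id
```

With this shape, a portal request that also carries a conflicting header still resolves to the org-derived tenant, which is the property the spoofing guard is meant to preserve.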
Sequence
sequenceDiagram
autonumber
actor User as User<br/>(patient or portal)
participant FE as Frontend<br/>(Vercel)
participant API as Curaway API
participant MW as RBACMiddleware
participant Redis as Upstash Redis<br/>(5-min TTL cache)
participant PG as PostgreSQL
participant Sync as clerk_role_sync
participant FS as Flagsmith<br/>(runtime flags)
participant Telegram as Telegram<br/>(alerts)
User->>FE: action requiring auth
FE->>API: GET /api/v1/cases<br/>Authorization: Bearer <Clerk JWT>
API->>MW: dispatch(request)
MW->>MW: skip if path is /health, /docs, /api/v1/public/*
MW->>MW: _extract_user_id(JWT)<br/>(JWT verified via Clerk public key)
MW->>MW: _extract_org_context(JWT)<br/>(org_id, org_role)
alt portal user (org_id present)
MW->>Redis: GET org_tenant:{org_id}
alt cache hit
Redis-->>MW: { tenant_id, portal_type }
else cache miss
MW->>PG: SELECT tenant_id, org_role<br/>FROM tenant_org_mappings<br/>WHERE clerk_org_id = ?
PG-->>MW: row
MW->>Redis: SET (TTL 300s)
end
opt header was also sent
MW->>MW: assert X-Tenant-ID == org-resolved<br/>(spoofing guard)
end
opt JIT sync (clerk_role_auto_assign_enabled = on)
MW->>FS: is_feature_enabled("clerk_role_auto_assign_enabled")
FS-->>MW: true
MW->>Sync: reconcile(user, tenant, jwt_role, org_id, portal_type)
Sync->>Sync: lookup_key = portal_type + ":" + jwt_role<br/>(strip "org:" prefix)
alt key in clerk_role_mapping.yaml
Sync->>PG: SELECT existing user_roles<br/>where granted_by='system:clerk_jwt'
alt drift (existing role != desired)
Sync->>PG: UPDATE old row → is_active=false
Sync->>PG: INSERT new auto row
else no row
Sync->>PG: INSERT auto row
else manual row exists
Note over Sync,PG: manual rows preserved (sticky)
else null mapping (silent skip)
Note over Sync: e.g. patient personal org
end
else key not in mapping
Sync--xTelegram: WARNING alert<br/>("Unknown Clerk org context")
end
end
else patient (no org_id)
MW->>MW: tenant_id = X-Tenant-ID header
end
MW->>PG: SELECT roles + permissions<br/>FROM user_roles JOIN roles<br/>WHERE user_id = ? AND tenant_id = ?
PG-->>MW: roles + permissions
MW->>MW: set request.state.{user_id, tenant_id, roles, permissions}
MW->>API: call_next(request)
Note over API: route handler runs<br/>@require_permission decorator<br/>checks request.state.permissions
API-->>FE: response
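The JIT-sync branch of the diagram can be read as a small decision procedure. The sketch below plans the actions rather than executing SQL, so it runs without a database. It assumes that manual rows are simply never touched and that a `None` mapping value means "skip silently". The names `build_lookup_key` and `plan_reconcile`, the action tuples, and the role names in the usage example are hypothetical and are not taken from clerk_role_sync.

```python
from typing import Dict, List, Optional, Tuple

AUTO_GRANTER = "system:clerk_jwt"  # granted_by value used for auto rows


def build_lookup_key(portal_type: str, jwt_role: str) -> str:
    """Compose portal_type + ":" + jwt_role, stripping the "org:" prefix."""
    role = jwt_role[len("org:"):] if jwt_role.startswith("org:") else jwt_role
    return f"{portal_type}:{role}"


def plan_reconcile(
    portal_type: str,
    jwt_role: str,
    mapping: Dict[str, Optional[str]],
    active_rows: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the actions a reconcile pass would take (illustrative only).

    active_rows holds (role_name, granted_by) pairs for the user's active
    user_roles rows in the tenant.
    """
    key = build_lookup_key(portal_type, jwt_role)
    if key not in mapping:
        # Unknown Clerk org context: alert instead of guessing a role.
        return [("alert", key)]

    desired = mapping[key]
    if desired is None:
        # Null mapping (e.g. a patient personal org): silent skip.
        return []

    # Only auto rows are candidates for change; manual rows are sticky.
    auto_roles = [role for role, granted_by in active_rows
                  if granted_by == AUTO_GRANTER]
    if desired in auto_roles:
        return []  # already in sync

    # Drift (or no auto row yet): deactivate stale auto rows, then insert.
    actions = [("deactivate", role) for role in auto_roles]
    actions.append(("insert", desired))
    return actions
```

For example, with a hypothetical mapping `{"provider:admin": "provider_admin"}`, a user whose only auto row is `provider_staff` gets a deactivate followed by an insert, while any manually granted row is left out of the plan.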
Migration callouts
| Concern | Today | GCP target | Notes |
|---|---|---|---|
| JWT verify | Clerk SDK (network call to fetch JWKS, then local verify) | Unchanged | Verify Cloud Run cold-start time on first JWKS fetch — could add 50-150ms latency tail to first request. Pre-warm or extend keep-alive. |
| Tenant cache | Upstash Redis | Memorystore | 5-min TTL keeps the DB lookup off the hot path. Memorystore latency from Cloud Run is ~1ms intra-region. |
| Tenant DB lookup | Cloud SQL Postgres tenant_org_mappings | Unchanged | Index on clerk_org_id already present. |
| In-memory fallback dict | Hardcoded org IDs in middleware | Remove post-migration | The fallback is only reached when both Redis AND Cloud SQL are down; on GCP that's a region-wide outage where falling through is unsafe. Make this a hard-fail. |
| JIT sync trigger | Flagsmith flag check | Unchanged | Flagsmith Cloud SaaS — FLAGSMITH_SERVER_KEY env var. |
| YAML mapping load | config/clerk_role_mapping.yaml read at module import | Unchanged | Bundled with deploy artifact. |
| Telegram alert (unknown role) | HTTP POST to Telegram bot API | Unchanged | Replace with Cloud Logging + Pub/Sub-driven Telegram dispatcher if Cloud Run egress to Telegram is restricted by org policy. |
| Permission lookup | Cloud SQL JOIN on user_roles × roles | Unchanged | Two tables, both small (< 10K rows). Cache-friendly. |
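
The tenant cache and hard-fail rows can be sketched as a cache-aside lookup that raises when the database is unreachable, instead of consulting an in-memory fallback dict. This is an illustration under stated assumptions, not the middleware's code. The `cache` argument is any object with `get(key)` and `set(key, value, ex=seconds)` (the shape redis-py clients expose), `db_lookup` is a hypothetical callable wrapping the tenant_org_mappings query, and `OSError` stands in for the database driver's connectivity errors.

```python
import json
from typing import Callable, Dict, Optional

TENANT_CACHE_TTL_SECONDS = 300  # the 5-minute TTL from the table


class TenantLookupUnavailable(Exception):
    """Hypothetical hard-fail error: no fallback dict is consulted."""


class InMemoryCache:
    """Dict-backed stand-in for a Redis client; it ignores the TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str, ex: Optional[int] = None) -> None:
        self._data[key] = value


def get_tenant_mapping(
    org_id: str,
    cache,
    db_lookup: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Cache-aside lookup of {tenant_id, portal_type} for a Clerk org."""
    key = f"org_tenant:{org_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no DB round-trip

    try:
        row = db_lookup(org_id)
    except OSError as exc:  # stand-in for driver connectivity errors
        # Hard-fail: surface the outage rather than guessing a tenant.
        raise TenantLookupUnavailable(org_id) from exc

    if row is not None:
        cache.set(key, json.dumps(row), ex=TENANT_CACHE_TTL_SECONDS)
    return row
```

Calling the same function for known org IDs during startup is one way to pre-warm the cache before traffic arrives, since each call populates the `org_tenant:{org_id}` key on a miss.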
Critical migration risks
- JWKS warm-cache on cold start. Clerk JWT verification fetches the JWKS once per process, so on a Cloud Run cold start the first request waits for the JWKS fetch (~50-150ms over the public internet). Mitigation: set Cloud Run min-instances ≥ 1 on the API service; the cost (~$5-10/mo for a small instance) is worth it for p99 latency.
- Redis cache stampede on deploy. When all Cloud Run instances start fresh, every authenticated request from a portal user triggers a Redis miss → DB query in parallel. With the current 5-min TTL, a deploy storm means every org gets queried at least once. Mitigation: pre-warm the cache during health-check init.
- Unbounded JIT failure mode. If clerk_role_sync.reconcile() raises an unhandled exception, the middleware swallows it (broad except) and returns the user with empty permissions. The user then sees 403 everywhere, and no alert fires for this case. Migration prep: replace the broad except with specific handlers for LookupError, OperationalError, and IntegrityError; route everything else to a CRITICAL alert.
- Tenant ID spoofing via header. When both a Clerk org and an X-Tenant-ID header are present, the middleware logs a warning but uses the org-derived value. Verify on Cloud Run that headers can't be injected by an upstream proxy in a way that bypasses this guard (e.g. Cloud CDN, Cloud Armor). Document the trust boundary.
- tenant_org_mappings is the new SPOF. Pre-migration, the in-memory fallback dict masked DB outages. Post-migration (once the dict is removed, as recommended above), Cloud SQL availability becomes the hard floor for the auth path. This is an argument for running Cloud SQL with a regional replica and automatic failover.
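The narrower exception handling proposed for the JIT failure mode might look like the sketch below. It is a sketch under assumptions: the stdlib `sqlite3` exception classes stand in for whatever `OperationalError` and `IntegrityError` the real database driver raises, and `run_jit_sync`, `alert_critical`, and the outcome strings are hypothetical names.

```python
import logging
# Stand-ins for the real driver's DB-API exception classes (assumption).
from sqlite3 import IntegrityError, OperationalError
from typing import Callable

logger = logging.getLogger("rbac_sketch")


def run_jit_sync(
    reconcile: Callable[[], None],
    alert_critical: Callable[[str], None],
) -> str:
    """Run one reconcile pass and report the outcome (illustrative only)."""
    try:
        reconcile()
        return "synced"
    except LookupError as exc:
        # Missing key or index, e.g. a role name absent from a lookup table.
        logger.warning("JIT sync skipped, lookup failed: %r", exc)
        return "skipped_unknown_mapping"
    except OperationalError as exc:
        # Database unreachable or query failed at the connection level.
        logger.error("JIT sync failed, database unavailable: %r", exc)
        return "db_unavailable"
    except IntegrityError as exc:
        # Constraint violation, e.g. two requests inserting the same row.
        logger.warning("JIT sync conflict: %r", exc)
        return "conflict"
    except Exception as exc:
        # Anything unexpected is routed to a CRITICAL alert, not swallowed.
        alert_critical(f"Unhandled JIT sync failure: {exc!r}")
        return "unexpected_error"
```

Returning an outcome instead of silently continuing lets the caller decide whether to serve the request with previously stored roles or to fail it. Whether the final branch should also re-raise is a separate design decision that this sketch leaves open.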
Code references
- Middleware: app/middleware/rbac_middleware.py
- JIT sync: app/services/clerk_role_sync.py
- Compound role mapping: config/clerk_role_mapping.yaml
- ADRs: ADR-0018 (multi-tenancy platform architecture), ADR-0021 (Clerk org model)