MSO Video: Daily.co Operations Runbook
Phase 2a.2 (ADR-0018) ships the Daily.co integration for MSO
teleconsultation. This runbook is the operational reference for
app/services/video_room_service.py + app/services/video_room_pii_filter.py
+ the QStash cron handlers in app/jobs/.
Related: ADR-0018 §G, ADR-0025, docs/specs/mso-teleconsultation-feature.md.
1. Daily.co API key rotation
Keys live in Railway as DAILY_API_KEY. To rotate:
- Log in to the Daily.co dashboard → Developers → API keys.
- Generate a new key. Do NOT delete the old key yet.
- Update DAILY_API_KEY in Railway (railway variables set ...) on the backend service. Railway redeploys automatically.
- Smoke check after deploy: hit /api/v1/admin/health/external-deps and confirm daily_co.last_check_ok=true. Or run the curl command in §4.
- Delete the old key from the Daily.co dashboard.
If rotation breaks production: roll the env var back; Railway redeploys in ~60s.
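The post-rotation smoke check can be scripted once the health response is in hand. The sketch below is only an illustration: the payload shape ({"daily_co": {"last_check_ok": ...}}) is inferred from the field name above, and daily_check_ok is a hypothetical helper, not part of the codebase.

```python
from collections.abc import Mapping
from typing import Any


def daily_check_ok(health: Mapping[str, Any]) -> bool:
    """Return True only if daily_co.last_check_ok is exactly True.

    Assumes the flag is nested as {"daily_co": {"last_check_ok": true}};
    adjust to the real response shape of the health endpoint.
    """
    section = health.get("daily_co")
    if not isinstance(section, Mapping):
        return False
    return section.get("last_check_ok") is True


# Hypothetical payloads, for illustration only.
healthy = {"daily_co": {"last_check_ok": True}}
stale_key = {"daily_co": {"last_check_ok": False, "error": "401"}}
```

A missing daily_co section is treated as a failure, so a partially deployed backend does not pass the check by accident.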
2. Verifying recording is disabled
ADR-0025 commits to "no recording, ever". Three layers enforce this:
- enable_recording=false is set on every POST /rooms call.
- The round-trip config.enable_recording field is asserted in VideoRoomService.create_room. A non-false value triggers an immediate destroy_room and raises VideoRoomProvisionFailedError.
- teleconsultation_sessions.recording_url has a CHECK (... IS NULL) constraint at the DB level (Phase 2a.1).
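The second layer (round-trip assertion, then destroy and raise) can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in VideoRoomService: FakeDailyClient and create_room_checked are hypothetical names, the response shape is assumed, and a missing enable_recording field is assumed to count as a violation.

```python
class VideoRoomProvisionFailedError(Exception):
    """Stand-in for the service's provisioning error."""


class FakeDailyClient:
    """Hypothetical in-memory client; not the Daily.co SDK."""

    def __init__(self, echo_recording):
        # Value the provider echoes back in the room config.
        self.echo_recording = echo_recording
        self.destroyed = []

    def create_room(self, name, enable_recording):
        return {"name": name, "config": {"enable_recording": self.echo_recording}}

    def destroy_room(self, name):
        self.destroyed.append(name)


def create_room_checked(client, name):
    """Create a room, then verify the echoed config before returning it."""
    room = client.create_room(name, enable_recording=False)
    echoed = room.get("config", {}).get("enable_recording")
    if echoed is not False:
        # Anything other than an explicit False is a violation here,
        # including a missing field (assumption).
        client.destroy_room(name)
        raise VideoRoomProvisionFailedError(
            f"room {name}: enable_recording={echoed!r}"
        )
    return room
```

Checking for an explicit False rather than a falsy value means an unexpected response cannot slip through as "probably off".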
Manual verification:
- Daily.co dashboard → Rooms → select the room → Configuration → confirm "Recording" is disabled.
- Or call assert_recording_disabled(daily_room_id) from a tenant-scoped admin shell.
- Phase 2a.8 ships a CI smoke test that asserts this on every main-branch build.
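The database layer can be exercised in the same spirit. The sketch below uses an in-memory SQLite table as a stand-in for the production database, so the schema is simplified and only the recording_url CHECK mirrors the constraint described above; the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; only recording_url and its CHECK follow the runbook.
conn.execute(
    """
    CREATE TABLE teleconsultation_sessions (
        id TEXT PRIMARY KEY,
        recording_url TEXT CHECK (recording_url IS NULL)
    )
    """
)
conn.execute("INSERT INTO teleconsultation_sessions (id) VALUES ('s1')")


def try_set_recording_url(session_id, url):
    """Return True if the write was accepted, False if the CHECK rejected it."""
    try:
        conn.execute(
            "UPDATE teleconsultation_sessions SET recording_url = ? WHERE id = ?",
            (url, session_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def rows_with_recording():
    """Count rows that violate the no-recording rule (should always be 0)."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM teleconsultation_sessions"
        " WHERE recording_url IS NOT NULL"
    )
    return cur.fetchone()[0]
```

Against the real database, the equivalent spot check is a count of rows where recording_url IS NOT NULL; any non-zero result means the constraint is missing or was bypassed.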
If a room is ever found with recording enabled:
- Destroy it immediately (destroy_room).
- File a P0 incident: this is an ADR-0025 violation.
- Audit Daily.co console for any other rooms; rotate the API key.
3. HIPAA tier upgrade path
The MVP runs on Daily.co's non-HIPAA tier. The runtime check warns in production once a tenant has patient rows (Gate 3). Upgrade flow:
- Phase 2a.8 smoke is green AND SD has completed one internal test session end-to-end.
- Subscribe to Daily.co's HIPAA tier (BAA execution required).
- Flip Flagsmith flag daily_hipaa_tier_enabled → true.
- Flip Flagsmith flag mso_post_launch → true immediately after.
- Confirm VideoRoomService.create_room logs no warnings on the next booking (the WARN-only path is now bypassed).
If mso_post_launch=true is flipped before daily_hipaa_tier_enabled=true,
the next booking raises VideoHipaaTierRequiredError (503
VIDEO_HIPAA_TIER_REQUIRED_001) and a CRITICAL Telegram alert fires.
Resolution: flip daily_hipaa_tier_enabled=true or revert to
mso_post_launch=false.
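The interaction between the two flags is easier to reason about as a small decision table. The sketch below is an interpretation of the behaviour described in this section, not the service's code: hipaa_gate and GateResult are hypothetical names, and it assumes the post-launch block applies whether or not the tenant has patient rows.

```python
from enum import Enum


class GateResult(Enum):
    OK = "ok"        # provision silently
    WARN = "warn"    # provision, but log a warning
    BLOCK = "block"  # raise VideoHipaaTierRequiredError (503)


def hipaa_gate(
    hipaa_tier_enabled: bool,
    post_launch: bool,
    tenant_has_patients: bool,
) -> GateResult:
    """Map the two Flagsmith flags plus tenant state to a gate outcome."""
    if hipaa_tier_enabled:
        return GateResult.OK
    if post_launch:
        # Assumption: blocking does not depend on patient rows.
        return GateResult.BLOCK
    if tenant_has_patients:
        return GateResult.WARN
    return GateResult.OK
```

Read this way, the flag order in the upgrade flow matters because hipaa_tier_enabled is checked first: flipping it before mso_post_launch never passes through the BLOCK branch.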
4. Smoke test (post-deploy)
Run from a shell with DAILY_API_KEY exported (use a sandbox key in
non-prod). All commands assume mso_video_enabled=true is flipped for
the calling tenant in Flagsmith.
# 1. Create a test room (arbitrary placeholder consultation_id)
CID="00000000-0000-4000-8000-000000000001"
curl -X POST https://api.curaway.ai/api/v1/internal/_test/video/create \
-H "X-Tenant-ID: tenant-mso-panel" \
-H "Authorization: Bearer $INTERNAL_API_SECRET" \
-H "Content-Type: application/json" \
-d "{\"consultation_id\": \"$CID\", \"scheduled_for\": \"2026-12-31T10:00:00Z\"}"
# Expected: 200 with {daily_room_id: "consult-00000000", daily_meeting_url: ...}
# 2. Confirm recording is disabled
curl "https://api.curaway.ai/api/v1/internal/_test/video/assert-recording-disabled?room=consult-00000000" \
-H "Authorization: Bearer $INTERNAL_API_SECRET"
# Expected: 200 with {ok: true}
# 3. Destroy the room
curl -X DELETE https://api.curaway.ai/api/v1/internal/_test/video/consult-00000000 \
-H "Authorization: Bearer $INTERNAL_API_SECRET"
# Expected: 200 (or 404 if already destroyed — both are success)
The internal test endpoints above are gated by INTERNAL_API_SECRET
and ship in Phase 2a.8. Until then, run the smoke from a Python REPL
attached to the Railway shell.
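Once the internal test endpoints exist, the same three calls can be driven from Python. The sketch below only builds the requests and encodes the "200 or 404" rule for the destroy step; it sends nothing. The function names are hypothetical, and sending JSON with an explicit Content-Type header is an assumption about what the endpoint expects.

```python
import json
import urllib.request

BASE = "https://api.curaway.ai/api/v1/internal/_test/video"


def build_create(secret, tenant, consultation_id, scheduled_for):
    """Build (but do not send) the create-room request."""
    body = json.dumps(
        {"consultation_id": consultation_id, "scheduled_for": scheduled_for}
    ).encode()
    return urllib.request.Request(
        f"{BASE}/create",
        data=body,
        method="POST",
        headers={
            "X-Tenant-ID": tenant,
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",  # assumption: JSON body
        },
    )


def build_destroy(secret, room):
    """Build (but do not send) the destroy-room request."""
    return urllib.request.Request(
        f"{BASE}/{room}",
        method="DELETE",
        headers={"Authorization": f"Bearer {secret}"},
    )


def destroy_ok(status):
    """200 and 404 both count as success for the destroy step."""
    return status in (200, 404)
```

Passing a built request to urllib.request.urlopen sends it; note that urlopen raises urllib.error.HTTPError on a 404, so the destroy step needs to catch that and pass the error's code to destroy_ok.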
5. Telegram alert recipients
- daily_co_outage: fires on persistent Daily.co API failure. Routes to the standard alert chat (TELEGRAM_ALERT_CHAT_ID). Severity: WARNING.
- daily_hipaa_tier_required: fires when a post-launch tenant tries to provision without HIPAA tier. Severity: CRITICAL.
If alerts go silent: confirm TELEGRAM_BOT_TOKEN + TELEGRAM_ALERT_CHAT_ID
are set on Railway and that the Telegram bot is still in the chat.
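The first half of that check (are both variables set?) is quick to script from a Railway shell. A minimal sketch, with missing_telegram_vars as a hypothetical helper; it cannot tell whether the bot is still a member of the chat, which has to be confirmed in Telegram itself.

```python
import os
from collections.abc import Mapping

REQUIRED = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_ALERT_CHAT_ID")


def missing_telegram_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]
```

Calling missing_telegram_vars() with no arguments inspects the current process environment; an empty list means both variables are present.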
6. Circuit breaker
VideoRoomService opens a module-level breaker after 5 failures in 60s
(open for 30s). Manual reset (in a Python REPL):
This drops the singleton and resets the breaker. Useful for clearing state after a transient Daily.co outage.
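For intuition about the thresholds (5 failures in 60s, open for 30s), here is a generic sliding-window breaker with those defaults. It is an illustration only: it is not the code in VideoRoomService, and the class and method names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowBreaker:
    """Opens after max_failures within window_s; stays open for open_s."""

    def __init__(self, max_failures=5, window_s=60.0, open_s=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.open_s = open_s
        self.clock = clock  # injectable so the timing can be tested
        self._failures = deque()
        self._opened_at = None

    def record_failure(self):
        now = self.clock()
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._opened_at = now

    def is_open(self):
        if self._opened_at is None:
            return False
        if self.clock() - self._opened_at >= self.open_s:
            # Cool-down elapsed: close again and start counting afresh.
            self.reset()
            return False
        return True

    def reset(self):
        """Clear all recorded failures and close the breaker."""
        self._failures.clear()
        self._opened_at = None
```

Failures spread more than 60s apart never accumulate to five, so only a burst opens the breaker; after 30s it closes on its own, which is why a manual reset is mainly useful right after an outage has been confirmed resolved.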
7. PII boundary violations
app/services/video_room_pii_filter.py raises
VideoRoomPIIBoundaryViolation when a Daily.co payload would leak PII.
This is a hard fail. If you see one in logs:
- Check the violation list in the exception detail.
- Identify the calling site: likely a new field added to a Daily.co payload that contains a name/email/etc.
- Fix the call site to use the role labels (Doctor/Patient/Companion) and consult-XXXXXXXX room name format.
- Add a regression test in tests/test_video_room_pii_filter.py.
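To make the boundary concrete, a filter of this kind might look like the sketch below. It is not the contents of video_room_pii_filter.py: check_payload, its parameters, and the email heuristic are hypothetical, and reading XXXXXXXX as eight lowercase hex characters is an assumption based on the consult-00000000 example in §4.

```python
import re

ROOM_NAME_RE = re.compile(r"^consult-[0-9a-f]{8}$")  # assumed format
ALLOWED_LABELS = {"Doctor", "Patient", "Companion"}
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # rough heuristic only


class VideoRoomPIIBoundaryViolation(Exception):
    """Stand-in for the real exception; carries the list of violations."""

    def __init__(self, violations):
        super().__init__("; ".join(violations))
        self.violations = violations


def check_payload(room_name, user_name, extra=None):
    """Raise if any field bound for Daily.co looks like it carries PII."""
    violations = []
    # Messages name the field but never echo the value, so the
    # exception itself cannot leak PII into logs.
    if not ROOM_NAME_RE.match(room_name):
        violations.append("room_name is not in consult-XXXXXXXX form")
    if user_name not in ALLOWED_LABELS:
        violations.append("user_name is not a role label")
    for key, value in (extra or {}).items():
        if isinstance(value, str) and EMAIL_RE.search(value):
            violations.append(f"{key} looks like an email address")
    if violations:
        raise VideoRoomPIIBoundaryViolation(violations)
```

A regression test for a newly added payload field then amounts to asserting that the offending value raises and the role-label version does not.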
Pre-launch Rollout Checklist (Phase 2a.8 → MSO Video Live)
Run these steps in order. Do not skip or reorder.
- [ ] Verify mso_video_enabled is false in production Flagsmith
- [ ] Verify mso_post_launch is false in production Flagsmith
- [ ] Deploy this PR to production. Migration head should be d6e7f8a9b1c2 (no new migrations from 2a.8; wiring + tests only)
- [ ] Register QStash cron schedules (idempotent, safe to re-run):
    # From Railway shell or locally with production env vars:
    python -m app.register_schedules
    # Confirms: mso-room-provision (*/5) + mso-room-destroy (*/5) registered
  Or via admin API: POST /api/v1/internal/schedules (requires X-Internal-Secret)
- [ ] Flip mso_lifecycle_cron_enabled to true in dev environment first; verify cron fires within 5 min (Railway logs: mso-room-provision: enabled=True)
- [ ] Flip mso_lifecycle_cron_enabled to true in production
- [ ] Run main-smoke workflow manually (workflow_dispatch on main-smoke.yml); verify recording-disabled assertion passes on the run output
- [ ] Internal test session (SD as patient OR doctor):
  a. Schedule a session against a test MSO doctor
  b. Verify cron provisions the room T-15min before (watch Railway logs for mso.session.room_provisioned)
  c. Join from both sides; hold for 1+ minute; end the session
  d. Verify charge state in test Stripe / Razorpay dashboard
  e. Confirm recording_url IS NULL on the session row
- [ ] HIPAA tier upgrade (separate decision, ~$500/mo): When SD is ready, upgrade Daily.co plan to HIPAA-covered tier; flip daily_hipaa_tier_enabled to true in production. Until then, the runtime check is WARN-only (not blocking).
- [ ] Flip mso_post_launch to true (escalates the runtime HIPAA check from WARN to CRITICAL+block)
- [ ] Flip mso_video_enabled to true in production. MSO video is now live.
Rollback: set mso_video_enabled=false in Flagsmith; the change takes effect immediately.
Sessions in flight are unaffected (their room URLs still work until
the cron destroys them at 90min or ended_at+60s).