Flagsmith Flag Rollback Runbook

Owner: Platform Ops
Last updated: 2026-05-22
Related ADRs: ADR-0018 §F (multi-tenancy + feature flag governance)
Related auxiliary memory: reference_flagsmith_v2_env_patch.md, reference_flagsmith_v2_value_patch.md, feedback_flagsmith_dual_env.md

Overview

Curaway uses Flagsmith for runtime feature flagging with V2 environment versioning enabled. Every flag exists in two environments — Production and Development — and rollback always flips both, never one. This runbook covers:

The general procedure for flipping any V2-versioned flag back to OFF
Per-flag rollback semantics for flags with known asymmetric side effects, starting with auto_invoke_matcher_on_intake_complete

If a flag is not listed in the per-flag section below, follow the general procedure and assume symmetric semantics (flipping OFF reverts behavior cleanly for all subsequent traffic).

Curaway Flagsmith environment reference

Environment	env_id (int)	env api_key
Production	`85219`	`X4CdBvak98wpn6Ljq7eUSs`
Development	`85220`	`SCE375zGzViFpGZhWoiK7D`

Project ID: 36214
Admin token: stored as the Railway env var FLAGSMITH_ADMIN_TOKEN (Production project). For local use, pull it from Railway with railway variables --service backend | grep FLAGSMITH.
API base: https://api.flagsmith.com

Both envs are V2-versioned (EnvironmentFeatureVersion rows present on every featurestate). The legacy unscoped PATCH endpoint /api/v1/features/featurestates/{id}/ returns HTTP 400 with the message "This environment uses v2 feature versioning. Use the environment feature version endpoint instead." — do not use it.
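A script that still calls the legacy endpoint can at least fail with a clear message. The sketch below assumes the quoted message appears somewhere in the text of the 400 response; the function name is hypothetical, and the exact response shape (plain string, JSON list, or a detail field) is not confirmed here.

```python
# Substring of the rejection message quoted in this runbook, lowercased.
V2_MARKER = "uses v2 feature versioning"


def is_v2_versioning_error(status_code: int, body_text: str) -> bool:
    """Return True when a response looks like the V2-versioning rejection."""
    return status_code == 400 and V2_MARKER in body_text.lower()
```

A caller could check this after any legacy PATCH and point the operator at the versions endpoint instead of surfacing a bare HTTP 400.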

General procedure — flip a V2-versioned flag back to OFF

Step 1 — Confirm with SD before flipping

Flag flips are shared-state changes. Always confirm with SD before running the POST calls, especially on Production. The dual-env-flip is the default shape of the operation — not a license to skip the confirmation.

Step 2 — Look up the featurestate ID in each environment

The featurestate ID differs between Prod and Dev. Look up by feature name in each env first:

# Production
curl -s -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
  "https://api.flagsmith.com/api/v1/environments/X4CdBvak98wpn6Ljq7eUSs/featurestates/?feature_name=<FLAG_NAME>" \
  | jq '.results[0] | {id, enabled, environment_feature_version}'

# Development
curl -s -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
  "https://api.flagsmith.com/api/v1/environments/SCE375zGzViFpGZhWoiK7D/featurestates/?feature_name=<FLAG_NAME>" \
  | jq '.results[0] | {id, enabled, environment_feature_version}'

The environment_feature_version UUID confirms V2 is enabled. Record both id values — call them <PROD_FS_ID> and <DEV_FS_ID>.
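The same lookup can be scripted. This is a minimal sketch, assuming the response has the shape the jq filter above relies on (a results list whose first entry carries id, enabled and environment_feature_version); both function names are hypothetical and no HTTP call is made here.

```python
from urllib.parse import quote

API_BASE = "https://api.flagsmith.com/api/v1"


def featurestate_lookup_url(env_api_key: str, flag_name: str) -> str:
    """Build the Step 2 lookup URL for one environment."""
    name = quote(flag_name, safe="")
    return f"{API_BASE}/environments/{env_api_key}/featurestates/?feature_name={name}"


def extract_featurestate(payload: dict) -> dict:
    """Pull the fields Step 2 records from a parsed lookup response."""
    results = payload.get("results") or []
    if not results:
        raise LookupError("lookup returned no featurestate for this flag")
    first = results[0]  # same choice as the jq filter above
    if not first.get("environment_feature_version"):
        raise ValueError("no environment_feature_version: env may not be V2-versioned")
    return {
        "id": first["id"],
        "enabled": first["enabled"],
        "environment_feature_version": first["environment_feature_version"],
    }
```

Raising on an empty result or a missing version UUID stops the procedure before Step 4 rather than posting with a wrong or absent featurestate ID.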

Step 3 — Look up the feature ID

Feature ID is shared across envs (it's per-project). Look it up once:

curl -s -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
  "https://api.flagsmith.com/api/v1/projects/36214/features/?search=<FLAG_NAME>" \
  | jq '.results[0] | {id, name}'

Record as <FEATURE_ID>.

Step 4 — POST a new environment-feature-version with `enabled: false`

For boolean flips on V2 envs, the body must include feature_state_value even when null. Use {"type": "unicode", "string_value": null} to signal "no value" for boolean flags.

Production:

curl -s -X POST -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.flagsmith.com/api/v1/environments/85219/features/<FEATURE_ID>/versions/" \
  -d '{
    "feature_states_to_update": [{
      "id": <PROD_FS_ID>,
      "enabled": false,
      "feature_state_value": {"type": "unicode", "string_value": null}
    }],
    "feature_states_to_create": [],
    "segment_ids_to_delete_overrides": [],
    "publish_immediately": true
  }'

Development:

curl -s -X POST -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.flagsmith.com/api/v1/environments/85220/features/<FEATURE_ID>/versions/" \
  -d '{
    "feature_states_to_update": [{
      "id": <DEV_FS_ID>,
      "enabled": false,
      "feature_state_value": {"type": "unicode", "string_value": null}
    }],
    "feature_states_to_create": [],
    "segment_ids_to_delete_overrides": [],
    "publish_immediately": true
  }'

Expected response: HTTP 201 with the new EnvironmentFeatureVersion UUID in the response body. publish_immediately: true activates the new version atomically — no second "publish" step needed.
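Building the request body in code avoids hand-editing JSON during an incident. The sketch below mirrors the curl body above field for field; the function names are hypothetical, and the IDs in the usage line are placeholders, not real Curaway IDs.

```python
import json


def build_disable_payload(featurestate_id: int) -> dict:
    """Body for the versions POST that flips one featurestate to OFF."""
    return {
        "feature_states_to_update": [{
            "id": featurestate_id,
            "enabled": False,
            # Must be present even for boolean flags; null means "no value".
            "feature_state_value": {"type": "unicode", "string_value": None},
        }],
        "feature_states_to_create": [],
        "segment_ids_to_delete_overrides": [],
        "publish_immediately": True,
    }


def versions_url(env_id: int, feature_id: int) -> str:
    """Versions endpoint; takes the integer env_id, not the env api_key."""
    return (
        "https://api.flagsmith.com/api/v1"
        f"/environments/{env_id}/features/{feature_id}/versions/"
    )


def encode_body(payload: dict) -> bytes:
    """Serialize the payload; Python None becomes JSON null."""
    return json.dumps(payload).encode("utf-8")
```

For example, encode_body(build_disable_payload(111)) posted to versions_url(85219, 42) would correspond to the Production call, with 111 and 42 standing in for <PROD_FS_ID> and <FEATURE_ID>.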

Step 5 — Verify the flip landed

Re-run the GET from Step 2 against both envs and confirm enabled: false:

for KEY in X4CdBvak98wpn6Ljq7eUSs SCE375zGzViFpGZhWoiK7D; do
  curl -s -H "Authorization: Token $FLAGSMITH_ADMIN_TOKEN" \
    "https://api.flagsmith.com/api/v1/environments/$KEY/featurestates/?feature_name=<FLAG_NAME>" \
    | jq --arg env "$KEY" '.results[0] | {env: $env, enabled, environment_feature_version}'
done

Both rows should show enabled: false and a new environment_feature_version UUID.
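The two checks (enabled is false, version UUID changed) can be expressed as one comparison of the Step 2 output against the Step 5 output. This is a sketch under the assumption that both lookups were saved as dicts keyed by environment name; the function name and the sample UUID strings are hypothetical.

```python
def rollback_problems(before: dict, after: dict) -> list:
    """Compare Step 2 and Step 5 lookups; an empty list means the flip landed.

    Both arguments map an environment name to a dict with the keys
    "enabled" and "environment_feature_version".
    """
    problems = []
    for env, old in before.items():
        new = after.get(env)
        if new is None:
            problems.append(f"{env}: no featurestate in the re-run lookup")
            continue
        if new["enabled"] is not False:
            problems.append(f"{env}: flag is still enabled")
        if new["environment_feature_version"] == old["environment_feature_version"]:
            problems.append(f"{env}: version UUID did not change")
    return problems
```

Iterating over the "before" environments means a flip that only reached one environment is reported, which matches the rule that rollback always covers both.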

Step 6 — Verify backend picked up the change

Flagsmith SDKs cache for ~60 seconds by default. After ~1 minute, hit an endpoint that reads the flag and confirm the OFF path runs. For flags that show up in logs (e.g., flag-gated branches with a log line), tail Railway logs:

railway logs --service backend -n 200 | grep '<FLAG_NAME>'

Step 7 — Document the rollback

Comment on the originating PR with the rollback timestamp + reason
Post in #ops Slack with the new version UUIDs
If the flag has known asymmetric semantics (see per-flag section below), capture any data-cleanup work that remains

CONFIG flag rollback (value flip, not boolean)

For CONFIG flags where you need to change the value (not just enabled), the body shape is the same but feature_state_value.string_value carries the JSON-encoded payload:

{
  "feature_states_to_update": [{
    "id": <FS_ID>,
    "enabled": true,
    "feature_state_value": {
      "type": "unicode",
      "string_value": "[\"decided_on_destination\"]"
    }
  }],
  "feature_states_to_create": [],
  "segment_ids_to_delete_overrides": [],
  "publish_immediately": true
}

Gotcha: the GET returns feature_state_value as a flat string, but the POST body requires the dict shape. The string_value should be the JSON-encoded payload (escaped quotes for list flags). See reference_flagsmith_v2_value_patch.md for the full pattern.
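The flat-string versus dict mismatch is easy to get wrong by hand. The helper below is a sketch of the conversion, assuming the value is either None, a flat string as returned by the GET, or a Python list or dict that should be JSON-encoded; the function name is hypothetical.

```python
import json


def to_post_value(value) -> dict:
    """Wrap a config value in the dict shape the versions POST expects."""
    if value is None:
        string_value = None
    elif isinstance(value, str):
        # Already a flat string (for example, copied from the GET response).
        string_value = value
    else:
        # Lists and dicts are JSON-encoded into a single string.
        string_value = json.dumps(value)
    return {"type": "unicode", "string_value": string_value}
```

Serializing the whole request body with json.dumps afterwards produces the escaped quotes shown in the example body, so no manual escaping is needed.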

Per-flag rollback semantics

`auto_invoke_matcher_on_intake_complete`

Flipped ON: Production + Development, 2026-05-22 (the default in config/feature_flags.yaml remains false; a per-identity runtime override was applied for SD dogfooding before the tenant-wide flip on 2026-05-21).
Default: false (in config/feature_flags.yaml).
Code path: app/agents/auto_invoke_matcher.py (maybe_auto_invoke_matcher); called from app/agents/orchestrator_phases/intake_triage.py:~403.
Predicates checked when ON: intake gates pass + decision_stage ∈ {comparing_options, ready_to_commit} + records_requested + workflow_state.matching_complete is False (idempotency).
What it does when ON: Inside the intake_triage phase, when the gates pass, the orchestrator auto-invokes the matcher and emits a match_results rich-content card to the patient instead of falling through to the Triage Agent's stock "matches in 24-48 hours" deferral.

Asymmetric semantics — read before rolling back

Rollback (flag → false) is safe-but-asymmetric:

Safe for the patient: Cases that already fired the matcher card keep their card. The patient sees no regression — the match_results card persists in the conversation transcript and the patient's view of the case is identical before and after the rollback.
Asymmetric on case state: Those cases carry workflow_state.matching_complete = True permanently (set in orchestrator_phases/matching.py:201,278). They never re-enter the auto-invoke path because of the idempotency guard at auto_invoke_matcher.py:89 (if ws.get("matching_complete"): return).
Inconsistency that remains: Cases that fired the matcher card via the flag have matching_complete=True but may carry ehr_constructed=False because, in some branches, the flag-on path does not synthesize the EHR before invoking the matcher. The inconsistency is invisible to the patient but visible in admin views and to any downstream service that filters on ehr_constructed.
Safe-default direction: Default is OFF, so rolling back is the safe direction. Rollback changes behavior only for new cases, which stop taking the flag-on path. It does not revert cases that already went through the flag-on path: there is no migration to undo their matching_complete=True.

What rollback does NOT do

Does not delete already-rendered match_results cards from patient transcripts.
Does not unset workflow_state.matching_complete=True on already-routed cases.
Does not roll back any match_results row in Postgres (those are durable matching outputs, not flag-derived state).
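The asymmetry can be seen in a toy model of the guard. This is a deliberately simplified sketch, not the code in auto_invoke_matcher.py: intake_turn is a hypothetical stand-in, the intake-gate predicates are assumed to pass, and the flag write that the real matching phase performs is folded into the same function.

```python
def intake_turn(flag_on: bool, workflow_state: dict) -> bool:
    """Return True when the auto-invoke path fires on this turn (toy model)."""
    if not flag_on:
        return False
    if workflow_state.get("matching_complete"):
        return False  # idempotency guard: never fire twice for one case
    # In the real code the matching phase sets this; modeled inline here.
    workflow_state["matching_complete"] = True
    return True


# A case that fired while the flag was ON keeps its state after rollback.
old_case = {}
intake_turn(True, old_case)     # fires, sets matching_complete
intake_turn(False, old_case)    # rollback: does not fire, does not unset
print(old_case)                 # {'matching_complete': True}

# A case created after rollback never enters the path.
new_case = {}
intake_turn(False, new_case)
print(new_case)                 # {}
```

In this model, flipping the flag back ON later still does not re-fire for old_case, because the guard sees matching_complete already set.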

Cleanup steps if a data-fix is required after rollback

If the architecture review's flagged concern (the matching_complete=True + ehr_constructed=False cohort) becomes load-bearing — e.g., a downstream filter starts producing wrong counts — the cleanup is:

Query the affected cohort:

SELECT id, tenant_id, patient_id
FROM cases
WHERE (workflow_state->>'matching_complete')::bool IS TRUE
  AND (workflow_state->>'ehr_constructed')::bool IS NOT TRUE
  AND created_at >= '<flag_on_timestamp>';

Either backfill the EHR via the retro-rebuild path (scripts/retro_ehr_rebuild.py per PR #1075/#1080) or reset matching_complete on the cohort and let the next intake turn re-invoke the matcher (only safe if the flag is being flipped back ON).
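When spot-checking individual cases outside the database, the same cohort condition can be applied to a parsed workflow_state. The sketch below assumes workflow_state is JSON with real booleans and mirrors only the two workflow_state predicates of the query, not the created_at filter; the function name is hypothetical.

```python
def in_cleanup_cohort(workflow_state: dict) -> bool:
    """Mirror of the SQL filter: matching done, EHR not confirmed built.

    A missing ehr_constructed key counts as "not true", matching the
    IS NOT TRUE semantics of the query (NULL is included).
    """
    return (
        workflow_state.get("matching_complete") is True
        and workflow_state.get("ehr_constructed") is not True
    )
```

Cases where ehr_constructed is absent fall into the cohort just like cases where it is explicitly false, which is the reason the query uses IS NOT TRUE rather than = false.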

When to flip this flag back ON

Re-enable only after:

The asymmetric semantics above are addressed via either a synchronous EHR-before-matcher write or a documented acceptance that the inconsistency does not affect any downstream consumer
A regression test exercises both the flag-ON and flag-OFF paths (per the PR #1088 / #1092 / #1094 Maria-replay pattern)
Architecture review re-signs-off

Reference scripts

scripts/flip_mso_flags.py — legacy boolean-only flipper; uses the env-scoped PATCH endpoint. Does NOT work on V2 envs (returns HTTP 400). Extend to use the version endpoint before next boolean flip.
scripts/sync_flagsmith.py — YAML ↔ Flagsmith sync; run with --dry-run first to verify drift.
scripts/create_v6_flags.py — pattern for creating new flags via the V2 endpoint.

runbook/triage-tuning.md — for triage threshold flags (uses per-identity overrides, not env-level)
runbook/deployment.md — for the full deploy sequence including post-deploy flag verification

Open items

flip_mso_flags.py needs an update to use the V2 environment-feature-versions endpoint. Tracked in the work queue.
The dual-env rollback flow should be wrapped in a single script (scripts/rollback_flag.py FLAG_NAME) to remove the per-step copy-paste risk during incident response.
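One possible starting point for that script is sketched below. It is not the contents of any existing file: plan_rollback and send are hypothetical names, the env IDs come from the environment reference table, and the featurestate and feature IDs must still be looked up per Steps 2 and 3. The plan step refuses to proceed unless both environments are covered; send is shown for completeness and should only run after SD confirmation.

```python
import json
import urllib.request

API_BASE = "https://api.flagsmith.com/api/v1"
# Integer env IDs from the environment reference table in this runbook.
ENV_IDS = {"Production": 85219, "Development": 85220}


def plan_rollback(feature_id: int, featurestate_ids: dict) -> list:
    """Build one versions POST per environment; refuse a single-env plan."""
    missing = sorted(set(ENV_IDS) - set(featurestate_ids))
    if missing:
        raise ValueError(f"missing featurestate id for: {missing}")
    plan = []
    for env_name, env_id in ENV_IDS.items():
        body = {
            "feature_states_to_update": [{
                "id": featurestate_ids[env_name],
                "enabled": False,
                "feature_state_value": {"type": "unicode", "string_value": None},
            }],
            "feature_states_to_create": [],
            "segment_ids_to_delete_overrides": [],
            "publish_immediately": True,
        }
        url = f"{API_BASE}/environments/{env_id}/features/{feature_id}/versions/"
        plan.append({"env": env_name, "url": url, "body": body})
    return plan


def send(step: dict, admin_token: str) -> dict:
    """POST one planned step; urlopen raises HTTPError on 4xx/5xx responses."""
    request = urllib.request.Request(
        step["url"],
        data=json.dumps(step["body"]).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {admin_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Printing the plan before calling send gives a dry-run view of both requests, which keeps the confirmation step from Step 1 in the loop.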