The Practical Developer

The Five-Stage Rollout: How To Ship A Risky Change Without Holding Your Breath

Most teams ship features as “merge to main and deploy.” The result is that a bug affects 100% of users immediately. Five-stage rollouts — internal, 1%, 10%, 50%, 100% — turn “oh no” into “catch it at 1%.” Here is the working pattern, the metrics that gate each stage, and the rollback procedure.

The team merges a refactor of the payment flow. CI passes. The deploy pipeline runs. Five minutes later, every user trying to check out hits a 500 error. Ten minutes after that, the team has rolled back, but $20k of orders are lost and customer support is dealing with refund requests. The bug only manifests under real production traffic; load testing didn’t catch it because the failure depends on a specific edge case.

The fix is not “test more carefully.” Some bugs only surface in production, with real users. The fix is to limit the blast radius of the inevitable bad deploy by rolling out gradually: 1% of users first, then 10%, then 50%, then 100%. A bug at 1% is a small fire. A bug at 100% is a postmortem.

This post covers the five-stage rollout pattern, the metrics that gate each stage, the tooling, and the rollback procedure, which has to actually work.

The five stages

Stage 0: Internal users only        (5-10 employees, typically 1 day)
Stage 1: 1% of production traffic   (1-2 days)
Stage 2: 10% of production traffic  (1-2 days)
Stage 3: 50% of production traffic  (1-2 days)
Stage 4: 100% of production traffic (the feature is launched)
Cleanup: Remove the flag and the old code path

The total time from first deploy to 100% is typically 4-7 days for a non-critical feature, longer for high-risk changes (payments, auth).

The mechanism is a feature flag. Code path A is the old behavior, code path B is the new. The flag percentage controls how many users get B.

if (await flags.isEnabled('new-checkout-flow', { user })) {
  return renderNewCheckout();
}
return renderOldCheckout();

Both code paths are in production simultaneously. The flag is what selects which one runs.

What gates each stage

Don’t promote stages on a timer alone. Gate them on observed metrics:

Stage 0 → 1: Internal usage shows no errors for 24 hours. The team has manually tested the happy path and a few edge cases.

Stage 1 → 2: At 1% for 24-48 hours:

  • Error rate for the new path is not higher than the old path.
  • p95 latency for the new path is comparable to the old path’s.
  • No customer support tickets attributed to the change.

Stage 2 → 3: Same checks at 10%. Plus: any rare failure modes that need volume to surface have had time to appear.

Stage 3 → 4: Same at 50%. By this point, you have high confidence the change is good.

Cleanup: After 100% holds for 1 week, remove the old code and the flag (see the feature-flags post).
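
The promotion rules above can be written down as data plus one decision function. The sketch below is one way to do that, not a prescribed implementation. The `Stage` shape, the `decide` function, and the exact soak hours (the lower bound of each range above, and one week at 100%) are assumptions made for illustration. Whether the gates pass is supplied by the caller.

```typescript
interface Stage {
  name: string;
  percentage: number;   // share of production traffic on the new path
  minSoakHours: number; // minimum time at this stage before moving on
}

// Hypothetical ladder mirroring the five stages. Internal users are assumed
// to be targeted by an allowlist, so their traffic percentage is 0.
const STAGES: Stage[] = [
  { name: "internal", percentage: 0, minSoakHours: 24 },
  { name: "1%", percentage: 1, minSoakHours: 24 },
  { name: "10%", percentage: 10, minSoakHours: 24 },
  { name: "50%", percentage: 50, minSoakHours: 24 },
  { name: "100%", percentage: 100, minSoakHours: 168 }, // one week before cleanup
];

type Decision = "hold" | "promote" | "cleanup";

function decide(stageIndex: number, hoursAtStage: number, gatesPass: boolean): Decision {
  const stage = STAGES[stageIndex];
  if (!stage) throw new RangeError(`unknown stage index ${stageIndex}`);
  // Failing gates or too little soak time both mean "do not move forward".
  if (!gatesPass || hoursAtStage < stage.minSoakHours) return "hold";
  return stageIndex === STAGES.length - 1 ? "cleanup" : "promote";
}
```

A failing gate returns "hold" here. Whether to roll back is deliberately left to a person looking at the metrics.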

The metrics worth gating on

A small set covers most use cases:

  • Error rate of the affected endpoints.
  • p95 / p99 latency of the affected endpoints.
  • Conversion rate for revenue-relevant flows (checkout, signup).
  • Background-job success rate if the change touches async processing.
  • Customer support ticket volume in relevant categories.

Compare each metric between users on the new variant and users on the old variant, not against an absolute threshold. A 10% error rate is bad regardless; the question a gate answers is “did this change make it worse?”
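
A per-variant gate check might look roughly like the sketch below. The `VariantStats` and `GateThresholds` shapes and every threshold value are hypothetical; a production gate would likely use a proper statistical test rather than fixed tolerances, so treat this as an illustration of comparing variants against each other.

```typescript
interface VariantStats {
  requests: number;
  errors: number;
  p95LatencyMs: number;
}

interface GateThresholds {
  maxErrorRateDelta: number; // absolute tolerance, e.g. 0.001 = 0.1 percentage points
  maxLatencyRatio: number;   // e.g. 1.1 = new p95 may be at most 10% slower
  minRequests: number;       // below this, the new variant has too little data
}

interface GateResult {
  pass: boolean;
  reasons: string[];
}

function errorRate(v: VariantStats): number {
  return v.requests > 0 ? v.errors / v.requests : 0;
}

// Compares the new variant against the old one, never against a global total.
function evaluateGate(oldV: VariantStats, newV: VariantStats, t: GateThresholds): GateResult {
  const reasons: string[] = [];
  if (newV.requests < t.minRequests) {
    reasons.push(`only ${newV.requests} requests on the new variant`);
  }
  if (errorRate(newV) - errorRate(oldV) > t.maxErrorRateDelta) {
    reasons.push("error rate is higher on the new variant");
  }
  if (newV.p95LatencyMs > oldV.p95LatencyMs * t.maxLatencyRatio) {
    reasons.push("p95 latency is higher on the new variant");
  }
  return { pass: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict keeps the decision auditable: whoever promotes the stage can see which check blocked it.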

How to bucket users

Three options:

1. Random by user ID. Stable bucketing — a user who got the new flow on Monday gets it on Tuesday too. The right default. Use a hash of (user_id, flag_name).

2. By cohort. Internal employees first, then beta opt-ins, then specific customer segments, then everyone.

3. By geographic region. Deploy to one region first to limit blast radius further.

For most rollouts, (1) is sufficient. (2) and (3) layer on top for extra-risky changes.
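
Option (1) can be sketched in a few lines. The hash here is a 32-bit FNV-1a over the string's UTF-16 code units, chosen only because it fits in a self-contained example; it is a stand-in for whatever stable hash you already use, and `bucketOf` and `inRollout` are names invented for this sketch.

```typescript
// FNV-1a (32-bit) over UTF-16 code units; a stand-in for any stable hash.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned result
}

// Maps (flag, user) to a stable bucket in [0, 100).
function bucketOf(flagName: string, userId: string): number {
  // The separator keeps ("ab", "c") and ("a", "bc") from hashing the same string.
  return fnv1a32(`${flagName}:${userId}`) % 100;
}

// A user is in the rollout when their bucket falls below the flag percentage.
function inRollout(flagName: string, userId: string, percentage: number): boolean {
  return bucketOf(flagName, userId) < percentage;
}
```

Because the bucket depends only on the flag name and the user ID, raising the percentage from 10 to 50 keeps every user who already had the new flow on it. Including the flag name in the hash means the same users are not always the first 1% for every flag.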

The rollback procedure

A rollout is only as good as its rollback. The procedure must be:

  • Fast. Sub-minute. A bad deploy at 50% needs to be rolled back in seconds, not “let me find the playbook.”
  • Reversible. Old code path still in production. Flipping the flag back to 0% reverts behavior.
  • Tested. Verify the rollback works before you need it. Flip the flag back to 0% in staging; confirm old behavior returns.

The single command:

flagsctl set new-checkout-flow --percentage 0

Or click a button in LaunchDarkly. Either way, sub-minute. Document the runbook, including who has access.
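
The “Tested” requirement can be turned into a small drill that runs in staging. The sketch assumes a hypothetical `FlagClient` interface with `setPercentage` and `isEnabled`. Neither is the API of LaunchDarkly or of the `flagsctl` command above, so adapt the calls to whatever your flag system exposes.

```typescript
// Hypothetical client interface; map these two calls onto your flag system.
interface FlagClient {
  setPercentage(flag: string, percentage: number): Promise<void>;
  isEnabled(flag: string, userId: string): Promise<boolean>;
}

// Sets the flag to 0% and returns the sample users who still get the new path.
// An empty result means the rollback took effect for everyone sampled.
async function rollbackDrill(
  client: FlagClient,
  flag: string,
  sampleUserIds: string[],
): Promise<string[]> {
  await client.setPercentage(flag, 0);
  const stillEnabled: string[] = [];
  for (const userId of sampleUserIds) {
    if (await client.isEnabled(flag, userId)) stillEnabled.push(userId);
  }
  return stillEnabled;
}
```

If flag values are cached, run the check again after the cache TTL has elapsed, since some processes may keep serving the old percentage until then.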

Tooling: build vs buy

For most teams, buying a feature-flag service is the right call:

  • LaunchDarkly — most mature, most expensive.
  • Statsig — strong experimentation focus.
  • Unleash — open-source, self-hostable.
  • PostHog — flags + analytics + session replay.
  • GrowthBook — open-source, focused on experiments.

For very small teams, a Postgres-backed flag table works:

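-- One row per flag; percentage is the share of users who get the new path.
CREATE TABLE feature_flags (
  name        text PRIMARY KEY,
  percentage  int NOT NULL DEFAULT 0 CHECK (percentage BETWEEN 0 AND 100),
  updated_at  timestamptz NOT NULL DEFAULT now()
);

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the PG* environment variables

// murmurHash is assumed to be any stable string hash returning a non-negative integer.
async function isEnabled(name: string, user: { id: string }): Promise<boolean> {
  const { rows } = await pool.query(
    'SELECT percentage FROM feature_flags WHERE name = $1', [name]);
  if (!rows[0]) return false; // unknown flag: stay on the old path
  // The separator keeps ('ab', 'c') and ('a', 'bc') from hashing the same string.
  const bucket = murmurHash(`${user.id}:${name}`) % 100;
  return bucket < rows[0].percentage;
}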

Add caching (30s TTL) and you have a working flag system in 30 lines. For under ~20 flags, this is fine. Past that, buy.
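
The caching step could look like the sketch below: a small in-process TTL cache plus a wrapper around whatever function loads the percentage. `TtlCache`, `cachedPercentage`, and the injectable clock are names and choices made for this example, not part of any library.

```typescript
interface Entry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: force a reload
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps any loader (for example, the SELECT above) with the cache.
async function cachedPercentage(
  cache: TtlCache<number>,
  name: string,
  load: (name: string) => Promise<number>,
): Promise<number> {
  const hit = cache.get(name);
  if (hit !== undefined) return hit;
  const fresh = await load(name);
  cache.set(name, fresh);
  return fresh;
}
```

The TTL is also the upper bound on how long a rollback takes to reach every process: with `new TtlCache<number>(30000)`, a flag set to 0% can keep serving the new path for up to 30 seconds.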

Common pitfalls

1. Sticky bucketing not implemented. A user sees the old flow on one request and the new flow on the next, which is confusing at best and can break multi-step flows such as checkout. Always bucket with a deterministic hash of (user_id, flag_name).

2. The new and old paths share state in incompatible ways. A new flow writes data the old flow can’t read. When you roll back, users on the new flow are stranded. Design the change so that data written by either path can be read by the other.

3. Comparing metrics globally instead of per-variant. Total error rate is at 0.5%; you don’t notice that error rate for new-flow users is 5%. Always slice metrics by variant.

4. Skipping stages under deadline pressure. “We have to ship by Friday — let’s go straight to 50%.” That is exactly the situation that produces the postmortem. Stages exist to prevent the rare bad outcome; the rare bad outcome is exactly when you’d be tempted to skip them.

5. Forgetting to clean up. The flag is at 100% for 6 months and the old code is still in the codebase. When the flag reaches 100%, set a calendar reminder to delete the flag and the old code path once the hold period is over.
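
To see how pitfall 3 happens, it helps to work the numbers once. The global error rate is the traffic-weighted average of the two variants. Assume, for illustration only, that the flag is at 1% of traffic (f = 0.01) and the old path errors at 0.45%. Those two values are not from the scenario above; only the 5% new-path rate and the roughly 0.5% total are.

```latex
\begin{align*}
e_{\text{total}} &= (1 - f)\, e_{\text{old}} + f\, e_{\text{new}} \\
                 &= 0.99 \times 0.0045 + 0.01 \times 0.05 \\
                 &= 0.004455 + 0.0005 \\
                 &= 0.004955 \approx 0.5\%
\end{align*}
```

Under these assumptions the global number moves from 0.45% to about 0.50%, which is easy to miss on a dashboard, while the new variant is failing roughly eleven times as often as the old one (0.05 / 0.0045 ≈ 11).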

A different shape: dark launches

For very risky changes (database migrations, major refactors), run a “dark launch” first:

  • Run the new code path in production but discard its output.
  • Compare the new path’s behavior to the old path’s behavior in real time.
  • Only after they agree consistently, switch to actually using the new path.

For example:

async function chargeCustomer(orderId: string) {
  const oldResult = await chargeCustomerOldFlow(orderId);

  if (await flags.isEnabled('dark-launch-new-charge')) {
    try {
      const newResult = await chargeCustomerNewFlowDryRun(orderId);
      logComparison({ orderId, old: oldResult, new: newResult });
    } catch (err) {
      logDarkLaunchError({ orderId, err });
    }
  }

  return oldResult;
}

Production behavior is unchanged apart from the extra latency of awaiting the dry run. The new code is exercised against real data, and discrepancies surface without affecting users.
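
The example leaves `logComparison` undefined. One possible shape for the comparison it performs is a field-by-field diff, sketched below. `diffFields`, the `ChargeResult` type, and the `ignore` list are hypothetical, and the field names in the usage note are made up for illustration.

```typescript
type ChargeResult = Record<string, unknown>;

// Returns the names of fields whose values differ between the two results.
// Fields listed in `ignore` (IDs, timestamps) are expected to differ and are skipped.
function diffFields(
  oldResult: ChargeResult,
  newResult: ChargeResult,
  ignore: string[] = [],
): string[] {
  const keys = new Set([...Object.keys(oldResult), ...Object.keys(newResult)]);
  const mismatches: string[] = [];
  for (const key of keys) {
    if (ignore.includes(key)) continue;
    // JSON comparison is a shortcut for a sketch: it is sensitive to key
    // order inside nested objects, so a real diff may need a deep-equal helper.
    if (JSON.stringify(oldResult[key]) !== JSON.stringify(newResult[key])) {
      mismatches.push(key);
    }
  }
  return mismatches.sort();
}
```

For instance, `diffFields({ amount: 100, currency: "USD" }, { amount: 100, currency: "EUR" })` returns `["currency"]`. Logging the list of mismatched fields rather than the full payloads makes it easier to count how often the two paths disagree, and on what.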

Beyond five stages

For the most critical paths (payment, auth), more granular stages help:

Internal → 0.1% → 1% → 5% → 10% → 25% → 50% → 100%

Eight stages, each held for 24-48 hours before promotion. A change that passes all of them is genuinely battle-tested.

Conversely, for trivial changes (a copy update, a bug fix in non-critical code), three stages may be enough:

Internal → 50% → 100%

Match the rollout granularity to the risk.

The takeaway

A staged rollout converts inevitable bugs from disasters into observations. 1%, 10%, 50%, 100% — each gated on metrics, with a fast rollback. Bucket users stably. Compare per-variant metrics, not totals. Clean up flags after they’ve been at 100%.

The team that adopts this finds that “we shipped a bad deploy” stops meaning “all customers were affected” and starts meaning “1% of customers saw a transient bug for two hours.” That is the difference between a good engineering culture and a fire-fighting one.

