The Practical Developer

Authorization with Zanzibar Tuples: How Google Manages Permissions and How To Build the Same Check in Node.js

Role-based access control breaks down when you need "users who can view docs shared with anyone in the engineering group." Zanzibar replaces roles with relation tuples and computes answers via graph traversal. Here is the mental model, the tuple grammar, and a production-grade check engine you can ship today.


Your authorization code started simple. if (user.role === 'admin') was enough. Then you added team-level access. Then document-level sharing. Then “users in the engineering group can edit the runbook.” Then “contractors can view unless the doc is marked internal.” Then nested groups inherited permissions from parent groups. Your authorize() function is now 400 lines of nested if statements, requires three database queries to answer a single check, and still returns false for a case nobody thought of.

This is the moment role-based access control (RBAC) dies. What replaces it is not attribute-based access control (ABAC) with a generic rules engine (those are slow, hard to audit, and impossible to cache). What replaces it is relation-based access control, the model Google published in the Zanzibar paper, and the model that now powers authorization at Spotify, Airbnb, Carta, and any team that outgrows roles.

This post is the Zanzibar mental model in plain language, the tuple grammar that replaces your roles table, the three caching rules that keep checks fast, and a production-grade check engine in Node.js that answers “can user:123 edit doc:456?” with one graph traversal.

Why RBAC falls over (and ABAC is not the fix)

RBAC assigns a role to a user. The role maps to a list of permissions. It works until a permission depends on several things at once: the user, the resource, and some context that lives in a different table.

Example: a user can view a document if they are the owner, or if the document is shared with their team, or if their team is a descendant of the document’s owner team in an org chart. In SQL, that is a recursive CTE across three tables with an OR branch for direct ownership. It is slow, hard to index, and you run it on every request.

ABAC says “write a policy function.” That function can query anything. The problem is that the policy is code. You cannot cache code. You cannot list “all documents this user can view” without evaluating the policy for every document. You cannot answer “who can view this document?” without evaluating it for every user. Audit logs become “the policy evaluated to true at this timestamp,” which tells you nothing about why.

Zanzibar’s insight is that authorization is a graph problem, not a logic problem. If you can express every permission as a small set of edges (relation tuples), you can traverse that graph with the same algorithms you already use for social networks or dependency trees. And because the graph is made of immutable tuples, you can cache the results aggressively.

The tuple grammar in five minutes

A relation tuple has three parts:

<object>#<relation>@<user>

Where object is namespace:object_id, relation is a string like owner or viewer, and user is either a direct user (user:123) or another object (group:eng#member), meaning “anyone who is a member of group:eng.”

Examples:

doc:runbook#owner@user:alice
doc:runbook#viewer@group:eng#member
group:eng#member@user:bob
group:eng#member@group:contractors#member

These four tuples mean:

  1. Alice is an owner of doc:runbook.
  2. Anyone who is a member of group:eng is a viewer of doc:runbook.
  3. Bob is a member of group:eng.
  4. Anyone who is a member of group:contractors is also a member of group:eng.
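To make the grammar concrete, here is a minimal parser sketch for the object#relation@user form. The type names (ObjectRef, Subject, RelationTuple) and the parseTuple function are hypothetical, chosen for illustration; the sketch assumes object IDs and relation names contain no "#" or "@".

```typescript
// Hypothetical types for illustration; they do not come from any library.
interface ObjectRef { namespace: string; objectId: string }

type Subject =
  | { kind: 'user'; id: string }                               // e.g. user:alice
  | { kind: 'userset'; object: ObjectRef; relation: string };  // e.g. group:eng#member

interface RelationTuple { object: ObjectRef; relation: string; subject: Subject }

// Split "namespace:object_id" on the first colon.
function parseObject(text: string): ObjectRef {
  const sep = text.indexOf(':');
  if (sep <= 0 || sep === text.length - 1) {
    throw new Error(`expected namespace:object_id, got "${text}"`);
  }
  return { namespace: text.slice(0, sep), objectId: text.slice(sep + 1) };
}

// Parse "<object>#<relation>@<user>", where <user> is either a plain
// user ID or a userset of the form "<object>#<relation>".
function parseTuple(text: string): RelationTuple {
  const at = text.indexOf('@');
  if (at < 0) throw new Error(`missing "@" in "${text}"`);
  const left = text.slice(0, at);
  const right = text.slice(at + 1);

  const hash = left.indexOf('#');
  if (hash < 0) throw new Error(`missing "#" in "${left}"`);
  const object = parseObject(left.slice(0, hash));
  const relation = left.slice(hash + 1);

  const subjectHash = right.indexOf('#');
  const subject: Subject =
    subjectHash < 0
      ? { kind: 'user', id: right }
      : {
          kind: 'userset',
          object: parseObject(right.slice(0, subjectHash)),
          relation: right.slice(subjectHash + 1),
        };

  return { object, relation, subject };
}

console.log(parseTuple('doc:runbook#viewer@group:eng#member'));
```

Direct users keep their full ID (user:alice) so the value matches what the later examples pass around as a user string.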

To check “can user:bob view doc:runbook?” you traverse:

  • Does doc:runbook#viewer@user:bob exist directly? No.
  • Does doc:runbook#viewer@group:eng#member exist? Yes. So: is user:bob a member of group:eng?
  • Does group:eng#member@user:bob exist? Yes. Access granted.

If Bob were a member of group:contractors instead, tuple 4 would send you one level deeper: group:eng#member points at group:contractors#member, so you would check that userset next. The depth is bounded by your group nesting depth, which is usually under five.

That is it. No roles table. No permissions table. No policy engine. Just edges in a graph.
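The traversal steps above can be sketched in a few lines over an in-memory list of tuple strings. This is an illustration only: checkInMemory is a hypothetical helper, it handles stored tuples but no computed relations, and it uses a simple depth budget to stay finite.

```typescript
// The four example tuples from the text, kept as plain strings.
const EXAMPLE_TUPLES: string[] = [
  'doc:runbook#owner@user:alice',
  'doc:runbook#viewer@group:eng#member',
  'group:eng#member@user:bob',
  'group:eng#member@group:contractors#member',
];

// `userset` is "object#relation", e.g. "doc:runbook#viewer".
function checkInMemory(
  tuples: string[],
  userset: string,
  user: string,
  maxDepth = 5,
): boolean {
  if (maxDepth <= 0) return false;
  const prefix = `${userset}@`;

  // Step 1: is there a direct tuple for this user?
  if (tuples.includes(prefix + user)) return true;

  // Step 2: follow every userset subject (anything containing "#").
  for (const tuple of tuples) {
    if (!tuple.startsWith(prefix)) continue;
    const subject = tuple.slice(prefix.length);
    if (subject.includes('#') && checkInMemory(tuples, subject, user, maxDepth - 1)) {
      return true;
    }
  }
  return false;
}

console.log(checkInMemory(EXAMPLE_TUPLES, 'doc:runbook#viewer', 'user:bob'));   // true
console.log(checkInMemory(EXAMPLE_TUPLES, 'doc:runbook#viewer', 'user:alice')); // false
```

Alice is refused here only because the sketch knows nothing about computed relations such as “owners are also viewers.”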

Namespace configuration: union, intersection, and exclusion

Real systems need more than direct tuples. You need computed relations. Zanzibar handles this with a namespace configuration that defines how relations combine.

name: doc
relation {
  name: "owner"
}
relation {
  name: "editor"
  union {
    child { _this {} }
    child { computedUserset { relation: "owner" } }
  }
}
relation {
  name: "viewer"
  union {
    child { _this {} }
    child { computedUserset { relation: "editor" } }
    child { tupleToUserset {
      tupleset { relation: "parent" }
      computedUserset { relation: "owner" }
    }}
  }
}

This says:

  • owner is set directly by tuples.
  • editor is anyone who is directly an editor, or anyone who is an owner.
  • viewer is anyone who is directly a viewer, or anyone who is an editor, or anyone who is an owner of the parent folder (via tupleToUserset, which follows a parent tuple to another object).

The three composition operators are:

  • Union (union): access if any child grants access. (Editor includes owner.)
  • Intersection (intersection): access only if all children grant access. (Approver requires both editor and signer.)
  • Exclusion (exclusion): access if the first child grants access and the second does not. (Viewer unless banned.)

In practice, union and tuple-chasing cover the large majority of real use cases. Intersection is for high-sensitivity actions (e.g., releasing to production requires both deployer and oncall). Exclusion is rare and usually better handled by removing tuples than by negative logic.
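As a rough illustration of how the three operators compose, the sketch below evaluates a small rewrite tree against the direct tuples of a single object. Everything here is assumed for the example: the Rewrite type, the evaluate function, and the banned, signer, and approver relations are hypothetical, and tupleToUserset is left out to keep the sketch short.

```typescript
// Hypothetical rewrite tree mirroring union / intersection / exclusion.
type Rewrite =
  | { op: 'this' }                                   // direct tuples for this relation
  | { op: 'computed'; relation: string }             // another relation on the same object
  | { op: 'union'; children: Rewrite[] }
  | { op: 'intersection'; children: Rewrite[] }
  | { op: 'exclusion'; base: Rewrite; subtract: Rewrite };

type Config = Record<string, Rewrite>;   // relation name -> rewrite rule
type Direct = Record<string, string[]>;  // relation name -> users with direct tuples

function evaluate(
  config: Config,
  direct: Direct,
  relation: string,
  user: string,
  depth = 10,
): boolean {
  if (depth <= 0) return false;
  const visit = (node: Rewrite): boolean => {
    switch (node.op) {
      case 'this':
        return (direct[relation] ?? []).includes(user);
      case 'computed':
        return evaluate(config, direct, node.relation, user, depth - 1);
      case 'union':
        return node.children.some(visit);
      case 'intersection':
        return node.children.every(visit);
      case 'exclusion':
        return visit(node.base) && !visit(node.subtract);
    }
  };
  // Relations without a rule fall back to their direct tuples.
  return visit(config[relation] ?? { op: 'this' });
}

const docConfig: Config = {
  editor: { op: 'union', children: [{ op: 'this' }, { op: 'computed', relation: 'owner' }] },
  viewer: {
    op: 'exclusion',
    base: { op: 'union', children: [{ op: 'this' }, { op: 'computed', relation: 'editor' }] },
    subtract: { op: 'computed', relation: 'banned' },
  },
  approver: {
    op: 'intersection',
    children: [{ op: 'computed', relation: 'editor' }, { op: 'computed', relation: 'signer' }],
  },
};

const directTuples: Direct = {
  owner: ['user:alice'],
  signer: ['user:alice'],
  viewer: ['user:bob', 'user:mallory'],
  banned: ['user:mallory'],
};

console.log(evaluate(docConfig, directTuples, 'viewer', 'user:alice'));   // true: owner -> editor -> viewer
console.log(evaluate(docConfig, directTuples, 'viewer', 'user:mallory')); // false: excluded by banned
console.log(evaluate(docConfig, directTuples, 'approver', 'user:bob'));   // false: not an editor
```

Note that the exclusion branch has to evaluate both children before it can answer, which is one reason negative rules are harder to cache than plain unions.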

The Node.js check engine

Here is a check engine that stores tuples in Postgres (a natural fit because you already have it), answers checks by recursing over indexed lookups, and adds an in-memory cache with a short TTL so repeated checks cost microseconds, not milliseconds.

Schema:

CREATE TABLE relation_tuples (
  namespace TEXT NOT NULL,
  object_id TEXT NOT NULL,
  relation TEXT NOT NULL,
  user_type TEXT NOT NULL CHECK (user_type IN ('direct', 'set')),
  user_id TEXT NOT NULL,   -- 'user:123' for direct, 'group:eng' for set
  user_relation TEXT       -- NULL for direct, e.g. 'member' for set
);

-- A PRIMARY KEY would force user_relation to be NOT NULL, so uniqueness is
-- enforced with an expression index that treats NULL as ''.
CREATE UNIQUE INDEX idx_tuple_object ON relation_tuples
  (namespace, object_id, relation, user_id, COALESCE(user_relation, ''));

CREATE INDEX idx_tuple_user ON relation_tuples(user_type, user_id, user_relation);

The unique index serves the forward lookup (what users have access to this object?). The secondary index is for reverse lookups (what objects does this user have access to?), which you need for list queries.

Storing tuples:

import { Pool } from 'pg';

interface Tuple {
  namespace: string;
  objectId: string;
  relation: string;
  user: string | { namespace: string; objectId: string; relation: string };
}

async function writeTuple(pool: Pool, tuple: Tuple): Promise<void> {
  const { user } = tuple;
  // Direct subjects are stored as-is ('user:123'). Usersets are stored as
  // 'namespace:object_id' plus the relation, which is the format check() parses.
  const [userType, userId, userRelation] =
    typeof user === 'string'
      ? ['direct', user, null]
      : ['set', `${user.namespace}:${user.objectId}`, user.relation];

  await pool.query(
    `INSERT INTO relation_tuples (namespace, object_id, relation, user_type, user_id, user_relation)
     VALUES ($1, $2, $3, $4, $5, $6)
     ON CONFLICT DO NOTHING`,
    [tuple.namespace, tuple.objectId, tuple.relation, userType, userId, userRelation]
  );
}

Checking access:

interface CheckRequest {
  namespace: string;
  objectId: string;
  relation: string;
  user: string;
}

const CHECK_CACHE = new Map<string, { result: boolean; expiry: number }>();
const CACHE_TTL_MS = 5_000;

function cacheKey(req: CheckRequest): string {
  return `${req.namespace}:${req.objectId}#${req.relation}@${req.user}`;
}

async function check(pool: Pool, req: CheckRequest, maxDepth = 10): Promise<boolean> {
  if (maxDepth <= 0) return false;

  const key = cacheKey(req);
  const cached = CHECK_CACHE.get(key);
  if (cached && cached.expiry > Date.now()) return cached.result;

  // 1. Direct tuple match.
  const direct = await pool.query(
    `SELECT 1 FROM relation_tuples
     WHERE namespace = $1 AND object_id = $2 AND relation = $3
       AND user_type = 'direct' AND user_id = $4
     LIMIT 1`,
    [req.namespace, req.objectId, req.relation, req.user]
  );

  if (direct.rowCount && direct.rowCount > 0) {
    CHECK_CACHE.set(key, { result: true, expiry: Date.now() + CACHE_TTL_MS });
    return true;
  }

  // 2. Userset match: the object delegates this relation to members of a group.
  const usersets = await pool.query(
    `SELECT user_id, user_relation FROM relation_tuples
     WHERE namespace = $1 AND object_id = $2 AND relation = $3
       AND user_type = 'set'`,
    [req.namespace, req.objectId, req.relation]
  );

  for (const row of usersets.rows) {
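    // user_id is stored as "namespace:object_id"; split on the first colon only,
    // so object IDs that contain ":" survive.
    const sep = row.user_id.indexOf(':');
    const memberReq: CheckRequest = {
      namespace: row.user_id.slice(0, sep),
      objectId: row.user_id.slice(sep + 1),
      relation: row.user_relation,
      user: req.user,
    };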
    const memberCheck = await check(pool, memberReq, maxDepth - 1);
    if (memberCheck) {
      CHECK_CACHE.set(key, { result: true, expiry: Date.now() + CACHE_TTL_MS });
      return true;
    }
  }

  // 3. Computed userset: the relation includes another relation on the same object.
  // (In a real implementation this is driven by a namespace config table.)
  const computed = await getComputedRelations(req.namespace, req.relation);
  for (const parentRel of computed) {
    const parentReq: CheckRequest = {
      ...req,
      relation: parentRel,
    };
    const parentCheck = await check(pool, parentReq, maxDepth - 1);
    if (parentCheck) {
      CHECK_CACHE.set(key, { result: true, expiry: Date.now() + CACHE_TTL_MS });
      return true;
    }
  }

  CHECK_CACHE.set(key, { result: false, expiry: Date.now() + CACHE_TTL_MS });
  return false;
}

The getComputedRelations function is a placeholder for your namespace config. In a minimal system, it returns the parent relations from a map:

const NAMESPACE_CONFIG: Record<string, Record<string, string[]>> = {
  doc: {
    editor: ['owner'],
    viewer: ['editor'],
  },
};

async function getComputedRelations(ns: string, rel: string): Promise<string[]> {
  return NAMESPACE_CONFIG[ns]?.[rel] ?? [];
}

Critical fix: set a max depth. Group cycles (group:a#member@group:b#member, group:b#member@group:a#member) will recurse forever without a depth limit. The depth counter is simple to read, but it has a side effect here: a false returned because the budget ran out is cached like any other false until the TTL expires. In production, detect cycles with a per-request visited set instead of a depth counter.
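A per-request visited set might look like the sketch below. It runs over a hypothetical in-memory edge map rather than Postgres, and checkWithVisited is an illustrative name. Sharing one visited set across the whole request is only safe for union-style reachability, which is all this sketch models.

```typescript
// Hypothetical in-memory store: "object#relation" -> subjects (users or usersets).
const CYCLIC_EDGES = new Map<string, string[]>([
  ['doc:runbook#viewer', ['group:a#member']],
  ['group:a#member', ['group:b#member']],
  ['group:b#member', ['group:a#member', 'user:bob']], // cycles back to group:a
]);

function checkWithVisited(
  edges: Map<string, string[]>,
  userset: string,
  user: string,
  visited: Set<string> = new Set<string>(),
): boolean {
  // A userset already explored in this request cannot add anything new.
  if (visited.has(userset)) return false;
  visited.add(userset);

  for (const subject of edges.get(userset) ?? []) {
    if (subject === user) return true;
    if (subject.includes('#') && checkWithVisited(edges, subject, user, visited)) {
      return true;
    }
  }
  return false;
}

console.log(checkWithVisited(CYCLIC_EDGES, 'doc:runbook#viewer', 'user:bob'));   // true
console.log(checkWithVisited(CYCLIC_EDGES, 'doc:runbook#viewer', 'user:carol')); // false, and it terminates
```

Unlike a depth counter, a false from this version means the whole reachable graph was explored, so it is a safer value to cache.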

Zanzibar’s three caching rules (and why they matter)

The Zanzibar paper reports trillions of stored tuples and millions of authorization checks per second, with 95th-percentile latency under 10 ms. It does this with three rules that are easy to overlook and hard to retrofit.

1. New enemy problem: clocks lie.

If Alice removes Bob from group:eng at t=10, and a check at t=11 reads a cache entry written at t=9 that says Bob is a member, the cache is stale and Bob sees content he should not. Zanzibar solves this with globally ordered commit timestamps from Spanner’s TrueTime, which clients carry as opaque consistency tokens called zookies. Every write gets a timestamp; every check reads at a timestamp at least as fresh as the zookie it is given. Cache keys include the evaluation timestamp, so the cache is invalidated not by events but by time monotonicity.

In a smaller system without a global clock, you accept a bounded inconsistency window (the 5-second TTL above), or you stamp writes with a Postgres xid and include the transaction ID in the cache key. The practical fix most teams use: keep the TTL under 100ms for active objects, and evict aggressively on write.
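One way to approximate the “stamp writes and key the cache by revision” idea in a single process is sketched below. RevisionCache and its method names are hypothetical, and where the revision number comes from (a sequence, a transaction ID, or similar) is an assumption left to the caller. Sharing a revision across several processes is a separate problem that this sketch does not address.

```typescript
// Hypothetical single-process cache keyed by (revision, check key).
class RevisionCache {
  private readonly entries = new Map<string, boolean>();
  private latest = 0;

  latestRevision(): number {
    return this.latest;
  }

  // Call after every committed tuple write with the new revision.
  advance(revision: number): void {
    if (revision > this.latest) {
      this.latest = revision;
      // Entries keyed by older revisions can never be read again.
      this.entries.clear();
    }
  }

  get(revision: number, checkKey: string): boolean | undefined {
    return this.entries.get(`${revision}|${checkKey}`);
  }

  // `revision` is the revision the check STARTED at. A result computed
  // against an older snapshot is dropped instead of being stored.
  set(revision: number, checkKey: string, result: boolean): void {
    if (revision < this.latest) return;
    this.entries.set(`${revision}|${checkKey}`, result);
  }
}

const revCache = new RevisionCache();
const startedAt = revCache.latestRevision();
revCache.set(startedAt, 'doc:runbook#viewer@user:bob', true);
revCache.advance(1); // Alice removes Bob from group:eng
console.log(revCache.get(revCache.latestRevision(), 'doc:runbook#viewer@user:bob')); // undefined
```

Passing the starting revision into set closes the window where a slow check finishes after a write and stores a stale answer under the new revision.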

2. Leopard caching: cache the subgraph, not the result.

A naive cache stores check(doc:runbook, viewer, user:bob) -> true. Zanzibar’s Leopard index instead keeps flattened group membership as sets, for example group:eng#member -> {user:alice, user:bob, ...}. If the next check asks about user:carol, the set is already in memory. This is a trade-off: higher memory use in exchange for fewer cache misses, and it makes list queries (“all viewers of this document”) fast.

For most teams, a per-check LRU with a short TTL is enough until you hit 100k+ checks per second. At that point, move to Redis with set-caching for the hot usersets.
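The difference between caching a result and caching a set can be sketched as follows. UsersetCache is a hypothetical class, and the loader is synchronous only to keep the example short; a real loader would be an async query against the tuple store or Redis.

```typescript
// Hypothetical loader signature: returns every member of a userset.
type MemberLoader = (userset: string) => string[];

class UsersetCache {
  private readonly sets = new Map<string, { members: Set<string>; expiry: number }>();
  private readonly loadMembers: MemberLoader;
  private readonly ttlMs: number;

  constructor(loadMembers: MemberLoader, ttlMs = 5_000) {
    this.loadMembers = loadMembers;
    this.ttlMs = ttlMs;
  }

  // Answers "is `user` in `userset`?" from the cached member set,
  // loading the whole set once on a miss or after expiry.
  has(userset: string, user: string): boolean {
    let entry = this.sets.get(userset);
    if (!entry || entry.expiry <= Date.now()) {
      entry = {
        members: new Set(this.loadMembers(userset)),
        expiry: Date.now() + this.ttlMs,
      };
      this.sets.set(userset, entry);
    }
    return entry.members.has(user);
  }
}

let loads = 0;
const usersetCache = new UsersetCache((userset) => {
  loads += 1;
  return userset === 'group:eng#member' ? ['user:alice', 'user:bob'] : [];
});

console.log(usersetCache.has('group:eng#member', 'user:bob'));   // true, loads the set
console.log(usersetCache.has('group:eng#member', 'user:carol')); // false, served from the cached set
console.log(loads); // 1
```

The second lookup is a different user, yet it costs no load, which is the property a per-check cache cannot give you.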

3. Check depth matters more than tuple count.

A check that traverses five levels of group nesting is slow even if there are only 100 tuples total. Flatten group hierarchies aggressively. If group:eng has 500 members through three layers of nesting, materialize the transitive closure in a separate table (group_transitive_members) and update it when group tuples change. This turns a depth-5 traversal into a single index lookup.
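Materializing the closure amounts to walking the nested groups once and recording every user reached. The sketch below does this in memory; flattenMembers and the edge map are hypothetical, and each (group, member) pair it produces would correspond to one row in a table like group_transitive_members.

```typescript
// Hypothetical in-memory group edges: userset -> direct subjects.
const GROUP_EDGES = new Map<string, string[]>([
  ['group:eng#member', ['user:bob', 'group:contractors#member']],
  ['group:contractors#member', ['user:carol', 'group:vendor#member']],
  ['group:vendor#member', ['user:dave', 'group:eng#member']], // cycle, handled below
]);

// Returns every user reachable from `group` through nested usersets.
function flattenMembers(edges: Map<string, string[]>, group: string): Set<string> {
  const members = new Set<string>();
  const visited = new Set<string>();
  const stack: string[] = [group];

  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (visited.has(current)) continue; // guards against group cycles
    visited.add(current);

    for (const subject of edges.get(current) ?? []) {
      if (subject.includes('#')) {
        stack.push(subject);  // nested group: expand it later
      } else {
        members.add(subject); // plain user: part of the closure
      }
    }
  }
  return members;
}

console.log(Array.from(flattenMembers(GROUP_EDGES, 'group:eng#member')).sort());
// [ 'user:bob', 'user:carol', 'user:dave' ]
```

Recomputing the closure on every group write is the simple option; incremental maintenance is cheaper at scale but considerably harder to get right.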

Listing objects: the other half of the problem

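Checking can user:123 view doc:runbook? is one operation. Rendering a dashboard that says “here are the 20 documents you can view, paginated” is another. In the Zanzibar paper, Read returns stored tuples and Expand returns the effective userset for an object and relation; neither directly answers “which objects can this user view?” Later implementations expose that as a separate call (LookupResources in SpiceDB, ListObjects in OpenFGA).

A simple listing implementation uses the reverse index and the transitive member table: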

SELECT DISTINCT namespace, object_id, relation
FROM relation_tuples
WHERE user_type = 'direct'
  AND user_id = 'user:123'
  AND relation = 'viewer'
UNION
SELECT DISTINCT t.namespace, t.object_id, t.relation
FROM relation_tuples t
JOIN group_transitive_members gtm
  ON t.user_type = 'set'
 AND t.user_id = gtm.group_namespace || ':' || gtm.group_id
 AND t.user_relation = gtm.group_relation
WHERE gtm.member_id = 'user:123'
  AND t.relation = 'viewer'
ORDER BY namespace, object_id
LIMIT 20;

This query is why the reverse index (idx_tuple_user) and the transitive member table matter. Without them, listing user-visible objects requires evaluating the policy for every object in the database.

Production checklist

  • Set max recursion depth (or cycle detection) on every Check. A single malformed tuple can turn your auth service into a stack overflow.
  • Use transactions for writes. A change that touches several tuples (revoking one grant and adding another, say) must commit atomically, or a concurrent check can observe the half-applied state.
  • Cache negative results. A miss (“user cannot view”) is as cacheable as a hit. Without negative caching, repeated unauthorized requests become expensive database traversals.
  • Log tuple changes to an outbox. Authorization is audit-critical. Every writeTuple and deleteTuple should emit an event to Kafka or a Postgres outbox table so you can answer “when did Bob gain access to the runbook?”
  • Avoid exclusion in namespace configs. “Access unless banned” is harder to reason about and cache than “remove the tuple when banned.” Move exclusion logic to tuple writes.
  • Test with a snapshot of production tuples. Authorization bugs are edge cases in graph shape. Export a sanitized snapshot of your production tuple graph and run property-based tests against it.

When not to use Zanzibar

  • Single-role systems. If your app has admin/user/guest and no nesting, RBAC is simpler and faster.
  • Attribute-heavy rules. If access depends on time-of-day, IP geofencing, or dynamic quotas, you need an ABAC engine (like OPA or Cedar) alongside the tuple graph, or the tuple graph becomes a dressed-up policy engine.
  • Ultra-low latency checks on a homegrown engine. If you need <1ms checks at millions per second, run a dedicated service (SpiceDB, Ory Keto, or similar) with a distributed cache rather than the engine above. The Postgres implementation above is good for roughly <1k req/s per instance.

A note from Yojji

The kind of work this post describes (replacing a brittle roles matrix with an auditable graph, hardening against recursion cycles, and sizing the cache layer so listing queries stay fast) is the foundational backend engineering that most teams skip until authorization becomes a production incident. It is also the kind of work Yojji’s senior engineers bake into the full-stack products they ship.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their teams specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and microservices architectures, and they run both dedicated senior outstaffed teams and full-cycle product engagements covering discovery, design, development, QA, and DevOps.