WebSocket Authentication and Authorization: Securing Real-Time Connections

A WebSocket connection is not an HTTP request. It lives for minutes or hours, not milliseconds. It outlasts client-side route changes, token refreshes, and permission changes. And if you authenticate it the same way you authenticate a REST endpoint, you are almost certainly leaving a window open for stale or revoked credentials to keep pumping data through your server.

This is the WebSocket authentication trap: the JWT was valid when the handshake happened, the user was authorized to subscribe to the orders channel when they connected, but ten minutes later their session was revoked, their permissions changed, or their token expired, and your server is still piping real-time order updates into a socket that should have been closed.

Fix this in three layers: authenticate the connection, authorize every message or subscription, and re-validate credentials periodically. Here is the pattern.

Layer 1: Authenticate the handshake

WebSocket connections start as HTTP Upgrade requests. That means the Authorization header, cookies, and query parameters are all available during the handshake. This is your only chance to assert that the client is who they claim to be, before the connection upgrades to the WebSocket protocol.

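The problem with query parameters is that they end up in server logs, proxy access logs, and browser history. Sending a JWT as ?token=... in the URL leaks the token to every layer of infrastructure that records URLs between client and server. Prefer the Authorization header wherever the client can set one. The browser WebSocket constructor cannot set custom headers, so browser clients usually authenticate the handshake with a cookie instead (covered later in this section).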

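// client-side (Node.js, using the `ws` library): attach the token to the handshake.
// The browser WebSocket constructor only accepts (url, protocols) and cannot set headers.
import WebSocket from 'ws';

const token = await getAccessToken(); // from your auth library

const ws = new WebSocket('wss://api.example.com/ws', [], {
  headers: {
    Authorization: `Bearer ${token}`,
  },
});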

Server-side, you need to intercept the Upgrade event and validate the token before the connection switches to the WebSocket protocol.

// server.ts using the `ws` library with Node.js
import http from 'http';
import { WebSocketServer } from 'ws';
import { verifyToken } from './auth'; // assumed to throw if the token is invalid or expired

const wss = new WebSocketServer({ noServer: true });

const server = http.createServer((req, res) => {
  // Normal HTTP traffic goes here
  res.writeHead(426, { 'Content-Type': 'text/plain' });
  res.end('Upgrade required');
});

server.on('upgrade', (req, socket, head) => {
  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
    socket.destroy();
    return;
  }

  const token = authHeader.slice(7);

  try {
    const payload = verifyToken(token);
    // Attach the authenticated user to the request so the WebSocketServer
    // can access it when the 'connection' event fires.
    (req as any).user = payload;
    wss.handleUpgrade(req, socket, head, (ws) => {
      wss.emit('connection', ws, req);
    });
  } catch (err) {
    // An invalid or expired token is an authentication failure, so answer 401.
    socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
    socket.destroy();
  }
});

wss.on('connection', (ws, req) => {
  const user = (req as any).user;
  console.log(`User ${user.id} connected`);

  ws.send(JSON.stringify({ type: 'connected', userId: user.id }));
});

This pattern validates the token before the 101 Switching Protocols response is sent. If the token is expired or malformed, the server answers with a plain HTTP error and the upgrade never happens. A browser client sees this as an error event followed by a close event with code 1006; the HTTP status code itself is not exposed to page JavaScript. No WebSocket is established, no per-connection state is allocated on the server, and no data flows.
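The rejection path can be made slightly more explicit. The following is a minimal sketch of a hypothetical helper (buildRejection is not part of the ws library) that formats the raw response with a Content-Length and Connection: close, plus a WWW-Authenticate hint on 401:

```typescript
// Hypothetical helper, not part of `ws`: formats the raw HTTP response that
// is written to the socket when an Upgrade request is rejected.
type RejectStatus = 401 | 403;

const REASON: Record<RejectStatus, string> = {
  401: 'Unauthorized',
  403: 'Forbidden',
};

function buildRejection(status: RejectStatus): string {
  const lines = [
    `HTTP/1.1 ${status} ${REASON[status]}`,
    'Connection: close',
    'Content-Length: 0',
  ];
  if (status === 401) {
    // Tells the client which authentication scheme the endpoint expects.
    lines.push('WWW-Authenticate: Bearer');
  }
  // An empty line terminates the header section.
  return lines.join('\r\n') + '\r\n\r\n';
}
```

In the upgrade handler this would be used as socket.write(buildRejection(401)) followed by socket.destroy().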

The noServer: true option on WebSocketServer is critical: it tells ws not to create or attach to an HTTP server, so the only Upgrade handler is the one you registered. If you pass the server option instead, ws attaches its own Upgrade listener and completes the handshake regardless of what your authentication code decides.

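If your app uses session cookies instead of Bearer tokens, you can read the cookie from req.headers.cookie during the Upgrade. The principle is identical: parse the cookie, validate the session, and either allow the Upgrade or reject it. Cookies have the advantage of not requiring client-side JavaScript to manage token storage, and they are the natural option for browsers, which cannot set an Authorization header on a WebSocket handshake. They have the disadvantage that WebSocket connections from contexts without cookie access (like service workers or some mobile WebSocket libraries) cannot authenticate. Because browsers can attach cookies to cross-origin WebSocket handshakes, also check the Origin header against an allowlist when you authenticate by cookie.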

If you serve both kinds of client, support both the Authorization header and cookies, and document the precedence: header first, cookie fallback. This gives you the widest client compatibility without sacrificing security.
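The precedence rule can be sketched as a small pure function. This is an illustration under stated assumptions: the cookie name sid and the function names are hypothetical, and the headers object only mirrors the two fields of Node's req.headers that matter here. One design choice is worth calling out: a header that is present but malformed is rejected outright rather than silently falling back to the cookie.

```typescript
// Shape of the two handshake headers this sketch cares about.
type HandshakeHeaders = { authorization?: string; cookie?: string };

type Credential =
  | { source: 'header'; token: string }
  | { source: 'cookie'; sessionId: string };

const SESSION_COOKIE = 'sid'; // hypothetical cookie name

// Returns the raw value of the named cookie, or undefined if it is absent.
function readCookie(header: string, name: string): string | undefined {
  for (const part of header.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      return part.slice(eq + 1).trim();
    }
  }
  return undefined;
}

// Header first, cookie fallback. Returns null when no usable credential exists.
function extractCredential(headers: HandshakeHeaders): Credential | null {
  if (headers.authorization !== undefined) {
    const match = /^Bearer (\S+)$/.exec(headers.authorization);
    const token = match?.[1];
    // A present-but-malformed header is a failure, not a reason to fall back.
    return token ? { source: 'header', token } : null;
  }
  if (headers.cookie !== undefined) {
    const sessionId = readCookie(headers.cookie, SESSION_COOKIE);
    if (sessionId) return { source: 'cookie', sessionId };
  }
  return null;
}
```

The returned credential still has to be validated (verifyToken for the header case, a session lookup for the cookie case) before the Upgrade is allowed.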

Layer 2: Authorize per-message, not per-connection

Authenticating the handshake tells you who is connected. It does not tell you what they are allowed to do. A user who connects with a valid session might try to subscribe to a channel they should not have access to, or send a mutation message to a resource they do not own.

Do not authorize once at connection time and then trust the connection for its entire lifetime. Every message that crosses the wire is a fresh authorization boundary.

// Message types and authorization rules
type User = { id: string; permissions: string[] };

type Message = {
  type: 'subscribe' | 'unsubscribe' | 'send' | 'ping';
  channel?: string;
  payload?: unknown;
};

const CHANNEL_PERMISSIONS: Record<string, string[]> = {
  'orders:user': ['user:read_own_orders'],
  'orders:admin': ['admin:read_all_orders'],
  'notifications': ['user:read_notifications'],
};

function authorize(user: User, channel: string): boolean {
  const required = CHANNEL_PERMISSIONS[channel];
  if (!required) return false;
  return required.some((perm) => user.permissions.includes(perm));
}

wss.on('connection', (ws, req) => {
  const user = (req as any).user;
  const subscriptions = new Set<string>();

  ws.on('message', (raw) => {
    let msg: Message;
    try {
      msg = JSON.parse(raw.toString());
    } catch {
      ws.send(JSON.stringify({ type: 'error', message: 'invalid JSON' }));
      return;
    }

    switch (msg.type) {
      case 'subscribe':
        if (!msg.channel) break;
        if (!authorize(user, msg.channel)) {
          ws.send(JSON.stringify({
            type: 'error',
            message: `not authorized for channel: ${msg.channel}`,
          }));
          break;
        }
        subscriptions.add(msg.channel);
        ws.send(JSON.stringify({
          type: 'subscribed',
          channel: msg.channel,
        }));
        break;

      case 'unsubscribe':
        if (!msg.channel) break;
        subscriptions.delete(msg.channel);
        break;

      case 'send':
        // Re-validate for every action message
        if (!authorize(user, msg.channel || '')) {
          ws.send(JSON.stringify({ type: 'error', message: 'not authorized' }));
          break;
        }
        // Handle the message...
        break;

      case 'ping':
        ws.send(JSON.stringify({ type: 'pong' }));
        break;
    }
  });

  // Clean up server-side subscription tracking on disconnect
  ws.on('close', () => {
    subscriptions.clear();
  });
});

The authorize function is deliberately simple: it checks an explicit permission mapping for the channel and denies any channel that is not in the map. In a real system, this would call your authorization service (or a cached copy of the user’s roles) and check against resource-specific policies, not just channel names. The shape is what matters: every inbound message is gated.
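To illustrate what a resource-specific check might look like, here is a minimal sketch. The channel naming scheme (orders:admin and orders:user:<ownerId>) is an assumption made for this example; the permission strings are the ones used in the mapping above.

```typescript
type User = { id: string; permissions: string[] };

// Assumed channel shapes for this sketch:
//   "orders:admin"           -> all orders, admin permission required
//   "orders:user:<ownerId>"  -> one user's orders, owner only
function authorizeChannel(user: User, channel: string): boolean {
  const parts = channel.split(':');
  const [resource, scope, ownerId] = parts;
  if (resource !== 'orders') return false;

  if (scope === 'admin' && parts.length === 2) {
    return user.permissions.includes('admin:read_all_orders');
  }
  if (scope === 'user' && parts.length === 3 && ownerId) {
    // Holding the permission is not enough: the channel must belong to the caller.
    return ownerId === user.id && user.permissions.includes('user:read_own_orders');
  }
  return false; // unknown shapes are denied
}
```

The ownership comparison is the part a channel-name lookup cannot express: two users with identical permissions still must not read each other's channels.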

This per-message check catches the case where a user’s permissions change while they are connected. But it only catches it if their in-memory user.permissions is up to date. Which brings us to layer 3.

Layer 3: Re-validate credentials periodically

The handshake authenticated the user. The per-message checks use the user object attached at connection time. But the user object is a snapshot of a moment in time. If an admin revokes the user’s access at 10:05, but the user connected at 09:55, their user.permissions still contains the revoked permissions.

The fix is a periodic heartbeat that re-validates the credentials. Think of it as your WebSocket version of refresh token rotation: you do not trust the original token for longer than a bounded window.

const REAUTH_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes
const REAUTH_GRACE_MS = 10 * 1000; // how long the client has to answer reauth_required

wss.on('connection', (ws, req) => {
  // `user` is reassigned after every successful check. In a real server this
  // handler is merged with the layer 2 handler so that per-message
  // authorization reads the same, up-to-date `user`.
  let user = (req as any).user;
  let token = req.headers['authorization']?.slice(7) || '';
  let graceTimer: ReturnType<typeof setTimeout> | undefined;

  // Periodically re-validate the stored token
  const reauthTimer = setInterval(() => {
    try {
      user = verifyToken(token); // Update the in-memory user
    } catch {
      // Token is invalid or expired. Ask the client for a fresh one and close
      // only if no valid 'reauth' message arrives within the grace period.
      ws.send(JSON.stringify({
        type: 'reauth_required',
        message: 'token expired',
      }));
      graceTimer ??= setTimeout(() => {
        ws.close(4001, 'token_expired');
      }, REAUTH_GRACE_MS);
    }
  }, REAUTH_INTERVAL_MS);

  // Listen for re-auth messages from the client
  ws.on('message', (raw) => {
    let msg: { type?: string; payload?: unknown };
    try {
      msg = JSON.parse(raw.toString());
    } catch {
      return;
    }

    if (msg.type === 'reauth' && typeof msg.payload === 'string') {
      try {
        const payload = verifyToken(msg.payload);
        // A re-auth must not switch the connection to a different user.
        if (payload.id !== user.id) throw new Error('subject mismatch');
        token = msg.payload; // Update the stored token
        user = payload;
        clearTimeout(graceTimer);
        graceTimer = undefined;
        ws.send(JSON.stringify({ type: 'reauth_ok' }));
      } catch {
        ws.send(JSON.stringify({ type: 'reauth_failed' }));
        ws.close(4001, 'reauth_failed');
      }
    }
  });

  ws.on('close', () => {
    clearInterval(reauthTimer);
    clearTimeout(graceTimer);
  });
});

The client side handles the reauth_required message by fetching a fresh token and sending it back:

// client-side re-auth handler
ws.addEventListener('message', async (event) => {
  const msg = JSON.parse(event.data);

  if (msg.type === 'reauth_required') {
    const newToken = await getAccessToken(); // from your auth library
    ws.send(JSON.stringify({
      type: 'reauth',
      payload: newToken,
    }));
  }
});

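This pattern catches expiration directly: once the token expires, the next periodic check fails and the client must present a fresh token or lose the connection. It catches revocation (the token was invalidated by the auth server before it expired) only if verifyToken does more than check the signature, for example by consulting a revocation list or an introspection endpoint, or if access tokens are short-lived and the auth server refuses to issue a new one to a revoked session. Either way, a revoked credential stops working within a bounded window instead of lasting for the lifetime of the connection.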
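One way to make the distinction between expiry and revocation concrete is to separate the two checks. The sketch below assumes the token has already passed signature verification, that it carries standard exp and jti claims, and that revoked token IDs are available in a local set (how that set is populated is outside this sketch). The names are illustrative.

```typescript
// Claims assumed to come from an already signature-verified JWT.
// `exp` is in seconds since the Unix epoch, as in the JWT spec.
type Claims = { sub: string; jti: string; exp: number };

type CheckResult = 'ok' | 'expired' | 'revoked';

function checkCredential(
  claims: Claims,
  nowMs: number,
  revokedJtis: ReadonlySet<string>,
): CheckResult {
  // Expiry is checkable locally from the token itself.
  if (claims.exp * 1000 <= nowMs) return 'expired';
  // Revocation needs state from outside the token.
  if (revokedJtis.has(claims.jti)) return 'revoked';
  return 'ok';
}
```

Returning a reason rather than a boolean lets the server pick a different close reason (and metric) for each case.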

The five-minute window is a deliberate trade-off. A shorter window means faster revocation but more overhead from repeated verifyToken calls (which are asymmetric-crypto operations when tokens are signed with RS256 or ES256, and cost CPU). A longer window saves CPU but leaves revoked connections alive longer. Five minutes is a reasonable starting point for many production services. If you need tighter revocation, push the check to one minute and cache the token verification results, in Redis if several server instances should share them.
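A fixed interval can also leave an expired token in use until the next tick. A possible refinement, sketched here with a hypothetical helper, is to schedule the next check at whichever comes first: the regular interval or the token's own expiry.

```typescript
// Returns how long to wait before the next credential check.
// expSeconds is the token's `exp` claim (seconds since the Unix epoch).
function nextCheckDelayMs(
  expSeconds: number,
  nowMs: number,
  intervalMs: number,
): number {
  const untilExpiryMs = expSeconds * 1000 - nowMs;
  if (untilExpiryMs <= 0) return 0; // already expired: check immediately
  return Math.min(intervalMs, untilExpiryMs);
}
```

This would be used with a chain of setTimeout calls rather than setInterval, recomputing the delay after each check and after each successful re-auth.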

Revocation via server-side push

For instant revocation (not waiting for the next periodic check), wire a pub/sub channel that carries revocation events from your auth service to all WebSocket server instances.

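// subscriber-side: listen for revocation events from Redis
import { createClient } from 'redis';
import type { WebSocket } from 'ws';

// A node-redis client that subscribes is dedicated to pub/sub;
// use a separate client for any other Redis commands.
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Map userId -> Set<WebSocket>
const userSockets = new Map<string, Set<WebSocket>>();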

// When a connection is established, register it
wss.on('connection', (ws, req) => {
  const user = (req as any).user;
  if (!userSockets.has(user.id)) {
    userSockets.set(user.id, new Set());
  }
  userSockets.get(user.id)!.add(ws);

  ws.on('close', () => {
    userSockets.get(user.id)?.delete(ws);
    if (userSockets.get(user.id)?.size === 0) {
      userSockets.delete(user.id);
    }
  });
});

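// Subscribe to revocation events
await redis.subscribe('auth:revocation', (message) => {
  let userId: unknown;
  try {
    ({ userId } = JSON.parse(message));
  } catch {
    return; // ignore malformed events rather than crash the subscriber
  }
  if (typeof userId !== 'string') return;

  const sockets = userSockets.get(userId);
  if (sockets) {
    for (const ws of sockets) {
      ws.close(4001, 'session_revoked');
    }
  }
});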

The auth service publishes to Redis whenever a token is revoked or a user’s permissions change:

// auth-service.ts: publish revocation when a session is invalidated
await redis.publish('auth:revocation', JSON.stringify({ userId: 'user_123' }));

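This gives you revocation that is typically sub-second, without polling. Every WebSocket server instance subscribes to the same Redis channel and closes the relevant connections immediately. Redis pub/sub is fire-and-forget, so an instance that is disconnected from Redis when the event is published never sees it; keep the periodic check from layer 3 as the backstop.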
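The registration and revocation logic can be isolated from ws and Redis so that it is testable on its own. The sketch below is one possible shape, not the only one: SocketRegistry, Closable, and handleRevocation are names invented for this example, and the only thing assumed of a socket is a close(code, reason) method.

```typescript
// Minimal surface of a WebSocket that the registry needs.
interface Closable {
  close(code: number, reason: string): void;
}

class SocketRegistry {
  private readonly byUser = new Map<string, Set<Closable>>();

  add(userId: string, socket: Closable): void {
    let sockets = this.byUser.get(userId);
    if (!sockets) {
      sockets = new Set();
      this.byUser.set(userId, sockets);
    }
    sockets.add(socket);
  }

  remove(userId: string, socket: Closable): void {
    const sockets = this.byUser.get(userId);
    if (!sockets) return;
    sockets.delete(socket);
    if (sockets.size === 0) this.byUser.delete(userId);
  }

  // Closes every socket registered for the user; returns how many were closed.
  closeAll(userId: string, reason: string): number {
    const sockets = this.byUser.get(userId);
    if (!sockets) return 0;
    this.byUser.delete(userId);
    sockets.forEach((socket) => socket.close(4001, reason));
    return sockets.size;
  }
}

// Handles one raw pub/sub message. Malformed events are ignored (returns 0).
function handleRevocation(raw: string, registry: SocketRegistry): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return 0;
  }
  if (typeof parsed !== 'object' || parsed === null) return 0;
  const userId = (parsed as { userId?: unknown }).userId;
  if (typeof userId !== 'string') return 0;
  return registry.closeAll(userId, 'session_revoked');
}
```

The return value is convenient for incrementing a revocation counter by the number of connections actually closed.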

The practical choice: short-lived access tokens + periodic re-auth

Three patterns for WebSocket auth are common in production, and exactly one of them balances security and complexity well:

Pattern	Security	Complexity	Notes
Static token at connect	Low	Low	Token works until the connection closes. Cannot revoke.
Short-lived token + periodic re-auth	High	Medium	Close connections within minutes of revocation.
Token per message	High	High	Every message carries a fresh token. Bandwidth-heavy.

The short-lived token with periodic re-auth (5-minute window, as shown above) is the practical default. It handles the common cases (expired token, revoked session, permission change) without requiring every message to carry a full JWT, which can run to 500 bytes or more.

Token-per-message makes sense in environments where you need instant per-action authorization and bandwidth is not a concern (internal infrastructure, not public-facing mobile apps). Static token makes sense for ephemeral connections that live a few seconds (a WebSocket used for a single file transfer, then closed).

The four things that will break in production

1. The Upgrade event does not fire behind a proxy

If your WebSocket traffic goes through a load balancer, CDN, or reverse proxy that does not pass the Upgrade and Connection headers to the backend, the upgrade event never fires. If it passes those but drops the Authorization header, your req.headers['authorization'] will be empty on the server side.

The fix is to configure your proxy to forward headers through the Upgrade. For Nginx:

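location /ws {
  proxy_pass http://backend;
  proxy_http_version 1.1;
  # Upgrade and Connection are hop-by-hop headers; Nginx does not pass them unless set explicitly
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  # Nginx forwards Authorization by default; setting it here makes the intent explicit
  proxy_set_header Authorization $http_authorization;
}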

For AWS ALB and other managed load balancers, check the documentation and confirm in a staging environment that the Authorization header actually reaches your backend on the Upgrade request. If an intermediary you do not control strips it, you may need to use the query-parameter approach (with the caveats about logging above) or authenticate with a cookie.

2. The `verifyToken` call blocks the event loop

JWT verification with RS256 or ES256 is an asymmetric crypto operation that runs synchronously on the main thread. If 10,000 clients reconnect at the same time (a reconnect storm after a deployment), the verifyToken calls in the handshake pile up, and because those connections were all opened together, their periodic re-auth timers then fire together as well. Each burst blocks the event loop and spikes latency for every other request on the server.

Cache the verification result. Use a simple in-memory Map<string, CachedPayload> with a TTL that is shorter than your re-auth interval:

const verifyCache = new Map<string, { payload: User; cachedAt: number }>();
const CACHE_TTL_MS = 60_000;

function verifyTokenCached(token: string): User {
  const cached = verifyCache.get(token);
  if (cached && Date.now() - cached.cachedAt < CACHE_TTL_MS) {
    return cached.payload;
  }
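  const payload = verifyToken(token);
  // Delete before set so a refreshed entry moves to the end of the Map.
  verifyCache.delete(token);
  verifyCache.set(token, { payload, cachedAt: Date.now() });
  // Evict the oldest entry once the cache is full. A Map iterates in
  // insertion order, so its first key is the oldest entry (no sorting needed).
  if (verifyCache.size > 10_000) {
    const oldestKey = verifyCache.keys().next().value;
    if (oldestKey !== undefined) verifyCache.delete(oldestKey);
  }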
  return payload;
}

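This transforms a crypto operation into a Map lookup for repeated tokens. It helps whenever the same token is verified more than once within the TTL: a client that reconnects shortly after a dropped connection, or one user holding several connections at once. Two limits are worth knowing. The cache lives in process memory, so it absorbs a storm caused by a network blip or proxy restart but starts empty after a restart of the server itself. And a cached result can outlive the token’s expiry or revocation by up to the TTL, which is why the TTL should stay short.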
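Caching reduces the cost of each verification; spreading the checks out reduces how many land in the same instant. A common complement is to add random jitter to each connection's re-check delay. The helper below is a minimal sketch (the name and the spread parameter are assumptions), with the random source injectable so the behavior is testable.

```typescript
// Returns a delay drawn uniformly from [base * (1 - spread), base * (1 + spread)).
// `random` defaults to Math.random and must return a value in [0, 1).
function jitteredDelayMs(
  baseMs: number,
  spread: number,
  random: () => number = Math.random,
): number {
  const factor = 1 - spread + random() * 2 * spread;
  return Math.round(baseMs * factor);
}
```

With a base of five minutes and a spread of 0.2, connections opened in the same second re-check somewhere between four and six minutes later instead of all at once. This requires scheduling each check with setTimeout rather than a fixed setInterval.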

3. The `reauth` message race condition

If the client sends a reauth message with a new token at the same time the server’s periodic check fires (using the old token), the server sees an expired token and closes the connection, even though a valid token is in flight.

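The fix is to use an epoch counter on the connection. When the client sends a reauth message, increment the epoch. When the periodic check fires, it only closes the connection if the epoch has not changed since the last successful check; if the epoch has changed, the check is skipped, because the new token was already verified when the reauth message was handled. This only helps once the reauth message has been processed. A token that is still in flight when the timer fires needs a grace period between the failed check and the close.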

wss.on('connection', (ws, req) => {
  let token = req.headers['authorization']?.slice(7) || '';
  let epoch = 0;
  let lastValidEpoch = 0;

  const reauthTimer = setInterval(() => {
    if (epoch === lastValidEpoch) {
      // No re-auth since last check; validate the current token
      try {
        verifyToken(token);
        lastValidEpoch = epoch;
      } catch {
        ws.close(4001, 'token_expired');
        clearInterval(reauthTimer);
      }
    } else {
      // A re-auth happened between checks; update the baseline
      lastValidEpoch = epoch;
    }
  }, REAUTH_INTERVAL_MS);

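  ws.on('message', (raw) => {
    let msg: { type?: string; payload?: unknown };
    try {
      msg = JSON.parse(raw.toString());
    } catch {
      return; // ignore malformed frames instead of throwing inside the handler
    }
    if (msg.type === 'reauth' && typeof msg.payload === 'string') {
      try {
        verifyToken(msg.payload);
        token = msg.payload;
        epoch++; // Signal that re-auth happened
        ws.send(JSON.stringify({ type: 'reauth_ok' }));
      } catch {
        ws.send(JSON.stringify({ type: 'reauth_failed' }));
      }
    }
  });

  ws.on('close', () => {
    clearInterval(reauthTimer);
  });
});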

This is a small addition, and it matters most when thousands of clients are mid-reauth at the same moment, for example right after a deployment.
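For the in-flight case, a grace period can be modeled as a small state holder per connection. The following sketch is an assumption-laden illustration (ReauthGate and its method names are invented here): the first failed check asks the client to re-authenticate and starts a deadline, and the connection is closed only if that deadline passes without a successful reauth.

```typescript
type GateAction = 'request_reauth' | 'wait' | 'close';

class ReauthGate {
  private deadlineMs: number | null = null;
  private readonly graceMs: number;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  // Call when a check finds the stored token invalid.
  onTokenInvalid(nowMs: number): GateAction {
    if (this.deadlineMs === null) {
      // First failure: start the grace period and ask for a fresh token.
      this.deadlineMs = nowMs + this.graceMs;
      return 'request_reauth';
    }
    return nowMs >= this.deadlineMs ? 'close' : 'wait';
  }

  // Call when a 'reauth' message carries a token that verifies.
  onReauthSuccess(): void {
    this.deadlineMs = null;
  }
}
```

The caller maps the actions onto the protocol from this post: request_reauth sends the reauth_required message, close calls ws.close(4001, 'token_expired'), and wait does nothing. In practice the deadline would be enforced by a short timer rather than by waiting for the next periodic check.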

4. The authorization service is down during message validation

Your per-message authorize function calls an external service to check permissions. That service goes down. Now every WebSocket message fails authorization and every client is cut off from its data for something that is not their fault.

Cache the authorization decisions with a TTL:

const authzCache = new Map<string, { allowed: boolean; cachedAt: number }>();
const AUTHZ_CACHE_TTL_MS = 120_000;

async function authorizeCached(user: User, channel: string): Promise<boolean> {
  const key = `${user.id}:${channel}`;
  const cached = authzCache.get(key);

  if (cached && Date.now() - cached.cachedAt < AUTHZ_CACHE_TTL_MS) {
    return cached.allowed;
  }

  try {
    const allowed = await checkPermission(user.id, channel);
    authzCache.set(key, { allowed, cachedAt: Date.now() });
    return allowed;
  } catch {
    // Auth service is down. Fall back to the cached decision, even if stale.
    // Note: this fallback has no age limit as written.
    if (cached) return cached.allowed;
    // No cache entry at all: fail closed.
    return false;
  }
}

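This pattern means a brief auth service outage does not cascade into a WebSocket disconnect storm. Decisions are served from the cache for up to two minutes, and if the auth service is unreachable when an entry expires, the last known decision is reused until the service recovers. That fallback has no upper bound as written, so a revoked permission can stay in effect for the length of the outage; add a maximum staleness if that is not acceptable. When the auth service recovers, the cache refreshes and decisions are live again.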
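Bounding the stale fallback is a small change. The sketch below separates the decision logic from the network call so the rule is easy to see; the type and function names, and the idea of a separate maxStaleMs limit, are assumptions for this illustration.

```typescript
type CachedDecision = { allowed: boolean; cachedAt: number };

// Result of asking the authorization service: either an answer or a failure.
type LiveResult = { ok: true; allowed: boolean } | { ok: false };

function resolveDecision(
  live: LiveResult,
  cached: CachedDecision | undefined,
  nowMs: number,
  maxStaleMs: number,
): boolean {
  // A live answer always wins over anything cached.
  if (live.ok) return live.allowed;
  // Service unavailable: reuse the cached decision only while it is young enough.
  if (cached && nowMs - cached.cachedAt <= maxStaleMs) return cached.allowed;
  // No usable cache entry: fail closed.
  return false;
}
```

Choosing maxStaleMs is the real decision: it is the longest time a revoked permission can stay effective during an outage.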

What to monitor

Three metrics tell you whether your WebSocket auth is healthy:

ws.auth.handshake_failures — The rate of failed authentication during the Upgrade. A spike means someone is probing your endpoint with bad tokens, or your token issuer is down. Either way, you want to know.

ws.auth.reauth_success_rate — The percentage of re-auth attempts that succeed. If this drops below 99% during normal operation, your tokens are expiring faster than clients can refresh them, or your re-auth window is too tight.

ws.auth.revocation_closes — The rate of connections closed by server-side revocation (not by token expiration or client disconnect). This should be near zero during normal operation and spike when an admin revokes a compromised session or pushes a permission update. If it is always zero, your revocation channel is not working.

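Add these to your metrics pipeline on day one. The example uses the OpenTelemetry metrics API, which can export to Prometheus:

import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('ws-auth');

// ws.auth.reauth_success_rate is derived at query time:
// reauth_successes / reauth_attempts
const handshakeFailures = meter.createCounter('ws.auth.handshake_failures');
const reauthAttempts = meter.createCounter('ws.auth.reauth_attempts');
const reauthSuccesses = meter.createCounter('ws.auth.reauth_successes');
const revocationCloses = meter.createCounter('ws.auth.revocation_closes');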

Without these metrics, you are flying blind. The authentication layer is invisible in normal HTTP request metrics (because WebSocket connections are not HTTP requests after the Upgrade), and you will not notice when it fails until a security incident is already in progress.
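The success rate itself is a ratio of two counters, and the edge case worth handling is the window with no attempts at all. A minimal sketch, with hypothetical function names and the 99% threshold taken from the description above:

```typescript
// Returns the success ratio, or null when there were no attempts in the window
// (no data is different from a 0% success rate).
function reauthSuccessRate(attempts: number, successes: number): number | null {
  if (attempts === 0) return null;
  return successes / attempts;
}

// An empty window is treated as healthy so that idle periods do not alert.
function isReauthHealthy(
  attempts: number,
  successes: number,
  threshold = 0.99,
): boolean {
  const rate = reauthSuccessRate(attempts, successes);
  return rate === null || rate >= threshold;
}
```

In most setups this calculation lives in the metrics backend as a query over the two counters rather than in application code.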

The takeaway

WebSocket authentication is not harder than REST authentication. It is different, and treating it the same way is how you end up with active connections feeding data to users whose sessions were revoked an hour ago.

The three-layer pattern is straightforward: validate the token at handshake time, authorize every message against the user’s current permissions, and re-validate credentials on a timer. Add a Redis pub/sub channel if you need sub-second revocation. Cache token verification results to survive reconnect storms. Cache authorization decisions to survive auth service outages.

The code is short. The impact is closing the longest-lived security gap in your real-time infrastructure: the connection that should have been closed but was not.

A note from Yojji

The WebSocket authentication pattern in this post (per-message authorization, periodic credential re-validation, and revocation channels) is the kind of architectural rigor that separates a prototype from a production system. It is also the kind of work that is easy to skip in a sprint and painful to retrofit after a security incident.
