Hot-Reloading Configuration in Node.js Without Downtime

The on-call engineer pushed a config change to disable a broken feature flag. The deployment pipeline ran. Kubernetes rolled the pods one by one. Each pod took 45 seconds to drain, 12 seconds to start, and 8 seconds to pass health checks. The total rollout took four minutes. During those four minutes, the database connection pool from the old pods was draining while the new pods were cold-starting, and three requests hit an ENOTCONN socket error because the old process had stopped accepting connections before the load balancer noticed.

Four minutes of degraded traffic to change a boolean.

The worst part: the application already had the infrastructure to read the new value without a restart. Kubernetes updated the mounted ConfigMap file in place, well before the four-minute rollout finished. The file was there. The process just had no mechanism to notice and react.

This is the gap this post closes. You build a configuration system that validates at startup (see the previous post on env var management for that pattern), and you add a hot-reload layer on top. The process watches its config sources, detects changes, and applies them gracefully without restarting, dropping connections, or accepting a request with half-applied state.

When hot-reload matters

There are three scenarios where a restart is the wrong tool for the job.

Feature flags and kill switches. A bad feature ships to production. You need to disable it now. Waiting for a rolling deploy means users keep hitting the bug for the whole duration of the rollout. A hot-reloaded kill switch takes effect as soon as the process sees the new value, with no restart.

Rate limit and throttling adjustments. A downstream API starts returning 429s at a lower threshold than expected. You need to tighten your rate limits without redeploying the entire service. If rate limits are read once at startup, you are stuck.

Log level changes during debugging. An incident is in progress and you need debug-level logs from a specific service. If the log level is read only at startup, changing it means a pod restart, which might clear the very state or behavior you are trying to debug. Hot-reload lets you flip the level without touching the process lifecycle.

All three share the same requirement: the change must be instant (or near-instant), it must not disrupt in-flight requests, and it must be atomic within a single process.

The plainest correct approach

Before reaching for a configuration management library or a distributed store, start with the simplest thing that works: watch a file and re-read it when it changes.

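Node.js has fs.watchFile (polling-based) and fs.watch (event-based). fs.watchFile is more portable and less surprising: fs.watch behaves differently across platforms, and a watch handle on a file can stop firing when the file is replaced rather than edited in place, which is how Kubernetes updates a mounted ConfigMap. The cost of fs.watchFile is one stat call per polling interval (about five seconds by default; the examples below use two). For config files that change a few times a day, that cost is negligible.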

Here is a minimal config loader that supports hot-reload:

// src/config/hot-reload.ts
import { watchFile } from 'node:fs';
import { readFile } from 'node:fs/promises';

export interface AppConfig {
  logLevel: string;
  maxConnections: number;
  rateLimitPerMin: number;
  featureFlags: Record<string, boolean>;
  upstreamTimeoutMs: number;
}

let currentConfig: AppConfig;
let listeners: Array<(config: AppConfig) => void> = [];

export function getConfig(): AppConfig {
  if (!currentConfig) {
    throw new Error('Config not loaded. Call loadConfig() first.');
  }
  return currentConfig;
}

export async function loadConfig(configPath: string): Promise<AppConfig> {
  currentConfig = await parseConfigFile(configPath);
  startWatching(configPath);
  return currentConfig;
}

async function parseConfigFile(path: string): Promise<AppConfig> {
  const raw = await readFile(path, 'utf-8');
  const parsed = JSON.parse(raw);

  // Basic validation
  if (typeof parsed.logLevel !== 'string') {
    throw new Error('config.logLevel must be a string');
  }

  return parsed as AppConfig;
}

export function onConfigChange(cb: (config: AppConfig) => void): () => void {
  listeners.push(cb);
  return () => {
    listeners = listeners.filter((l) => l !== cb);
  };
}

function startWatching(path: string): void {
  watchFile(path, { interval: 2000 }, async (curr, prev) => {
    if (curr.mtimeMs === prev.mtimeMs) return;

    try {
      const newConfig = await parseConfigFile(path);
      currentConfig = newConfig;
      console.log(`[config] Reloaded from ${path}`);

      for (const listener of listeners) {
        try {
          listener(newConfig);
        } catch (err) {
          console.error('[config] Listener error during reload:', err);
        }
      }
    } catch (err) {
      console.error(`[config] Failed to reload config: ${err}`);
    }
  });
}

And the application wires it up at startup:

// src/index.ts
import { loadConfig, onConfigChange } from './config/hot-reload.js';
import { createServer } from './server.js';

async function main() {
  const configPath = process.env.CONFIG_PATH || '/etc/app/config.json';
  const config = await loadConfig(configPath);

  const server = createServer(config);

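  // updateRateLimit and setLogLevel are assumed methods on the object
  // returned by createServer; wire in whatever your subsystems expose.
  onConfigChange((newConfig) => {
    server.updateRateLimit(newConfig.rateLimitPerMin);
    server.setLogLevel(newConfig.logLevel);
  });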

  server.listen(3000);
}

main().catch((err) => {
  console.error('Failed to start:', err);
  process.exit(1);
});

This works. It is simple. It is testable (parseConfigFile depends only on the file at the path you give it). The watch callback itself is async, because it awaits readFile, but once the new config has been parsed, the assignment to currentConfig and the listener loop run in one synchronous stretch. No request handler can run in the middle of that stretch, so in a single-threaded Node.js process no request observes a half-applied update, as long as the listeners themselves are synchronous.
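One case the single-threaded argument does not cover is two reloads overlapping across the await. If the watcher and a manual reload (such as the SIGHUP handler covered later) fire close together, a slow read for the first can finish after a fast read for the second, and the older content wins. The sketch below shows one fix under that assumption: a hypothetical createReloader helper (not part of Node.js or the loader above) tags each attempt with a generation number and drops any result that is no longer the newest.

```typescript
// Sketch of a "latest wins" reload guard. createReloader is a hypothetical
// helper; load would wrap something like parseConfigFile(path).
export type Loader<T> = () => Promise<T>;

export function createReloader<T>(
  load: Loader<T>,
  apply: (value: T) => void,
  onError: (err: unknown) => void = (err) => console.error('[config]', err),
) {
  // Incremented on every attempt; each attempt remembers its own number.
  let generation = 0;

  return async function reload(): Promise<boolean> {
    const mine = ++generation;
    try {
      const value = await load();
      // A newer attempt started while this one was awaiting: discard it.
      if (mine !== generation) return false;
      apply(value);
      return true;
    } catch (err) {
      // Only report failures of the newest attempt; stale ones are irrelevant.
      if (mine === generation) onError(err);
      return false;
    }
  };
}
```

If the newest attempt fails validation, nothing is applied and the previous config stays in place, which matches the keep-the-last-valid-config rule the loader already follows.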

A structured config with schema validation

Hand-written checks like the logLevel test above do not scale: every new field needs another if statement, and anything you forget to check is cast to AppConfig unverified. You want the same Zod schema you use at startup to also validate hot-reloaded values, so a bad config file does not silently corrupt your runtime state.

// src/config/schema.ts
import { z } from 'zod';

export const configSchema = z.object({
  logLevel: z.enum(['fatal', 'error', 'warn', 'info', 'debug', 'trace']),
  maxConnections: z.number().int().positive().max(100),
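  rateLimitPerMin: z.number().int().positive(),
  // Optional burst size, used by the rate-limit listener example further down
  rateLimitBurst: z.number().int().positive().optional(),
  upstreamTimeoutMs: z.number().int().positive().default(30000),
  featureFlags: z.record(z.string(), z.boolean()),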
});

export type AppConfig = z.infer<typeof configSchema>;

Now the loader validates every reload against the same schema used at startup:

import { watchFile } from 'node:fs';
import { readFile } from 'node:fs/promises';
import { configSchema, type AppConfig } from './schema.js';

let currentConfig: AppConfig;
const listeners: Array<(config: AppConfig) => void> = [];

export function getConfig(): AppConfig {
  if (!currentConfig) throw new Error('Config not loaded');
  return currentConfig;
}

export async function loadConfig(configPath: string): Promise<AppConfig> {
  currentConfig = await parseAndValidate(configPath);
  startWatching(configPath);
  return currentConfig;
}

async function parseAndValidate(path: string): Promise<AppConfig> {
  const raw = await readFile(path, 'utf-8');
  const parsed = JSON.parse(raw);
  const result = configSchema.safeParse(parsed);

  if (!result.success) {
    throw new Error(
      `Invalid config: ${result.error.issues
        .map((i) => `${i.path.join('.')}: ${i.message}`)
        .join('; ')}`
    );
  }

  return result.data;
}

function startWatching(path: string): void {
  watchFile(path, { interval: 2000 }, async (curr, prev) => {
    if (curr.mtimeMs === prev.mtimeMs) return;
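    try {
      const newConfig = await parseAndValidate(path);
      currentConfig = newConfig;
      for (const listener of listeners) {
        // Isolate listener failures: the new config is already in place,
        // so one broken subscriber must not skip the rest or be reported
        // as a failed reload.
        try {
          listener(newConfig);
        } catch (err) {
          console.error('[config] Listener error during reload:', err);
        }
      }
    } catch (err) {
      console.error(`[config] Reload failed, keeping old config: ${err}`);
    }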
  });
}

export function onConfigChange(cb: (config: AppConfig) => void): () => void {
  listeners.push(cb);
  return () => {
    const idx = listeners.indexOf(cb);
    if (idx !== -1) listeners.splice(idx, 1);
  };
}

The critical detail here is the error handling in startWatching. If the new config file is malformed (a JSON syntax error, a missing field, a wrong type), the catch block logs the error and keeps the old config in place. The process continues running with the last valid configuration. Without this guard, the error would reject the promise returned by the async watch callback. Nothing awaits that promise, so it becomes an unhandled rejection, which terminates the process by default in Node.js 15 and later. A broken config file would then crash the process and force a full restart, which defeats the purpose of hot-reload.
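Because this guard matters more than any other line in the loader, it can help to pull the accept-or-keep decision out into a pure function you can test without files or timers. The sketch below is an illustrative refactoring, not the code above: applyCandidate and ReloadOutcome are hypothetical names, and the validate parameter stands in for a wrapper around configSchema.parse or any other validator that returns a value or throws.

```typescript
// Outcome of one reload attempt: the new config, or the old one plus the reason.
export type ReloadOutcome<T> =
  | { accepted: true; config: T }
  | { accepted: false; config: T; error: string };

// Parses and validates the candidate text. On any failure (bad JSON or a
// validation error) it returns the current config unchanged.
export function applyCandidate<T>(
  current: T,
  rawText: string,
  validate: (input: unknown) => T, // must throw on invalid input
): ReloadOutcome<T> {
  try {
    const parsed: unknown = JSON.parse(rawText);
    return { accepted: true, config: validate(parsed) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { accepted: false, config: current, error: message };
  }
}
```

The watch callback then reduces to reading the file, calling applyCandidate, and assigning config only when accepted is true.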

Coordinating the update: the listener pattern

When the config changes, different parts of the application need to react at different times. A logger can swap its level instantly because it has no persistent state. An HTTP server needs to drain active connections before applying a new timeout. A database pool needs to resize without dropping in-flight queries.

The listener pattern above lets each subsystem register its own update handler. The order of listener execution matters when the changes are interdependent.
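If you would rather not depend on the order in which onConfigChange calls happen to run during startup, one option is to make the order explicit. The sketch below is a hypothetical ConfigBus, not an existing library: each listener declares a numeric stage, listeners run from the lowest stage to the highest (ties in registration order), and an error thrown by one listener is collected and returned instead of stopping the ones after it.

```typescript
type Listener<T> = (config: T) => void;

interface Entry<T> {
  stage: number; // lower stages run first
  seq: number; // registration order, used as a tie-breaker
  fn: Listener<T>;
}

export class ConfigBus<T> {
  private entries: Entry<T>[] = [];
  private nextSeq = 0;

  subscribe(stage: number, fn: Listener<T>): () => void {
    const entry: Entry<T> = { stage, seq: this.nextSeq++, fn };
    this.entries.push(entry);
    // Keep the list sorted so publish() can simply iterate in order.
    this.entries.sort((a, b) => a.stage - b.stage || a.seq - b.seq);
    return () => {
      this.entries = this.entries.filter((e) => e !== entry);
    };
  }

  // Runs every listener synchronously and returns the errors it caught.
  publish(config: T): unknown[] {
    const errors: unknown[] = [];
    // Iterate over a copy so unsubscribing during publish is safe.
    for (const { fn } of [...this.entries]) {
      try {
        fn(config);
      } catch (err) {
        errors.push(err);
      }
    }
    return errors;
  }
}
```

With this shape, a call like bus.subscribe(10, (c) => logger.setLevel(c.logLevel)) puts the logger ahead of anything registered at a higher stage, no matter which module registers first.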

// src/server.ts
import { createServer as createHttpServer } from 'node:http';
import type { AppConfig } from './config/schema.js';

export function createServer(config: AppConfig) {
  let maxConnections = config.maxConnections;
  let activeConnections = 0;

  const server = createHttpServer((req, res) => {
    if (activeConnections >= maxConnections) {
      res.writeHead(503, { 'Retry-After': '5' });
      res.end('Server busy');
      return;
    }
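    activeConnections++;
    // 'close' fires whether the response finished or the client went away,
    // so aborted requests do not leak a slot ('finish' would miss them).
    res.on('close', () => activeConnections--);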
    // handle request...
  });

  return {
    server,
    updateMaxConnections(n: number) {
      maxConnections = n;
      console.log(`[server] Max connections updated to ${n}`);
    },
    listen(port: number) {
      server.listen(port);
    },
  };
}

And the registration order:

// src/index.ts
const config = await loadConfig(configPath);
const svc = createServer(config);

// logger, dbPool, and rateLimiter stand for application objects created elsewhere.
// Order matters: logger first, then connection pool, then server
onConfigChange((c) => {
  logger.setLevel(c.logLevel);
});

onConfigChange((c) => {
  dbPool.resize(c.maxConnections);
});

onConfigChange((c) => {
  svc.updateMaxConnections(c.maxConnections);
  rateLimiter.setLimit(c.rateLimitPerMin);
});

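Listeners run in registration order. Because JavaScript is single-threaded and the listeners are synchronous, no request can be handled between two listeners, so no part of the application sees a mix of old and new settings. They all receive the same newConfig object. Note that Zod does not freeze the object it returns, so nothing stops one listener from mutating that shared snapshot; if that matters to you, freeze the validated data before handing it out.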
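Zod returns a plain, mutable object, so if you want listeners to share a snapshot that none of them can modify, you have to freeze it yourself. Object.freeze is shallow, which would leave nested values such as featureFlags writable. A minimal sketch, assuming a hypothetical deepFreeze helper applied to JSON-shaped config:

```typescript
// Recursively freezes objects and arrays so no listener can mutate the shared
// config snapshot. Meant for JSON-shaped data (no cycles, no class instances).
export function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    // Freeze the children first, then the container itself.
    for (const child of Object.values(value)) {
      deepFreeze(child);
    }
    Object.freeze(value);
  }
  return value;
}
```

You would call it as the last step of parseAndValidate, for example return deepFreeze(result.data). In strict-mode code (which includes all ES modules), a listener that then tries to assign to the config gets a TypeError instead of silently changing what every other listener sees.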

The SIGHUP alternative

File watching is the most approachable pattern, but it reacts on its own schedule: a change is noticed only on the next poll, and only after the file has actually changed on disk. In Kubernetes, ConfigMap updates can take 60-120 seconds to propagate to all pods. For an emergency kill switch, you want to decide for yourself when the reload happens.

Unix processes have a signal for this: SIGHUP (hangup). Traditionally used to tell a daemon to reload its configuration, SIGHUP is still a simple, widely understood way to trigger a reload on demand in a container. You send the signal, and the process re-reads its config file right away instead of waiting for the next poll.

// src/config/sighup-reload.ts
import { readFile } from 'node:fs/promises';
import { configSchema, type AppConfig } from './schema.js';

let currentConfig: AppConfig;
const listeners: Array<(config: AppConfig) => void> = [];
let configPath: string;

export function getConfig(): AppConfig {
  if (!currentConfig) throw new Error('Config not loaded');
  return currentConfig;
}

export async function reloadConfig(): Promise<boolean> {
  try {
    const raw = await readFile(configPath, 'utf-8');
    const parsed = JSON.parse(raw);
    const result = configSchema.safeParse(parsed);

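    if (!result.success) {
      const details = result.error.issues
        .map((i) => `${i.path.join('.')}: ${i.message}`)
        .join('; ');
      console.error(`[config] SIGHUP reload failed validation, keeping old config: ${details}`);
      return false;
    }

    currentConfig = result.data;
    for (const listener of listeners) {
      try {
        listener(currentConfig);
      } catch (err) {
        // The new config is already live; report the listener and keep going.
        console.error('[config] Listener error during reload:', err);
      }
    }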
    console.log(`[config] Reloaded via SIGHUP`);
    return true;
  } catch (err) {
    console.error(`[config] SIGHUP reload error: ${err}`);
    return false;
  }
}

export async function loadConfig(path: string): Promise<AppConfig> {
  configPath = path;
  currentConfig = await parseConfig(path);
  // Installing a listener replaces Node's default SIGHUP action, which is to exit.
  process.on('SIGHUP', () => {
    void reloadConfig();
  });
  return currentConfig;
}

async function parseConfig(path: string): Promise<AppConfig> {
  const raw = await readFile(path, 'utf-8');
  const result = configSchema.safeParse(JSON.parse(raw));
  if (!result.success) throw new Error(`Config validation failed`);
  return result.data;
}

export function onConfigChange(cb: (config: AppConfig) => void): () => void {
  listeners.push(cb);
  return () => {
    const idx = listeners.indexOf(cb);
    if (idx !== -1) listeners.splice(idx, 1);
  };
}

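In Kubernetes, you can send SIGHUP from your workstation without restarting the pod. This assumes the Node.js process runs as PID 1 in the container and the image includes a kill binary: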

kubectl exec <pod-name> -- kill -HUP 1

Do not confuse this with a preStop hook. Kubernetes runs preStop only when it is about to terminate a container, so it is the place to start draining connections during a rollout, not a way to apply new configuration to a pod you intend to keep running.

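The file-watching and SIGHUP approaches are not mutually exclusive. Use both: file watching for routine config updates that tolerate the propagation delay, and SIGHUP for emergency kill switches where every second counts. Keep in mind that SIGHUP re-reads whatever is on disk at that moment. It removes the polling delay, not the ConfigMap propagation delay, so pair it with a faster way of getting the new file in place (see the production considerations below).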
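Signals can also arrive in bursts, for example when an operator runs the kill command twice, or a script signals a pod while its watcher is already reloading. A minimal sketch, assuming a hypothetical coalesce helper wrapped around reloadConfig: at most one reload runs at a time, and any number of triggers that arrive during a run collapse into a single follow-up run, so the last trigger always results in a read of the file as it was at or after that moment.

```typescript
// Wraps an async task so triggers never overlap: while one run is in
// progress, any number of further triggers collapse into one follow-up run.
export function coalesce(task: () => Promise<void>): () => Promise<void> {
  let running: Promise<void> | null = null;
  let rerun = false;

  const loop = async (): Promise<void> => {
    do {
      rerun = false;
      try {
        // Promise.resolve().then(task) always starts task on a later microtask,
        // so even a synchronous throw cannot end the loop before `running` is set.
        await Promise.resolve().then(task);
      } catch (err) {
        console.error('[config] reload failed:', err);
      }
    } while (rerun);
    running = null;
  };

  return () => {
    if (running) {
      rerun = true; // the loop checks this after the current run finishes
      return running;
    }
    running = loop();
    return running;
  };
}
```

To use it, you would create const reloadOnce = coalesce(async () => { await reloadConfig(); }) and register process.on('SIGHUP', () => void reloadOnce()) in place of the direct reloadConfig() call.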

Atomic state transitions

The hardest problem in hot-reload is not detecting the change. It is applying the change atomically across the process. If two related config values must change together (e.g., a rate limit ceiling and a burst window) and you apply them in separate steps with an await, or any other yield to the event loop, between them, a request handled in that gap sees one new value and one old value.

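The listener pattern solves this naturally because each listener sees the full new config object. Listeners that update shared mutable state must also be idempotent, because the same config can be applied more than once: a SIGHUP and the file watcher may both reload the same file. Conversely, several rapid writes can be picked up by a single poll, so a listener must derive its state from the config it receives rather than from the sequence of changes it has seen.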
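Idempotent does not have to mean cheap. When a listener does real work (resizing a pool, reconnecting a client), you can skip that work if the values it depends on did not change. A sketch, assuming a hypothetical onlyWhenChanged wrapper that compares the slice of config a listener cares about, serialized with JSON.stringify (adequate for JSON-shaped config whose key order is stable between reloads):

```typescript
// Wraps a listener so it runs only when the selected slice of config differs
// from the slice it last ran with. Repeated reloads of identical values are skipped.
export function onlyWhenChanged<C, S>(
  select: (config: C) => S,
  apply: (slice: S) => void,
): (config: C) => void {
  let lastKey: string | undefined;
  return (config) => {
    const slice = select(config);
    const key = JSON.stringify(slice);
    if (key === lastKey) return; // same values as last time: nothing to do
    apply(slice);
    lastKey = key; // recorded only after apply succeeds, so a failed apply is retried
  };
}
```

For example, onConfigChange(onlyWhenChanged((c) => c.maxConnections, (n) => dbPool.resize(n))) would resize the pool only when maxConnections actually changes, not on every log-level flip.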

// Safe: one immutable object, replaced in a single assignment
// (rateLimitBurst is the optional burst field in the schema)
let rateLimitConfig = { perMin: 100, burstSize: 10 };

onConfigChange((c) => {
  // Both values come from the same config snapshot and switch together
  rateLimitConfig = {
    perMin: c.rateLimitPerMin,
    burstSize: c.rateLimitBurst ?? Math.floor(c.rateLimitPerMin / 10),
  };
});

// Unsafe: non-atomic partial update
let rateLimitPerMin = 100;
let burstSize = 10;

onConfigChange((c) => {
  rateLimitPerMin = c.rateLimitPerMin;
  // Any code that ran here would see the new rate with the old burst size
  burstSize = c.rateLimitBurst ?? Math.floor(c.rateLimitPerMin / 10);
});

Because listeners run sequentially and synchronously, with no preemption, the unsafe version happens to be safe in single-threaded Node.js today. It stops being safe as soon as someone adds an await between the two assignments, and it communicates the wrong intent. Use a single immutable config object instead of separate mutable fields.

Testing hot-reload

A hot-reload system is only trustworthy if you can test it. The key insight is that you can decouple the file-watching mechanism from the config-parsing and listener logic.

Test the parsing and validation in isolation

// src/config/__tests__/hot-reload.test.ts
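// Imports shared by all of the tests in this file
import { describe, it, expect } from 'vitest';
import { mkdtempSync, writeFileSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { configSchema, type AppConfig } from '../schema.js';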

const validConfig = {
  logLevel: 'info',
  maxConnections: 50,
  rateLimitPerMin: 1000,
  upstreamTimeoutMs: 30000,
  featureFlags: { newUi: true, darkMode: false },
};

describe('config schema', () => {
  it('accepts a valid config', () => {
    const result = configSchema.safeParse(validConfig);
    expect(result.success).toBe(true);
  });

  it('rejects missing required fields', () => {
    const result = configSchema.safeParse({ logLevel: 'info' });
    expect(result.success).toBe(false);
  });

  it('rejects wrong types', () => {
    const result = configSchema.safeParse({
      ...validConfig,
      maxConnections: 'fifty',
    });
    expect(result.success).toBe(false);
  });
});

Test the listener system

describe('onConfigChange', () => {
  it('notifies listeners when config changes', async () => {
    const { loadConfig, onConfigChange } = await import('../hot-reload.js');

    // Write a valid config to a temp file
    const tmp = mkdtempSync(join(tmpdir(), 'config-'));
    const configPath = join(tmp, 'config.json');
    writeFileSync(configPath, JSON.stringify(validConfig));

    await loadConfig(configPath);

    const calls: AppConfig[] = [];
    const unsub = onConfigChange((c) => calls.push(c));

    // Let watchFile record its baseline stat first; a write that lands
    // before that initial stat would become the baseline and never fire.
    await new Promise((r) => setTimeout(r, 500));

    // Simulate a file change
    writeFileSync(configPath, JSON.stringify({
      ...validConfig,
      logLevel: 'debug',
    }));

    // Wait for the next poll (the watcher interval is 2 seconds)
    await new Promise((r) => setTimeout(r, 3000));

    expect(calls.length).toBe(1);
    expect(calls[0].logLevel).toBe('debug');

    unsub();
    rmSync(tmp, { recursive: true });
  }, 10_000);
});

Test SIGHUP handling

describe('SIGHUP reload', () => {
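  it('reloads config through the SIGHUP code path', async () => {
    // Calls reloadConfig() directly, the same function the SIGHUP handler runs
    const { loadConfig, reloadConfig, getConfig } = await import(
      '../sighup-reload.js'
    );

    const tmp = mkdtempSync(join(tmpdir(), 'config-'));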
    const configPath = join(tmp, 'config.json');
    writeFileSync(configPath, JSON.stringify(validConfig));

    await loadConfig(configPath);
    expect(getConfig().logLevel).toBe('info');

    // Change the file and trigger a manual reload
    writeFileSync(configPath, JSON.stringify({
      ...validConfig,
      logLevel: 'debug',
    }));

    const success = await reloadConfig();
    expect(success).toBe(true);
    expect(getConfig().logLevel).toBe('debug');

    rmSync(tmp, { recursive: true });
  });
});

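These tests do not require a real Kubernetes pod or a signal sent to the OS process. They test the contract: “when the file changes, the listeners fire with the new values.” The one piece they leave out is the process.on('SIGHUP') registration itself, which is a single line and easy to confirm by hand with kill -HUP.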

Production deployment considerations

Kubernetes ConfigMap propagation delay

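When you update a ConfigMap mounted into a pod, Kubernetes does not update the files immediately. The kubelet picks up changes on its periodic sync (every 60 seconds by default) and reads ConfigMaps through a cache that adds its own propagation delay. A ConfigMap mounted with subPath is never updated in place at all, so hot-reload cannot work with that mount style. Adding a config hash annotation to the Deployment's pod template does not speed any of this up: it triggers a rolling restart, which is exactly what hot-reload is meant to avoid.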

If you need faster propagation, use a sidecar that watches the Kubernetes API directly and writes to a shared volume, or use an external configuration store (Consul, etcd, Redis) that the application polls or subscribes to. The file-watching pattern still works with these sources if you have a bridge agent that writes the config to a local file.
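If you take the bridge-agent route, the agent should replace the file atomically. Otherwise a poll can land between the truncate and the write, and the application reads an empty or half-written file; the loader rejects it and keeps the old config, but you get a spurious reload error. A minimal sketch, assuming the agent is also written in Node.js: write to a temporary file in the same directory, then rename it over the target. On POSIX file systems, a rename within one file system replaces the target atomically, so readers see either the old file or the new one. writeConfigAtomically is a hypothetical name.

```typescript
import { randomBytes } from 'node:crypto';
import { rename, unlink, writeFile } from 'node:fs/promises';
import { basename, dirname, join } from 'node:path';

// Writes JSON to a temp file beside the target, then renames it into place.
// The temp file must live in the same directory (same file system) for the
// rename to be atomic. This does not fsync: it protects readers from partial
// files, not the data from a power loss.
export async function writeConfigAtomically(path: string, config: unknown): Promise<void> {
  const tmp = join(dirname(path), `.${basename(path)}.${randomBytes(6).toString('hex')}.tmp`);
  try {
    await writeFile(tmp, JSON.stringify(config, null, 2) + '\n', 'utf-8');
    await rename(tmp, path);
  } catch (err) {
    // Best-effort cleanup of the temp file if anything went wrong.
    await unlink(tmp).catch(() => {});
    throw err;
  }
}
```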

Grace period for stale configs

When the config file updates, the listeners fire immediately. But in-flight requests that started before the change may still hold references to the old config. That is usually fine for log levels and feature flags, but for connection limits and timeouts you may want a short grace period before the new value takes effect, so work already in progress can finish under the settings it started with.

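// Runs callback after graceMs and returns a function that cancels the pending
// apply (useful if another reload arrives before the grace period ends).
function applyWithGracePeriod(callback: () => void, graceMs = 5000): () => void {
  const timer = setTimeout(callback, graceMs);
  return () => clearTimeout(timer);
}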

This is a niche concern. Most requests start and finish under a single config snapshot, so the switch falls between them. Only worry about it if you have long-running requests (streaming, SSE, large file uploads) that span many event-loop ticks and read config more than once along the way.
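For those long-running requests, the usual defense is simpler than a timer: read the config once when the request starts and pass that snapshot down, instead of calling getConfig() again midway. Because each reload replaces currentConfig with a new object rather than mutating the old one, a snapshot taken earlier keeps its values. A sketch, assuming a hypothetical ConfigStore that stands in for the loader's module-level currentConfig and a hypothetical streamChunks task:

```typescript
// Minimal stand-in for the loader's module state: reloads swap the reference,
// they never mutate the object a caller already holds.
class ConfigStore<T> {
  private current: T;
  constructor(initial: T) {
    this.current = initial;
  }
  get(): T {
    return this.current;
  }
  swap(next: T): void {
    this.current = next;
  }
}

interface StreamConfig {
  upstreamTimeoutMs: number;
}

// A long-running task takes one snapshot up front and uses it throughout.
async function streamChunks(store: ConfigStore<StreamConfig>, chunks: number): Promise<number[]> {
  const snapshot = store.get(); // captured once, at the start
  const seen: number[] = [];
  for (let i = 0; i < chunks; i++) {
    await new Promise((r) => setTimeout(r, 5)); // yield, as real I/O would
    seen.push(snapshot.upstreamTimeoutMs);
  }
  return seen;
}
```

New requests pick up the reloaded values immediately, while a stream that was already running finishes with the timeout it started with.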

Logging the reload

Every config reload should produce a log entry with the old and new values (except secrets). This makes it possible to correlate a behavior change with a config change during an incident.

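// logger is the application's logger. This version assumes AppConfig holds no secrets.
function logDiff(oldCfg: AppConfig, newCfg: AppConfig): void {
  for (const key of Object.keys(newCfg) as Array<keyof AppConfig>) {
    // JSON.stringify compares nested values such as featureFlags by content
    // and prints them readably instead of "[object Object]".
    const before = JSON.stringify(oldCfg[key]);
    const after = JSON.stringify(newCfg[key]);
    if (before !== after) {
      logger.info(`[config] ${key}: ${before} -> ${after}`);
    }
  }
}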
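If your config does carry secrets, one way to honor the "except secrets" rule is to compute the diff as a pure function that takes an explicit list of sensitive keys and redacts their values while still reporting that they changed. diffConfig and its secretKeys parameter are hypothetical names; which keys belong on that list is a decision for your own config.

```typescript
// Returns one human-readable line per changed top-level key.
// Values are compared and printed via JSON.stringify, so nested objects
// (such as featureFlags) are compared by content rather than by reference.
export function diffConfig<T extends object>(
  oldCfg: T,
  newCfg: T,
  secretKeys: ReadonlySet<string> = new Set(),
): string[] {
  const oldRec = oldCfg as unknown as Record<string, unknown>;
  const newRec = newCfg as unknown as Record<string, unknown>;
  const keys = new Set([...Object.keys(oldRec), ...Object.keys(newRec)]);
  const lines: string[] = [];
  for (const key of keys) {
    const before = JSON.stringify(oldRec[key]);
    const after = JSON.stringify(newRec[key]);
    if (before === after) continue;
    lines.push(
      secretKeys.has(key)
        ? `${key}: [redacted] -> [redacted]` // report the change, never the value
        : `${key}: ${before} -> ${after}`,
    );
  }
  return lines;
}
```

Each returned line can then go to logger.info with the [config] prefix, exactly as in logDiff.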

When not to hot-reload

Hot-reload is not a universal tool. There are configuration changes that genuinely require a restart.

Database connection strings. If the host, port, or authentication credentials change, the existing connection pool must be drained and replaced. This is possible but risky. A safer approach is to blue-green deploy with the new connection string rather than hot-reloading a pool that may hold stale transactions.

TLS certificates. Rotating a certificate in memory is possible with server.setSecureContext(), but the rotation logic must handle the old certificate’s expiration window and the new certificate’s trust chain. Most teams handle this with a pre-rotation sidecar that watches for new certs and calls the update method. If you do not need zero-downtime cert rotation, a rolling restart is simpler and less prone to edge cases.

Module-level code changes. Hot-reload of configuration is not hot-reload of code. If you changed the business logic itself (not a flag that gates it), you need a proper deployment. Do not try to use config hot-reload as a substitute for a code deployment pipeline. It is not.

The practical takeaway

Hot-reload configuration is one of those patterns that looks simple on paper and is simple in code, but is easy to get wrong in production because the error paths are not exercised until the worst moment.

Start with file watching and a Zod schema. Route every config change through a validated listener system. Add SIGHUP handling for emergency kill switches. Test that broken config files leave the old values intact. Log every change with a before-and-after diff.

Before your next production incident that requires a config change, run through this checklist:

Config schema validates both at startup and on every reload.
A malformed config file on reload keeps the old config and logs the error.
Each subsystem registers its own listener instead of polling getConfig().
Listeners are idempotent and derive all related values from the single new config object.
SIGHUP handler is registered and tested (can be triggered manually via kill -HUP).
Config changes are logged with old and new values (secrets excluded).
File watcher interval is balanced for your needs: 2 seconds for emergency switches, 10-30 seconds for routine updates.

A service that can reconfigure itself without restarting is not just more available. It is faster to debug, easier to operate, and cheaper to run because you are not burning infrastructure cycles on rolling restarts for every boolean change.

A note from Yojji

The kind of operational maturity this post describes (validating every config change against a schema, reacting to SIGHUP in production, wiring listeners that update subsystems independently) is the foundation of a service that operators trust to stay up when things go wrong. Yojji’s engineering teams apply these patterns in the Node.js and TypeScript services they build, from configuration management to deployment pipelines, ensuring that the systems they deliver do not require a restart every time a toggle flips.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their teams specialize in the JavaScript ecosystem (React, Node.js, TypeScript), cloud platforms (AWS, Azure, Google Cloud), and full-cycle product engineering from discovery through DevOps. If your team is looking to build services that stay stable under change without sacrificing velocity, Yojji is worth a conversation.