The Practical Developer

File Uploads in Node.js: Processing, Validation, and Storage in Production

One wrong file can crash your Node.js process, fill your disk, or expose your internal network. Here is the upload pipeline that validates content, streams to storage, and rejects everything dangerous before it touches your application memory.

Rows of server racks in a data center, the final destination for every uploaded file that makes it past your validation pipeline

A CSV file that was actually a ZIP bomb landed in one of my inboxes recently. Not a malicious attacker. A well-intentioned product manager who exported a “CSV” from a third-party tool that quietly renamed a compressed archive. The upload handler read it into memory, checked the extension, called it valid, and wrote 2 GB of decompressed data to the disk before the container’s filesystem filled and the orchestrator killed the pod.

This is not an edge case. Every Node.js API that accepts file uploads is a target, whether it knows it or not. The attacker does not need a zero-day. They need a field on a form that accepts multipart data, a handler that trusts the content-type header, and a developer who did not know that multer sets no size limits by default and passes the client-supplied filename straight to the handler.

The fix is an upload pipeline with six stages: protocol limits, content validation, size enforcement, safe storage keys, streaming to storage, and post-upload scanning. This post covers all six with working code, starting from the naive handler and building up to a production-safe pipeline.

The naive handler that bites you

Most file upload tutorials stop at this pattern:

import multer from 'multer';

const upload = multer({ dest: 'uploads/' });

app.post('/api/upload', upload.single('file'), (req, res) => {
  res.json({ filename: req.file?.filename, size: req.file?.size });
});

This code is dangerous on its face. Here is what it does by default:

  • Accepts any content-type, including application/zip with a .pdf extension.
  • Stores every file in one local directory. Multer's dest option generates a random filename, but the usual next step is to rename the file using file.originalname, and a name like ../../../etc/crontab then overwrites system files if the destination path is resolved unsafely.
  • Writes the entire file to disk (or buffers it in memory with memoryStorage) before your handler sees it.
  • Sets no size limits, so you have no control over the total upload size until the file has already been written.
  • Offers no way to reject a file mid-stream based on its actual content.

You are one misconfigured form away from a production incident. Let me show you the pipeline that prevents all of these.

Stage 1: Busboy limits at the protocol level

Multer is a wrapper around busboy, a streaming multipart parser. (Express’s body-parser does not handle multipart bodies at all.) Multer passes its limits option straight through to busboy but sets none by default. Drop Multer for raw busboy, or at minimum pass explicit limits through Multer’s options:

import multer from 'multer';

const upload = multer({
  // busboy limits, passed directly
  limits: {
    fieldNameSize: 100,       // default 100 bytes
    fieldSize: 1024 * 100,    // 100 KB max per text field
    fields: 10,               // max non-file fields
    fileSize: 10 * 1024 * 1024, // 10 MB max per file
    files: 1,                 // max number of files
    headerPairs: 2000,        // default 2000, fine
  },
});

These limits are enforced by busboy at the stream level. If a file part exceeds fileSize, busboy truncates the file stream and signals the limit, and multer aborts the upload with a MulterError whose code is LIMIT_FILE_SIZE instead of buffering the rest of the payload. Express’s default error handler turns that error into a 500, so map the code to a 413 in your own error handler.

The critical one is fileSize. Without it, a client can send a 50 GB file, and multer will keep writing it to disk (or to memory, with memory storage) until the OS says no. With it, the upload is rejected as soon as the limit is crossed.
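Multer reports limit violations as errors carrying a string code such as LIMIT_FILE_SIZE or LIMIT_FILE_COUNT. Here is a minimal sketch of turning those codes into HTTP statuses. The helper name statusForUploadError and the policy of which codes become 413 versus 400 are assumptions for illustration, not something multer defines:

```typescript
// Shape of the errors we care about: multer's MulterError carries a string `code`.
interface UploadErrorLike {
  code?: string;
  message?: string;
}

// Assumed policy (not defined by multer): size violations -> 413, other limits -> 400.
const STATUS_BY_CODE = new Map<string, number>([
  ['LIMIT_FILE_SIZE', 413],
  ['LIMIT_FIELD_VALUE', 413],
  ['LIMIT_FILE_COUNT', 400],
  ['LIMIT_FIELD_COUNT', 400],
  ['LIMIT_PART_COUNT', 400],
  ['LIMIT_FIELD_KEY', 400],
  ['LIMIT_UNEXPECTED_FILE', 400],
]);

// Hypothetical helper: errors without a recognized code fall through to 500.
function statusForUploadError(err: UploadErrorLike): number {
  if (err.code === undefined) return 500;
  return STATUS_BY_CODE.get(err.code) ?? 500;
}
```

Inside an Express error-handling middleware, this would be used roughly as res.status(statusForUploadError(err)).json({ error: err.message }).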

But limits alone do not validate content. A 10 MB file that is an executable renamed to .jpg passes the limit check. The next stages catch that.

Stage 2: Validate content type before the stream starts

The Content-Type header in the multipart part is client-sent and trivially spoofable. Use it as a hint, not a gate. The real check happens at the stream level by inspecting the file magic bytes.

The pattern is to use a Transform stream that inspects the first bytes of the file, compares them against known magic numbers, and then passes the rest through to storage. Here is a reusable contentValidator stream:

import { Transform } from 'node:stream';

type FileType = 'image' | 'pdf' | 'csv' | 'json' | 'zip' | 'unknown';

const MAGIC_BYTES: Record<FileType, Uint8Array[]> = {
  image: [
    new Uint8Array([0xFF, 0xD8, 0xFF]),             // JPEG
    new Uint8Array([0x89, 0x50, 0x4E, 0x47]),       // PNG
    new Uint8Array([0x47, 0x49, 0x46, 0x38]),       // GIF
    new Uint8Array([0x42, 0x4D]),                    // BMP
    new Uint8Array([0x52, 0x49, 0x46, 0x46]),       // RIFF container: WEBP, but also WAV and AVI
  ],
  pdf: [new Uint8Array([0x25, 0x50, 0x44, 0x46])],  // %PDF
  csv: [],  // no reliable magic bytes, handled differently
  json: [], // no reliable magic bytes, validated after parse
  zip: [
    new Uint8Array([0x50, 0x4B, 0x03, 0x04]),       // ZIP
    new Uint8Array([0x50, 0x4B, 0x05, 0x06]),       // ZIP empty archive
    new Uint8Array([0x50, 0x4B, 0x07, 0x08]),       // ZIP spanned
  ],
  unknown: [],
};

function detectType(buffer: Buffer): FileType {
  for (const [type, sigs] of Object.entries(MAGIC_BYTES)) {
    if (sigs.length === 0) continue;
    const matched = sigs.some((sig) =>
      buffer.subarray(0, sig.length).equals(sig)
    );
    if (matched) return type as FileType;
  }
  return 'unknown';
}

function createContentValidator(allowedTypes: readonly FileType[]) {
  let inspected = false;

  return new Transform({
    transform(chunk, _encoding, callback) {
      if (!inspected) {
        const header = chunk.subarray(0, 16);
        const detected = detectType(header);
        if (!allowedTypes.includes(detected)) {
          callback(new Error(`Rejected file type: ${detected}`));
          return;
        }
        inspected = true;
      }
      callback(null, chunk);
    },
  });
}

This runs in the stream before the file reaches storage. A ZIP archive renamed to .jpg is caught on an image-only endpoint because the magic bytes say PK 03 04, not FF D8 FF. The rejection happens on the first chunk, before any file data is written to storage.

The tradeoff is that the first chunk must contain enough bytes. This validator looks at up to 16 bytes, which covers every signature in the table above. A parser that delivers a shorter first chunk needs the validator to accumulate bytes before deciding. Formats with no magic bytes, such as CSV and JSON, are reported as unknown and need a separate validation path, as does a file too short to carry a signature (a 5-byte text file).
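One way to handle a short first chunk is to hold data back until enough bytes have arrived, then decide once. The sketch below is an illustration rather than a drop-in replacement for the validator above: createHeaderCollector and createBufferedValidator are hypothetical names, and isAllowed stands in for whatever signature check you use (for example, a wrapper around detectType).

```typescript
import { Transform } from 'node:stream';

// Hypothetical helper: collects chunks until at least `needed` bytes have arrived.
function createHeaderCollector(needed: number) {
  let held: Buffer[] = [];
  let length = 0;

  // Returns and clears everything collected so far.
  const drain = (): Buffer => {
    const data = Buffer.concat(held);
    held = [];
    length = 0;
    return data;
  };

  // Returns all held bytes once enough have arrived, otherwise null.
  const add = (chunk: Buffer): Buffer | null => {
    held.push(chunk);
    length += chunk.length;
    return length >= needed ? drain() : null;
  };

  return { add, drain };
}

// Hypothetical validator: withholds output until the header can be judged.
function createBufferedValidator(
  isAllowed: (header: Buffer) => boolean,
  headerBytes = 16,
): Transform {
  const collector = createHeaderCollector(headerBytes);
  let decided = false;

  const judge = (data: Buffer): Error | null =>
    isAllowed(data.subarray(0, headerBytes))
      ? null
      : new Error('Rejected: unrecognized file header');

  return new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      if (decided) {
        callback(null, chunk); // header already accepted, pass data through
        return;
      }
      const data = collector.add(chunk);
      if (data === null) {
        callback(); // still waiting for more bytes
        return;
      }
      decided = true;
      const err = judge(data);
      if (err) callback(err);
      else callback(null, data);
    },
    flush(callback) {
      if (decided) {
        callback();
        return;
      }
      // The stream ended before headerBytes arrived: judge what we have.
      const data = collector.drain();
      const err = judge(data);
      if (err) callback(err);
      else callback(null, data);
    },
  });
}
```

The cost is that up to headerBytes of data sit in memory before anything flows downstream, which is negligible next to the protection against a parser that emits tiny first chunks.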

Stage 3: Size enforcement at the stream boundary

The busboy fileSize limit from Stage 1 catches oversized files at the multipart parser level. But if you use raw busboy or a custom parser, you need a stream-level size check too. A simple Transform that counts bytes and rejects past a threshold:

function createSizeLimiter(maxBytes: number) {
  let total = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      total += chunk.length;
      if (total > maxBytes) {
        callback(new Error(`File exceeds ${maxBytes} byte limit`));
        return;
      }
      callback(null, chunk);
    },
  });
}

This looks redundant if you already set limits.fileSize in multer. It is not. A custom parser that does not use multer, or a direct busboy integration, does not enforce the limit unless you add it yourself. The sizeLimiter stream is insurance that works regardless of the parser.

Stage 4: Safe storage with sanitized filenames

Writing every upload into a single uploads/ directory under the client-supplied filename is broken in production for three reasons:

  1. Path traversal. A filename like ../../etc/passwd writes outside the uploads directory.
  2. Collision. Two users uploading resume.pdf overwrite each other’s files.
  3. No partitioning. Very large flat directories make listing, backup, and cleanup operations slow.

Fix all three by generating the storage key on the server instead of taking it from the client:

import { randomUUID } from 'node:crypto';
import path from 'node:path';

interface StorageKey {
  directory: string;
  filename: string;
  key: string; // full relative path
}

function generateStorageKey(originalName: string): StorageKey {
  const ext = path.extname(originalName).toLowerCase();
  const safeExt = ['.jpg', '.jpeg', '.png', '.gif', '.pdf', '.csv', '.json', '.zip']
    .includes(ext) ? ext : '.bin';

  // Create a date-partitioned directory structure: uploads/2026/06/18/
  const now = new Date();
  const dir = `uploads/${now.getUTCFullYear()}/${String(now.getUTCMonth() + 1).padStart(2, '0')}/${String(now.getUTCDate()).padStart(2, '0')}`;

  const uuid = randomUUID();
  const filename = `${uuid}${safeExt}`;
  const key = `${dir}/${filename}`;

  return { directory: dir, filename, key };
}

UUID filenames eliminate collisions. Date-partitioned directories keep any single folder from growing without bound. The extension is validated against an allowlist; everything else becomes .bin. No path traversal is possible because the output path is entirely server-generated.
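Even with server-generated keys, a cheap defense-in-depth step is to confirm that the resolved path still sits under the upload root before opening a write stream. A minimal sketch, assuming a hypothetical helper named resolveInsideRoot and an upload root that is not the filesystem root:

```typescript
import { resolve, sep } from 'node:path';

// Hypothetical helper: resolves `key` under `root` and refuses anything that escapes it.
function resolveInsideRoot(root: string, key: string): string {
  const absoluteRoot = resolve(root);
  const target = resolve(absoluteRoot, key);
  // Appending the separator stops a sibling such as "/srv/uploads-evil"
  // from passing a plain prefix check against "/srv/uploads".
  if (!target.startsWith(absoluteRoot + sep)) {
    throw new Error(`Storage key escapes upload root: ${key}`);
  }
  return target;
}
```

A key produced by generateStorageKey passes unchanged, while ../../etc/passwd or an absolute path throws before any file is opened.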

If you use S3, R2, or GCS, the same logic applies: the object key is all that matters. Use uploads/2026/06/18/<uuid>.pdf as the key and you get partitioning for free.

Stage 5: Stream-to-storage with no temp files

The production pattern streams the file directly to its final destination without writing to a local temp file first. For local disk, that means a WriteStream to the final path. For S3, that means the Upload class from the AWS SDK v3, which accepts a stream and handles multipart upload automatically.

Here is a complete upload handler that chains all the stages:

import { createWriteStream, existsSync, mkdirSync } from 'node:fs';
import { unlink } from 'node:fs/promises';
import path from 'node:path';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import multer from 'multer';

// CSV is left out: it has no magic bytes, so detectType reports it as 'unknown'.
// Route CSV uploads through a separate text-validation path.
const ALLOWED_TYPES: FileType[] = ['image', 'pdf', 'zip'];

// No dest option -- we handle storage ourselves
const upload = multer({
  storage: multer.memoryStorage(), // keep it in memory *only* for streaming
  limits: {
    fileSize: 10 * 1024 * 1024, // 10 MB
    files: 1,
    fields: 5,
  },
  // fileFilter runs before storage; the client-sent MIME type is only a quick sanity check
  fileFilter: (_req, file, cb) => {
    const allowedMimes = [
      'image/jpeg', 'image/png', 'image/gif', 'image/webp',
      'application/pdf',
      'application/zip', 'application/x-zip-compressed',
    ];
    if (allowedMimes.includes(file.mimetype)) {
      cb(null, true);
    } else {
      cb(new Error(`Unsupported content-type: ${file.mimetype}`));
    }
  },
});

app.post('/api/upload', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file provided' });
  }

  const originalName = req.file.originalname;
  const storageKey = generateStorageKey(originalName);

  try {
    // Ensure the directory exists
    const fullDir = path.join(process.cwd(), storageKey.directory);
    if (!existsSync(fullDir)) {
      mkdirSync(fullDir, { recursive: true });
    }

    const fullPath = path.join(process.cwd(), storageKey.key);

    // Chain the stages: buffer -> content validator -> size limiter -> file
    const source = Readable.from(req.file.buffer);
    const validator = createContentValidator(ALLOWED_TYPES);
    const sizeLimiter = createSizeLimiter(10 * 1024 * 1024);
    const destination = createWriteStream(fullPath);

    await pipeline(source, validator, sizeLimiter, destination);

    res.status(201).json({
      key: storageKey.key,
      originalName,
      size: req.file.size,
    });
  } catch (err: any) {
    // Clean up partial file if pipeline failed mid-write
    const fullPath = path.join(process.cwd(), storageKey.key);
    try { await unlink(fullPath); } catch { /* file may not exist */ }

    res.status(400).json({
      error: err.message || 'Upload failed',
    });
  }
});

A few details worth calling out:

multer.memoryStorage() is intentional. You do not want multer writing to a temp directory that you then have to clean up. Memory storage gives you a Buffer, so each in-flight upload holds up to fileSize bytes (10 MB here) in memory. That is fine for small files and modest concurrency. For files over 100 MB, swap to a streaming parser like busboy directly, which gives you file parts as streams instead of buffers. The pipeline stays the same; only the source changes from Readable.from(buffer) to the part stream from busboy.

pipeline handles errors. If the validator rejects mid-stream, pipeline destroys every stream in the chain, including the destination. createWriteStream creates the file as soon as it opens, though, so an empty or partial file can be left at the final path. The unlink in the catch block removes it.

No temp files means no cleanup cron job. A surprising number of production servers accumulate gigabytes of partially uploaded files in /tmp because the cleanup logic was never written. The streaming pipeline eliminates the problem.

Stage 6: Post-upload content scanning

The magic-byte check in Stage 2 catches mismatched file types. It does not catch a JPEG with a malicious payload appended to it or a PDF that exploits a vulnerability in the rendering library. For those, you need a content scanner after the file is stored.

The practical approach is to integrate ClamAV via the clamscan npm package and run a scan after the pipeline succeeds:

import NodeClam from 'clamscan';
import { unlink } from 'node:fs/promises';

const clamscan = await new NodeClam().init({
  clamdscan: {
    socket: '/var/run/clamav/clamd.sock',
    timeout: 30000,
  },
});

async function scanFile(filePath: string): Promise<boolean> {
  try {
    const { isInfected } = await clamscan.isInfected(filePath);
    // isInfected is null when the scan could not complete; treat that as infected
    return isInfected !== false;
  } catch {
    // clamd not running or unreachable -- decide your failure policy
    // Returning true (infected) is the safe default
    return true;
  }
}

// After pipeline succeeds:
const infected = await scanFile(fullPath);
if (infected) {
  await unlink(fullPath);
  return res.status(400).json({ error: 'File rejected by security scan' });
}

Run clamd as a sidecar in your container or as a separate service. The scan adds 100-500 ms per file, which is fine for upload endpoints that are already async from the client’s perspective. For high-throughput pipelines, defer the scan to a background worker and return a 202 Accepted with a scan status endpoint the client can poll.
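The deferred variant needs somewhere to record scan state so that the status endpoint has something to report. A minimal in-memory sketch, assuming hypothetical names (createScanRegistry, enqueue, statusOf) and a scan function shaped like scanFile above; a real deployment would more likely keep this state in a database or queue so it survives restarts:

```typescript
type ScanStatus = 'pending' | 'clean' | 'infected';

// Hypothetical registry: tracks one status per storage key, in memory only.
function createScanRegistry(scan: (key: string) => Promise<boolean>) {
  const statuses = new Map<string, ScanStatus>();

  // Marks the upload as pending and starts the scan in the background.
  // The returned promise settles when the status has been updated.
  function enqueue(key: string): Promise<void> {
    statuses.set(key, 'pending');
    return scan(key)
      .then((infected) => {
        statuses.set(key, infected ? 'infected' : 'clean');
      })
      .catch(() => {
        statuses.set(key, 'infected'); // fail closed if the scanner errors
      });
  }

  // Returns undefined for keys that were never enqueued.
  function statusOf(key: string): ScanStatus | undefined {
    return statuses.get(key);
  }

  return { enqueue, statusOf };
}
```

The upload handler would call enqueue without awaiting it and respond with 202, and the polling endpoint would return statusOf(key), answering 404 when it is undefined.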

What about S3 and cloud storage?

The same six-stage pipeline applies to cloud storage. The only change is the final destination: instead of a WriteStream, you use the AWS SDK’s Upload (which accepts a readable stream):

import { Upload } from '@aws-sdk/lib-storage';
import { S3Client } from '@aws-sdk/client-s3';
import type { Readable } from 'node:stream';

const s3 = new S3Client({ region: 'us-east-1' });

async function streamToS3(
  source: Readable,
  bucket: string,
  key: string,
): Promise<void> {
  const upload = new Upload({
    client: s3,
    params: {
      Bucket: bucket,
      Key: key,
      Body: source,
    },
    queueSize: 4,   // concurrent parts
    partSize: 5 * 1024 * 1024, // 5 MB
  });

  await upload.done();
}

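The content validator and size limiter streams sit between the source and the S3 upload. If they reject on the first chunk, no file data reaches S3. If they reject later, the upload fails and S3 never exposes a partial object, although an in-progress multipart upload should be aborted so its parts do not linger.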

For GCS, use @google-cloud/storage’s createWriteStream. For R2, the S3-compatible API means the same @aws-sdk/lib-storage code works with a different endpoint.

Common mistakes and how to catch them

Trusting the file extension. The client sends filename="report.pdf", and your validation checks path.extname(). The file is a ZIP bomb. Use magic bytes. The extension is for UX, not security.
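If you still want the extension to mean something, check it against the type detected from the magic bytes rather than the other way around. A small sketch, assuming a hypothetical helper extensionMatchesContent and an extension table chosen for illustration:

```typescript
import { extname } from 'node:path';

type DetectedType = 'image' | 'pdf' | 'zip' | 'unknown';

// Assumed mapping from detected content type to acceptable extensions.
const EXTENSIONS_BY_TYPE: Record<DetectedType, string[]> = {
  image: ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'],
  pdf: ['.pdf'],
  zip: ['.zip'],
  unknown: [],
};

// Hypothetical helper: true only when the client's extension agrees with the detected content.
function extensionMatchesContent(filename: string, detected: DetectedType): boolean {
  const ext = extname(filename).toLowerCase();
  return EXTENSIONS_BY_TYPE[detected].includes(ext);
}
```

With this check, report.pdf whose bytes identify it as a ZIP archive is rejected even on an endpoint that accepts both PDFs and ZIP files.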

Allowing symlinks in the upload directory. If your storage path is traversable by another process, a symlink attack can redirect a write to any file on the system. Write to a directory that is not world-writable. In containers, use a dedicated volume with noexec and nosuid mount flags.

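Not setting a request body size limit. Multer’s fileSize limit applies per file. The total request body can still be larger if the request includes many files or fields. Express’s json and urlencoded body parsers do not handle multipart requests, so their limit option does not help here. Set multer’s files, fields, fieldSize, and parts limits, and use a reverse proxy (Nginx client_max_body_size, Caddy request_body max_size) to cap the inbound body size at the edge.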
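Inside the application, a cheap first gate is to compare the declared Content-Length against your cap before the multipart parser runs. This is a sketch with a hypothetical helper name; it is only a fast path, because the header can be absent (chunked transfer encoding) or simply wrong, so the stream-level limits still have to do the real enforcement:

```typescript
// Hypothetical helper: true when the declared body size exceeds the cap.
// A missing or malformed header returns false, leaving enforcement to the
// stream-level limits further down the pipeline.
function declaredBodyTooLarge(
  contentLength: string | undefined,
  maxBytes: number,
): boolean {
  if (contentLength === undefined) return false;
  if (!/^\d+$/.test(contentLength)) return false;
  return Number(contentLength) > maxBytes;
}
```

A middleware placed ahead of multer could call declaredBodyTooLarge(req.headers['content-length'], limit) and respond with 413 immediately when it returns true.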

Storing files on ephemeral disk without replication. Containers restart, pods reschedule, instances terminate. Files stored on local disk disappear. If you need durability, stream to S3, R2, or GCS from the start. The streaming pipeline is the same; only the target changes.

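Forgetting to clean up on validation failure. If the magic-byte validator or size limiter rejects after data has started flowing to S3, the object never becomes visible, but a multipart upload may already be in progress, and orphaned parts are billed until they are removed. The AWS SDK’s Upload class aborts the multipart upload on failure by default (the leavePartsOnError option disables that), yet it is still worth aborting explicitly, deleting any object that did complete, and adding a bucket lifecycle rule that expires incomplete multipart uploads.

// Assumes PassThrough (node:stream) and DeleteObjectCommand (@aws-sdk/client-s3) are imported
const body = new PassThrough();
const upload = new Upload({ client: s3, params: { Bucket, Key, Body: body } });
try {
  // Run the validation pipeline and the S3 upload concurrently
  await Promise.all([pipeline(source, validator, sizeLimiter, body), upload.done()]);
} catch (err) {
  // Abort the multipart upload, then remove any object that was completed
  await upload.abort().catch(() => {});
  await s3.send(new DeleteObjectCommand({ Bucket, Key })).catch(() => {});
  throw err;
}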

The six-stage checklist

Before you ship an endpoint that accepts files, run through this:

  • Protocol limits set on busboy or multer (fileSize, files, fields).
  • Content-type header checked as a hint (reject obviously wrong types fast).
  • Magic-byte validator in the stream (reject mismatched types before storage).
  • Size limiter in the stream (reject oversized files at the stream boundary, not after).
  • Safe storage key (UUID filename, date-partitioned directory, validated extension).
  • Post-upload scan (ClamAV or equivalent for files over a threshold).

Skip any one, and you have a gap. File upload vulnerabilities tend to live in the stages that were skipped, not the ones you got right.

The practical takeaway

File uploads are the most dangerous feature an API can expose because they combine untrusted input, disk I/O, and cross-system data flow. The naive handler that ships in every “build a file upload API” tutorial is unsafe by default. The fix is not a single library. It is a pipeline of streams, each enforcing one invariant early, before the next stage sees the data.

Busboy limits catch the protocol-level abuse. Magic-byte validation catches the renamed executable. Size limiters cap how much any one upload can write, though a ZIP bomb is small on the wire and still needs a decompressed-size limit wherever the archive is extracted. Safe storage keys prevent collision and traversal. Post-upload scanning catches the payloads that slip through.

A well-constructed upload pipeline rejects most bad input in the first 100 bytes of the stream, before the file touches your storage or consumes meaningful CPU. The rest of the pipeline is insurance for the cases the earlier stages miss. Build the pipeline once, reuse it across every endpoint, and audit it when you add a new file type.

Your product manager’s CSV files will arrive safely. The ZIP bombs will not.

