The Practical Developer

Bash Strict Mode: The Three Lines That Stop Your Deploy Script From Lying To You

Half the production incidents that start with “but the script said it succeeded” come from the same three missing options at the top of a bash file. Here is what set -euo pipefail actually does, the traps it has, and the deploy-script pattern that fails loudly instead of quietly succeeding.


The deploy script printed Done. and exited 0. The new version was not actually deployed. Your monitoring alert fires four minutes later when the old pods drain and there is nothing to take their place. Somebody opens the postmortem and writes “the script reported success even though the upload failed.” It is one of the most common bugs in shell, and the fix is a single short line.

Bash defaults are not safe. By default, a failed command does not stop the script. By default, an unset variable expands to an empty string. By default, a pipeline returns the exit code of only the last command. Every one of those defaults will eventually bite you, and the cost of fixing them is one line at the top of the file.

#!/usr/bin/env bash
set -euo pipefail

That is the line. The rest of this post is what each flag means, the traps that come with them, and the pattern for writing scripts that actually fail when something fails.

What -e, -u, and pipefail actually do

-e (errexit). Exit immediately if any command exits non-zero. Without it, this happens:

#!/usr/bin/env bash
cp build/app.tar.gz s3://my-bucket/   # ← typo: should be aws s3 cp
echo "Upload complete"
exit 0

cp does not understand s3:// URLs and exits with code 1. Without -e, bash keeps going, prints “Upload complete”, and exits 0. Your CI pipeline reports green. Your users get the old build. With -e, the script exits the moment cp fails, and CI goes red.
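
A minimal sketch of that difference, runnable anywhere bash is installed. The builtin false stands in for the failing cp, and each variant runs in its own bash process so the exit codes can be compared side by side; the variable names are only for illustration.

```bash
# Run the same two-command script without and with -e.
without_e=0
bash -c 'false; echo "Upload complete"' >/dev/null || without_e=$?

with_e=0
bash -c 'set -e; false; echo "Upload complete"' >/dev/null || with_e=$?

echo "without -e: exit ${without_e}"   # exit 0: the echo ran and masked the failure
echo "with -e:    exit ${with_e}"      # exit 1: the script stopped at the failing command
```

Each variant runs in a child bash on purpose. Wrapping the commands in a ( ... ) subshell followed by || would itself switch -e off inside the subshell.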

-u (nounset). Exit if you reference an undefined variable. Without it:

DEPLOY_ENV=prod
rsync -av build/ user@$DEPOY_ENV.example.com:/srv/   # ← typo: DEPOY vs DEPLOY

Bash silently treats $DEPOY_ENV as an empty string. The command becomes rsync -av build/ user@.example.com:/srv/. With -u, it fails immediately: DEPOY_ENV: unbound variable. Many “the script ran on staging” mysteries are this exact bug.
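
The same typo can be reproduced without touching a server. This is a small sketch that swaps rsync for echo; OPTIONAL_TAG is a made-up variable name, included to show how a genuinely optional variable can still be read under -u by giving it a default.

```bash
# Without -u the typo expands to nothing and the command runs anyway.
loose_out=$(bash -c 'DEPLOY_ENV=prod; echo "host=$DEPOY_ENV.example.com"')

# With -u the expansion itself is an error, so echo never runs.
typo_status=0
typo_out=$(bash -c 'set -u; DEPLOY_ENV=prod; echo "host=$DEPOY_ENV.example.com"' 2>/dev/null) || typo_status=$?

# Optional variables need an explicit default under -u.
optional_out=$(bash -c 'set -u; unset OPTIONAL_TAG; echo "tag=${OPTIONAL_TAG:-latest}"')

echo "without -u: ${loose_out}"          # host=.example.com
echo "with -u: exit ${typo_status}"      # non-zero
echo "with default: ${optional_out}"     # tag=latest
```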

-o pipefail. With it, a pipeline’s exit code is that of the rightmost command that exited non-zero, or zero if every command succeeded, instead of always being that of the last command in the chain. Without it:

curl --fail -sS https://flaky-api/data | jq '.items[]' > out.json

If curl fails (DNS error, refused connection, or an HTTP 500, which curl only treats as a failure because of --fail), it exits non-zero. But jq happily reads its empty stdin, exits 0, and the pipeline reports success. With pipefail, the pipeline exits with curl’s code and -e then kicks in.
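
A network-free sketch of the same behavior, with false playing the failing producer and cat or true playing the consumer that succeeds. It also prints PIPESTATUS, the bash array that records the exit code of every stage, which helps when you need to know which stage failed.

```bash
# Default: only the last stage counts, so the failure is lost.
no_pipefail=0
bash -c 'false | cat' || no_pipefail=$?

# pipefail: the failing stage decides the pipeline's exit code.
with_pipefail=0
bash -c 'set -o pipefail; false | cat' || with_pipefail=$?

# PIPESTATUS holds one exit code per stage of the most recent pipeline.
per_stage=$(bash -c 'false | true; echo "${PIPESTATUS[*]}"')

echo "default:    exit ${no_pipefail}"    # 0
echo "pipefail:   exit ${with_pipefail}"  # 1
echo "PIPESTATUS: ${per_stage}"           # 1 0
```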

The combination is so common that it has a name: “bash strict mode.” It was popularized by Aaron Maxwell’s post in 2014 and has been the right default ever since.

The IFS line you have probably seen too

Some teams add a fourth line:

IFS=$'\n\t'

This narrows the field-splitting separators to newline and tab. The default is space, tab, and newline, which means an unquoted expansion containing a filename with spaces (My Document.pdf) silently splits into two arguments. With the narrower IFS:

for f in $(ls /uploads); do
  process "$f"
done

…is less dangerous, but the right answer is still to avoid $(ls) and use globs:

for f in /uploads/*; do
  process "$f"
done

I include IFS=$'\n\t' only in scripts that actually parse newline-separated input. Most scripts do not, and the IFS change can confuse maintainers.
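
To see what the narrower IFS changes, here is a small sketch that counts how many fields an unquoted expansion produces under each setting. The two filenames are invented for the example, and the inner script is written to a temporary file only to keep the quoting readable.

```bash
demo=$(mktemp)
cat >"$demo" <<'EOF'
list=$'My Document.pdf\nnotes.txt'

# Default IFS (space, tab, newline): the space inside the name splits too.
set -- $list
default_count=$#

# Narrowed IFS: only newline and tab separate fields.
IFS=$'\n\t'
set -- $list
narrow_count=$#

echo "default=${default_count} narrow=${narrow_count}"
EOF

split_counts=$(bash "$demo")
rm -f "$demo"
echo "$split_counts"   # default=3 narrow=2
```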

The traps -e has

-e is not magic. There are four cases where it does not stop the script, and forgetting them is how people decide it does not work and stop using it.

Failure inside if / && / || / !. -e is suppressed when the command’s exit code is being inspected for control flow:

if grep "ERROR" log.txt; then    # ← grep failure here is fine, that is the point
  alert
fi
some_command_that_might_fail    # ← this one will trigger -e normally

That is correct behavior. But:

do_step_one && do_step_two       # ← if do_step_one fails, do_step_two is skipped
                                 #   and the script DOES NOT exit. Surprise.

Most developers expect the script to die after a failed &&. It does not. Use plain sequencing:

do_step_one
do_step_two
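
Both forms can be checked in a few lines. In this sketch false stands in for do_step_one and echo for do_step_two.

```bash
# A failed left-hand side of && does not trip -e; the script carries on.
chain_script='
set -e
false && echo "step two"
echo "still running after the failed && chain"
'
chain_out=$(bash -c "$chain_script")

# Plain sequencing: the failure stops the script on the spot.
seq_script='
set -e
false
echo "never printed"
'
seq_status=0
seq_out=$(bash -c "$seq_script") || seq_status=$?

echo "chain output: ${chain_out}"
echo "sequenced:    exit ${seq_status}"   # 1
```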

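Failure in a command substitution whose exit status gets masked:

local result=$(command_that_fails)   # ← inside a function: -e does NOT trigger, the status of local (0) wins
echo "result=$result"

This is the bug. The script keeps going with result="". The same masking happens with export, readonly, and declare, and with substitutions used as arguments, as in echo "$(cmd)". A bare assignment, result=$(command_that_fails), does take the substitution’s exit status and does trip -e, so declare the variable first and assign on a separate line. There is a second layer: bash switches -e off inside $(...) itself, so a failure in the middle of a multi-command substitution goes unnoticed. Bash 4.4 and later close that gap when shopt -s inherit_errexit is set:

set -euo pipefail
shopt -s inherit_errexit

If you write deploy scripts and you can rely on bash 4.4 or newer, add this line.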
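
A sketch of where the exit status of a command substitution survives and where it is lost, assuming bash 4.4 or newer for the inherit_errexit part. On the bash versions I am aware of, declaring with local and assigning in the same statement hides the failure, while a separate bare assignment keeps it.

```bash
# 1. local + assignment in one statement: the status of `local` hides the failure.
masked_script='
set -e
f() {
  local result=$(false)
  echo "masked: still running"
}
f
'
masked_out=$(bash -c "$masked_script")

# 2. Declare first, assign separately: the status of $(...) is kept and -e fires.
split_script='
set -e
f() {
  local result
  result=$(false)
  echo "never printed"
}
f
'
split_status=0
split_out=$(bash -c "$split_script") || split_status=$?

# 3. Inside $(...) itself, -e is off unless inherit_errexit is set.
plain_sub=$(bash -c 'set -e; out=$(false; echo continued); echo "$out"')
strict_status=0
strict_sub=$(bash -c 'set -e; shopt -s inherit_errexit; out=$(false; echo continued); echo "$out"') || strict_status=$?

echo "1: ${masked_out}"
echo "2: exit ${split_status}"
echo "3: plain='${plain_sub}' strict exit ${strict_status}"
```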

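Failure in functions called as part of a chain. If you call a function as func || true, or inside if, &&, or !, bash ignores -e for every command inside the function body, not just for its final status. A failing command in the middle of func no longer stops it; the function runs to the end and returns whatever its last command returned.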
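
This one surprises people often enough that it is worth running once. In the sketch below, deploy is a hypothetical function whose first command fails; because it is called on the left of ||, bash ignores -e for everything inside it.

```bash
func_script='
set -e
deploy() {
  false                        # would abort the script if deploy were called plainly
  echo "deploy kept going"
}
deploy || echo "deploy reported failure"
echo "script finished"
'
func_out=$(bash -c "$func_script")
echo "$func_out"
# prints:
#   deploy kept going
#   script finished
```

The function returns the status of its last command, the echo, so the || branch never runs and the failure is lost entirely. Calling deploy on its own line restores the -e behavior.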

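Background processes. cmd & does not trip -e because bash does not wait for it. If you launch background work, save its PID from $! and call wait "$pid"; wait then returns that job’s exit status, so -e (or your own check of $?) can act on it. A bare wait with no arguments returns 0 no matter how the jobs ended.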
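
A short sketch of collecting a background job's status so that -e can see it. Here false stands in for the background work.

```bash
bg_script='
set -e
false &                        # the failure happens in the background, unseen by -e
pid=$!
echo "launched"
wait "$pid"                    # returns the status of the job, and -e applies to it
echo "never printed"
'
bg_status=0
bg_out=$(bash -c "$bg_script") || bg_status=$?

echo "output: ${bg_out}"       # launched
echo "exit:   ${bg_status}"    # 1
```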

A deploy-script template that fails honestly

Here is a real template I have used, end-to-end. It is annotated so you can see why each line is there.

#!/usr/bin/env bash
# Deploy frontend to staging or prod.

set -euo pipefail
shopt -s inherit_errexit         # bash 4.4+: keep -e active inside $(...)

# All required env vars listed up front so the script fails immediately
# if anything is missing. Without -u this would silently use empty strings.
: "${ENV:?ENV must be set: staging or prod}"
: "${BUILD_DIR:?BUILD_DIR must be set}"
: "${S3_BUCKET:?S3_BUCKET must be set}"
: "${CLOUDFRONT_ID:?CLOUDFRONT_ID must be set}"   # used by the CDN invalidation step

# Trap to print where we died on any error. Worth its weight in gold during
# 3 a.m. debugging. You get the line number, not just "exit 1".
trap 'echo "ERROR: command failed at ${BASH_SOURCE[0]}:${LINENO}: ${BASH_COMMAND}" >&2' ERR

# Sanity check: are we on a clean tree? Refuse to deploy uncommitted changes.
if [[ -n $(git status --porcelain) ]]; then
  echo "refusing to deploy: uncommitted changes" >&2
  exit 1
fi

# Build outside the deploy step so build failures stop us BEFORE we touch prod.
echo "==> building"
npm ci
npm run build

echo "==> uploading to s3://${S3_BUCKET}/${ENV}/"
aws s3 sync "${BUILD_DIR}/" "s3://${S3_BUCKET}/${ENV}/" \
  --delete \
  --cache-control "public, max-age=31536000, immutable" \
  --exclude "index.html"

# index.html gets a different cache header; separate command on purpose.
aws s3 cp "${BUILD_DIR}/index.html" "s3://${S3_BUCKET}/${ENV}/index.html" \
  --cache-control "public, max-age=60, must-revalidate"

echo "==> invalidating CDN"
aws cloudfront create-invalidation \
  --distribution-id "${CLOUDFRONT_ID}" \
  --paths "/index.html" >/dev/null

echo "==> deploy complete: $(git rev-parse --short HEAD) → ${ENV}"

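A few things that are deliberate.

: "${VAR:?message}" is a parameter expansion that aborts with your message if VAR is unset or empty. The leading : is the no-op builtin, so the line does nothing except trigger the check. It gives you a readable error in one line, and it works even in scripts that do not set -u.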
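
The guard is easy to exercise on its own. This sketch checks the three cases (unset, empty, set) for S3_BUCKET; example-bucket is a placeholder value, not a real bucket.

```bash
# Unset: the expansion fails, the message goes to stderr, the script stops.
unset_status=0
unset_msg=$(bash -c 'unset S3_BUCKET; : "${S3_BUCKET:?S3_BUCKET must be set}"; echo "not reached"' 2>&1) || unset_status=$?

# Empty counts as missing too, because of the colon in :?
empty_status=0
S3_BUCKET="" bash -c ': "${S3_BUCKET:?S3_BUCKET must be set}"' 2>/dev/null || empty_status=$?

# Set: the line is a no-op.
set_status=0
S3_BUCKET=example-bucket bash -c ': "${S3_BUCKET:?S3_BUCKET must be set}"' || set_status=$?

echo "unset: exit ${unset_status} (${unset_msg})"
echo "empty: exit ${empty_status}"
echo "set:   exit ${set_status}"
```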

The trap '... ERR' line is the difference between a useful error message and exit 1. With it, when something fails you see:

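ERROR: command failed at deploy.sh:23: aws s3 sync "${BUILD_DIR}/" "s3://...

You know which line, which command, and which file. BASH_COMMAND prints the command as written in the script, so variables appear unexpanded. Without the trap, you know only that something exited 1.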
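
One caveat if you later move steps into functions: per the bash manual, an ERR trap is not inherited by functions, command substitutions, or subshells unless errtrace (set -E) is on. The sketch below assumes a hypothetical upload function and adds -E to the set line so the trap still fires for a failure inside it.

```bash
demo=$(mktemp)
cat >"$demo" <<'EOF'
set -Eeuo pipefail             # -E: functions and subshells inherit the ERR trap
trap 'echo "ERROR: line ${LINENO}: ${BASH_COMMAND}" >&2' ERR

upload() {
  false                        # stand-in for a failing aws call
}

upload
echo "not reached"
EOF

trap_status=0
trap_err=$(bash "$demo" 2>&1 >/dev/null) || trap_status=$?
rm -f "$demo"

echo "exit:   ${trap_status}"
echo "stderr: ${trap_err}"
```

The template above has no functions, so it does not need -E as written; treat this as something to add the day you refactor.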

aws s3 sync and the index.html copy are split because they need different Cache-Control headers. People often combine them and then wonder why a deploy never goes live, because the browser is happily serving an index.html cached for a year.

Test it before you trust it

A strict-mode script lulls you into thinking it cannot fail silently. It still can if you do not exercise the failure paths. The smallest thing you can do is run the script with a deliberate fault injection:

# Force the upload to fail. Does the script actually stop?
S3_BUCKET=does-not-exist ENV=staging BUILD_DIR=./dist ./deploy.sh
echo "exit code: $?"

If the exit code is 0, you have a bug somewhere, probably an || true that should not be there, or a piped command without pipefail.
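
If you would rather not point a test at a real bucket, a stub works too. This is a sketch under stated assumptions: it creates a fake aws executable that always fails, puts it first on PATH, and runs a cut-down stand-in for deploy.sh that contains only the upload step. The stub, the stand-in script, and example-bucket are all invented for the illustration.

```bash
stub_dir=$(mktemp -d)

# Fake aws: always fails, the way a broken upload would.
cat >"$stub_dir/aws" <<'EOF'
#!/usr/bin/env bash
echo "stub aws: simulated failure" >&2
exit 1
EOF
chmod +x "$stub_dir/aws"

# Stand-in for deploy.sh: same shape as the upload step, nothing more.
cat >"$stub_dir/deploy.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
aws s3 sync "${BUILD_DIR}/" "s3://${S3_BUCKET}/${ENV}/"
echo "==> deploy complete"
EOF

deploy_status=0
deploy_out=$(PATH="$stub_dir:$PATH" ENV=staging BUILD_DIR=./dist S3_BUCKET=example-bucket \
  bash "$stub_dir/deploy.sh" 2>/dev/null) || deploy_status=$?
rm -rf "$stub_dir"

echo "exit code: ${deploy_status}"
if [ "$deploy_status" -eq 0 ]; then
  echo "BUG: deploy reported success although the upload failed" >&2
fi
```

Swapping the stand-in for the real script turns this into a cheap regression test: the check fails the day someone adds an || true in the wrong place.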

I also like running shellcheck in CI:

- name: shellcheck
  run: shellcheck deploy.sh

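Shellcheck will flag the common foot-guns automatically: unquoted expansions, loops over $(ls), [ ] vs [[ ]] mix-ups, and local var=$(cmd) assignments that mask a failure. As far as I know it does not warn about a missing set -e out of the box, so keep that line in your template. It pays for itself the first time it catches a typo before it gets to staging.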

When to stop using bash

Strict mode does not turn bash into a real programming language. It just stops the script from lying to you. Once a script gets past ~150 lines, has loops with branching logic, or talks to two or more external systems, write it in Python or Go. The reason is not bash performance; it is that a bug in 200 lines of bash takes longer to find than the same bug in 200 lines of Python.

A useful rule: if you are writing a function that takes more than three arguments, or you are using eval, or you are doing arithmetic with bc, you have outgrown bash. Move it.

The takeaway

Three options at the top of every deploy script (-e, -u, pipefail) are the difference between “script reported success” and “script actually succeeded.” Add inherit_errexit if you can rely on bash 4.4 or newer. Add a trap '... ERR' so failures point at the line that broke. Validate required env vars at the top with ${VAR:?...}. Lint the script with shellcheck.

It is a small ritual. It is also the difference between a deploy you can leave running while you eat lunch and a deploy that requires a postmortem.


A note from Yojji

The unglamorous discipline of “your deploy script must fail loudly when something is wrong” is one of those things that does not show up on a roadmap and decides whether your team is paged at 3 a.m. or sleeping. It is the kind of detail Yojji’s engineers have built into the deployment pipelines they run for clients across Europe, the US, and the UK.

Yojji is an international custom software development company founded in 2016, focused on the JavaScript stack (React, Node.js, TypeScript) and cloud platforms (AWS, Azure, GCP). Their full-cycle teams cover discovery, design, development, QA, and DevOps, including the boring-but-critical scripts and pipelines that separate “we shipped” from “we shipped, and we know it shipped.”