Overview and What You Will Learn
A Bash script that works on your laptop but silently produces wrong results in production is worse than a script that fails loudly. The difference between a junior engineer's script and a senior engineer's script is not complexity — it is reliability. Senior scripts exit immediately on errors, validate their inputs, clean up after themselves, and leave a log trail.
By the end of this lab you will:
- Structure a Bash script with correct shebang, strict mode, and header
- Use variables with proper quoting to avoid word-splitting bugs
- Write conditionals using
[[ ]]with string, file, and numeric tests - Build loops that iterate over files, arrays, and command output
- Write reusable functions with local variables and return values
- Handle errors with
set -euo pipefailandtrapfor cleanup
Why This Matters in Production
Every DevOps engineer eventually owns shell scripts that run in production. A deployment script at Meesho that does not handle errors properly can leave a service in a half-deployed state — old container stopped, new container failed to start, no rollback triggered. Proper error handling means the script either succeeds completely or rolls back cleanly.
Core Principles
Script execution flow with strict mode:
+------------------------------------------+| #!/usr/bin/env bash || set -euo pipefail |+------------------------------------------+ | command runs | exit code != 0? / \ yes no | | v v+------------+ +------------+| set -e | | Continue || Script | | to next || exits NOW | | command |+------------+ +------------+ | v+------------------------------------------+| trap 'cleanup' EXIT || Cleanup function ALWAYS runs || Whether script succeeds or fails |+------------------------------------------+Variable quoting rules — the most common source of bugs:
FILENAME="my file with spaces.txt" Unquoted -- WRONG: cp $FILENAME /tmp/ Shell sees: cp my file with spaces.txt /tmp/ Tries to copy 5 separate files Quoted -- CORRECT: cp "$FILENAME" /tmp/ Shell sees: cp "my file with spaces.txt" /tmp/ Copies one file correctlyDetailed Step-by-Step Practical Lab
Milestone 1 — Script structure and strict mode
## ---------------------------------------------## Script: deploy-payment-api.sh## Purpose: Deploy payment API to production## Usage: ./deploy-payment-api.sh <version>## Author: rahul@devops.in## --------------------------------------------- ## Strict mode -- the three non-negotiable linesset -e ## exit immediately on any errorset -u ## exit on undefined variableset -o pipefail ## pipeline fails if any command fails## Shorthand: set -euo pipefail ## Safer word splitting in for loopsIFS=$'\n\t' ## Script metadatareadonly SCRIPT_NAME=$(basename "$0")readonly SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)readonly LOG_FILE="/var/log/deploy-$(date +%Y%m%d-%H%M%S).log" ## Logging function (used throughout)log() { local level="$1" shift echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"} ## Usage functionusage() { echo "Usage: $SCRIPT_NAME <version>" echo " version: Docker image tag to deploy (e.g., v1.2.3)" exit 1} ## Validate argumentsif [[ $# -lt 1 ]]; then usagefi VERSION="$1"log "INFO" "Starting deployment of version: $VERSION"Milestone 2 — Variables and quoting
## Variable assignment (no spaces around =)APP_NAME="payment-api"PORT=4000DEPLOY_DIR="/opt/${APP_NAME}" ## Read-only variables (cannot be changed)readonly CONFIG_FILE="/etc/${APP_NAME}/config.yaml" ## Command substitutionCURRENT_VERSION=$(cat /opt/payment-api/VERSION 2>/dev/null || echo "unknown")HOSTNAME=$(hostname -f)TIMESTAMP=$(date +%Y%m%d-%H%M%S) ## String operationsVERSION="v1.2.3"echo "${VERSION#v}" ## Remove prefix v: 1.2.3echo "${VERSION%.*}" ## Remove last segment: v1.2echo "${VERSION^^}" ## Uppercase: V1.2.3echo "${VERSION//./-}" ## Replace dots: v1-2-3 ## Array variablesSERVERS=("10.0.1.50" "10.0.1.51" "10.0.1.52")echo "${SERVERS[0]}" ## first element: 10.0.1.50echo "${SERVERS[@]}" ## all elementsecho "${#SERVERS[@]}" ## count: 3 ## Default valuesENVIRONMENT="${DEPLOY_ENV:-production}"LOG_LEVEL="${LOG_LEVEL:-info}"## If DEPLOY_ENV is unset or empty, ENVIRONMENT = "production" ## Validate required variables: "${VERSION:?VERSION must be set}": "${APP_NAME:?APP_NAME must be set}"## If either is empty/unset, script exits with error messageMilestone 3 — Conditionals with [[ ]]
Always use [[ ]] (double brackets) in bash. It is safer and more powerful than [ ].
## String testsNAME="payment-api" if [[ -z "$NAME" ]]; then echo "NAME is empty"fi if [[ -n "$NAME" ]]; then echo "NAME is set: $NAME"fi if [[ "$NAME" == "payment-api" ]]; then echo "Correct service"fi if [[ "$NAME" == payment-* ]]; then ## glob matching echo "This is a payment service"fi if [[ "$NAME" =~ ^payment ]]; then ## regex matching echo "Starts with payment"fi ## File testsCONFIG="/etc/payment-api/config.yaml" if [[ -f "$CONFIG" ]]; then echo "Config file exists"fi if [[ -d "/opt/payment-api" ]]; then echo "Deploy directory exists"fi if [[ -r "$CONFIG" ]]; then echo "Config is readable"fi if [[ ! -f "$CONFIG" ]]; then echo "Config missing -- cannot deploy" exit 1fi ## Numeric testsMEMORY_FREE=$(free -m | awk '/^Mem:/{print $7}') if [[ $MEMORY_FREE -lt 512 ]]; then log "WARN" "Low memory: ${MEMORY_FREE}MB available"fi ## Combining conditionsif [[ -f "$CONFIG" && -r "$CONFIG" ]]; then echo "Config exists and is readable"fi if [[ "$ENVIRONMENT" == "production" || "$ENVIRONMENT" == "staging" ]]; then echo "Valid environment"fiMilestone 4 — Loops
## Loop over arraySERVERS=("10.0.1.50" "10.0.1.51" "10.0.1.52")for server in "${SERVERS[@]}"; do log "INFO" "Deploying to $server" ssh "rahul@$server" "sudo systemctl restart payment-api"done ## Loop over filesfor config in /etc/payment-api/*.yaml; do if [[ -f "$config" ]]; then log "INFO" "Validating: $config" yamllint "$config" fidone ## Loop over command outputwhile IFS= read -r line; do if [[ "$line" =~ ERROR ]]; then log "ERROR" "Found in log: $line" fidone < /var/log/payment-api/app.log ## C-style loop (counter)for ((i=1; i<=5; i++)); do log "INFO" "Attempt $i of 5" curl -sf http://localhost:4000/health && break sleep $i ## exponential-like backoffdone ## Until loop (runs until condition is true)RETRIES=0until curl -sf http://localhost:4000/health > /dev/null 2>&1; do ((RETRIES++)) if [[ $RETRIES -ge 10 ]]; then log "ERROR" "Service did not start after $RETRIES attempts" exit 1 fi log "INFO" "Waiting for service... (attempt $RETRIES)" sleep 3donelog "INFO" "Service is healthy after $RETRIES attempts"Milestone 5 — Functions
## Basic functioncheck_dependencies() { local deps=("docker" "curl" "jq" "systemctl") local missing=() for dep in "${deps[@]}"; do if ! command -v "$dep" > /dev/null 2>&1; then missing+=("$dep") fi done if [[ ${#missing[@]} -gt 0 ]]; then log "ERROR" "Missing dependencies: ${missing[*]}" return 1 fi log "INFO" "All dependencies present" return 0} ## Function returning a value via echo (command substitution)get_container_id() { local service_name="$1" local container_id container_id=$(docker ps -q --filter "name=${service_name}" 2>/dev/null) echo "$container_id"} ## Function with local variablesdeploy_service() { local service="$1" local version="$2" local image="${service}:${version}" log "INFO" "Pulling image: $image" docker pull "$image" || { log "ERROR" "Failed to pull $image" return 1 } log "INFO" "Stopping old container" docker stop "$service" 2>/dev/null || true log "INFO" "Starting new container" docker run -d --name "$service" --restart unless-stopped "$image" || { log "ERROR" "Failed to start $service" return 1 } log "INFO" "Deploy complete: $service $version"} ## Call functionscheck_dependencies || exit 1deploy_service "payment-api" "$VERSION" || exit 1Milestone 6 — Error handling with trap
## Cleanup function -- always runs on exitTEMP_DIR="" cleanup() { local exit_code=$? log "INFO" "Cleanup starting (exit code: $exit_code)" ## Remove temp files if [[ -n "$TEMP_DIR" && -d "$TEMP_DIR" ]]; then rm -rf "$TEMP_DIR" log "INFO" "Removed temp directory: $TEMP_DIR" fi ## Remove lock file rm -f /var/run/deploy.lock if [[ $exit_code -ne 0 ]]; then log "ERROR" "Deployment FAILED with exit code $exit_code" ## Alert via Slack webhook curl -s -X POST "$SLACK_WEBHOOK" \ -H 'Content-type: application/json' \ -d "{"text": "Deployment failed: payment-api $VERSION"}" \ > /dev/null 2>&1 || true else log "INFO" "Deployment SUCCEEDED" fi} ## Register cleanup to run on any exittrap cleanup EXIT ## Trap specific signalstrap 'log "WARN" "Interrupted by user"; exit 130' INT TERM ## Prevent concurrent execution with lock fileLOCK_FILE="/var/run/deploy.lock"if [[ -f "$LOCK_FILE" ]]; then log "ERROR" "Another deployment is running (lock: $LOCK_FILE)" exit 1fitouch "$LOCK_FILE" ## Create temp directory (will be cleaned up by trap)TEMP_DIR=$(mktemp -d)log "INFO" "Working in: $TEMP_DIR" ## Script continues -- if anything fails, trap runs cleanupProduction Best Practices and Common Pitfalls
| Mistake | Problem | Fix |
|---|---|---|
No set -euo pipefail |
Silent failures continue | Add as first lines after shebang |
| Unquoted variables | Word splitting on spaces | Always quote: "$VAR" |
Using [ ] instead of [[ ]] |
Splitting and glob issues | Use [[ ]] in bash scripts |
| No cleanup on failure | Temp files and locks remain | Register trap cleanup EXIT |
| No input validation | Script fails with confusing errors | Check $# and validate early |
Quick Reference and Troubleshooting Commands
| Task | Command |
|---|---|
| Debug trace | bash -x script.sh |
| Syntax check | bash -n script.sh |
| Lint script | shellcheck script.sh |
| Check exit code | echo $? |
| Strict mode | set -euo pipefail |
| Register cleanup | trap cleanup EXIT |
| Default value | ${VAR:-default} |
| Require variable | ${VAR:?error message} |
PLACEMENT PRO TIP**Tip:** Install and run `shellcheck script.sh` before committing any shell script. It catches dozens of common bugs — unquoted variables, incorrect comparisons, unreachable code — with clear explanations and fix suggestions. It is the `eslint` of the shell world.
REMEMBER THIS**Remember:** `trap cleanup EXIT` fires on every exit — normal completion, `exit 1`, uncaught errors from `set -e`, and signals. It is the most reliable cleanup mechanism because it runs regardless of how the script terminates. Register it early in the script, before any operations that create resources needing cleanup.
COMMON MISTAKE / WARNING**Security:** Never use `eval` with external input. `eval` executes arbitrary code and is the primary shell injection vector. If you find yourself writing `eval $USER_INPUT`, rewrite it using arrays, parameter expansion, or explicit parsing instead.
COMMON MISTAKE / WARNING**Common Mistake:** Writing `if [ $COUNT -eq 0 ]` instead of `if [[ $COUNT -eq 0 ]]`. When `COUNT` is empty, `[ $COUNT -eq 0 ]` becomes `[ -eq 0 ]` which throws a syntax error. `[[ $COUNT -eq 0 ]]` handles empty variables safely. Always use `[[ ]]` in bash.