Overview and What You Will Learn
Most engineers write their first Dockerfile by copying an example from the internet and tweaking it until it works. The result is often a 2GB image that takes 8 minutes to build, runs as root, and bakes your entire local node_modules directory into the image. It works, but it is slow, large, and insecure.
In this guide you will learn how to write Dockerfiles that build in under 60 seconds, produce images under 200MB, and run securely as a non-root user. You will understand the exact mechanics of layer caching — so you know why moving one instruction changes your build from 45 seconds to 8 minutes — and how to structure every Dockerfile for maximum cache reuse.
Why This Matters in Production
At Hotstar, 50+ Docker images are built on every code push. A poorly cached Dockerfile that takes 8 minutes to build adds 400+ minutes of cumulative wait time per day across the team (50 images × 8 minutes, assuming just one push a day). An image that is 2GB instead of 150MB adds about 1,850MB of extra data to pull before every deployment. These are not minor inconveniences; they are engineering productivity costs that compound daily.
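The figures above are plain arithmetic, so it can help to rerun them with your own numbers. A minimal shell sketch follows, using the image count, build time, and image sizes quoted above. The 50 MB/s registry throughput is purely an assumed value for illustration, and 2GB is treated as 2000MB.

```shell
#!/bin/sh
# Back-of-the-envelope cost of slow builds and large images.
images=50               # images built per push (from the text)
minutes_per_build=8     # poorly cached build time (from the text)
total_minutes=$((images * minutes_per_build))

large_mb=2000           # 2GB image, taken as 2000MB
small_mb=150            # optimized image (from the text)
extra_mb=$((large_mb - small_mb))

throughput_mb_s=50      # ASSUMPTION: registry pull speed, not from the text
extra_pull_s=$((extra_mb / throughput_mb_s))

echo "Build time per push: ${total_minutes} minutes"
echo "Extra data per pull: ${extra_mb} MB"
echo "Extra pull time at ${throughput_mb_s} MB/s: ${extra_pull_s} seconds"
```

Swapping in your own team's push frequency and network speed gives a rough estimate of what a slow Dockerfile costs per day.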
Core Principles
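Every Dockerfile instruction creates a layer (strictly, only RUN, COPY, and ADD add filesystem content; the others record metadata, but all of them are cached the same way). Docker reuses a cached layer whenever the instruction and its inputs are unchanged. Understanding when that cache is invalidated is the most important skill for writing fast Dockerfiles.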
```
+------------------------------------------+
| FROM node:20-alpine                      | <- Layer 1: base image
+------------------------------------------+
| WORKDIR /app                             | <- Layer 2: set working dir
+------------------------------------------+
| COPY package.json package-lock.json ./   | <- Layer 3: just dependency files
+------------------------------------------+
| RUN npm install                          | <- Layer 4: install deps (CACHED)
+------------------------------------------+
| COPY . .                                 | <- Layer 5: your source code
+------------------------------------------+
| RUN npm run build                        | <- Layer 6: build your app
+------------------------------------------+
```

Cache invalidation rule: if a layer changes, ALL layers below it (every later instruction) are invalidated.

If you change your source code (Layer 5 changes):
* Layer 1: cache HIT (FROM unchanged)
* Layer 2: cache HIT (WORKDIR unchanged)
* Layer 3: cache HIT (package.json unchanged)
* Layer 4: cache HIT (npm install output unchanged - this is the slow step!)
* Layer 5: cache MISS (source changed)
* Layer 6: cache MISS (must rebuild)

If you put COPY . . before npm install:
* Every source file change invalidates npm install
* Every build takes 3-5 minutes instead of 45 seconds

Detailed Step-by-Step Practical Lab
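As a warm-up before the milestones, the cache invalidation rule from Core Principles can be modeled in a few lines of shell. This is a toy sketch, not how Docker is implemented: `cache_report` is a hypothetical helper that takes the index of the first changed layer and the total number of layers.

```shell
#!/bin/sh
# Toy model of the cache cascade: every layer before the first changed layer
# is a HIT; the changed layer and everything after it is a MISS.
# cache_report is a hypothetical helper, not a Docker command.
cache_report() {
    first_changed=$1   # 1-based index of the first layer whose input changed
    total=$2           # number of layers in the Dockerfile
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -lt "$first_changed" ]; then
            echo "Layer $i: HIT"
        else
            echo "Layer $i: MISS"
        fi
        i=$((i + 1))
    done
}

# Source code changed: COPY . . is layer 5 of 6 in the Core Principles diagram.
cache_report 5 6

# Same change when COPY . . sits before npm install (layer 3 of 5).
cache_report 3 5
```

In the first call the npm install layer (4) stays a HIT; in the second it falls after the changed layer and is rebuilt on every source edit.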
Milestone 1: Dockerfile Instruction Reference
Every Dockerfile instruction and when to use it:
```dockerfile
# FROM: the base image. Always pin to a specific version, never use :latest
FROM node:20-alpine
# Good: node:20-alpine, node:20-alpine3.18, node:20.10.0-alpine3.18
# Bad:  node:latest, node:alpine (the alpine tag changes over time)

# WORKDIR: set the working directory inside the container
WORKDIR /app
# Creates the directory if it does not exist
# All subsequent RUN, COPY, CMD use this as the base path
# Always use an absolute path

# COPY: copy files from the build context into the image
COPY package.json package-lock.json ./
# Copies specific files (cache-friendly)
COPY src/ ./src/
# Copies a directory
COPY . .
# Copies everything not in .dockerignore; put this LAST

# ADD: like COPY, but also extracts local tar archives and accepts URLs
ADD https://example.com/file.tar.gz /tmp/
# Note: an archive fetched from a URL is downloaded but not extracted
# Avoid ADD unless you specifically need tar extraction or URL fetching
# COPY is more explicit and predictable

# RUN: execute a command during the build
RUN npm install
# Each RUN creates a new layer
# Chain commands to avoid extra layers (apt-get applies to Debian-based images):
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*
# Critical: clean the apt cache in the same RUN instruction
# If you clean in a separate RUN, the cache is still in the previous layer

# ENV: set environment variables baked into the image
ENV NODE_ENV=production
ENV PORT=8080
# These are available at both build time and runtime
# Do not use ENV for secrets; they are visible in docker history

# ARG: build-time variable (not available at runtime)
ARG BUILD_VERSION=unknown
# Pass at build time: docker build --build-arg BUILD_VERSION=v1.0.0 .
# Good for: version numbers, build metadata
# Not for: secrets (they appear in docker history)

# EXPOSE: documents which ports the container listens on
EXPOSE 8080
# Does NOT actually publish the port
# Is purely documentation, helpful for engineers reading the Dockerfile
# You still need -p 8080:8080 in docker run to publish it

# CMD: the default command when the container starts
CMD ["node", "dist/server.js"]
# Use exec form (JSON array), NOT shell form
# Shell form: CMD node dist/server.js
# Exec form:  CMD ["node", "dist/server.js"]
# Why exec form: the process receives signals directly, so SIGTERM works for graceful shutdown

# ENTRYPOINT: the fixed executable the container always runs
ENTRYPOINT ["node"]
CMD ["dist/server.js"]
# With both: ENTRYPOINT is the binary, CMD is the default argument
# docker run myimage                -> runs: node dist/server.js
# docker run myimage dist/other.js  -> runs: node dist/other.js
# ENTRYPOINT is overridable with the --entrypoint flag

# USER: run the container as a non-root user
USER node
# Set this after installing dependencies (which often need root)
# and before CMD/ENTRYPOINT, so the application runs as this user

# HEALTHCHECK: tell Docker how to check if the container is healthy
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
# curl must be installed in the image (Alpine-based images do not include it by default)

# LABEL: add metadata to the image
LABEL maintainer="platform@razorpay.com"
LABEL version="v3.1.0"
LABEL description="Payment API service"
```

Milestone 2: Layer Caching - The Most Important Concept
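The reason the dependency manifests get their own COPY step (Layer 3 in the Core Principles diagram) can be illustrated without Docker at all. The sketch below fingerprints a hypothetical project's package.json and package-lock.json with `cksum` before and after edits. It is only an analogy: Docker computes its own checksums over file contents and metadata, and the project layout and file contents here are invented for the demo.

```shell
#!/bin/sh
# Analogy for how a COPY step is cached: the step is reused only while the
# fingerprint of the files it copies stays the same.
# ASSUMPTION: project layout and contents are invented for this demo.
project=$(mktemp -d)
mkdir "$project/src"
echo '{"name":"demo","dependencies":{}}' > "$project/package.json"
echo '{"lockfileVersion":3}' > "$project/package-lock.json"
echo 'console.log("v1")' > "$project/src/index.js"

# Fingerprint only the files that "COPY package.json package-lock.json ./" reads.
deps_fingerprint() {
    cat "$project/package.json" "$project/package-lock.json" | cksum
}

before=$(deps_fingerprint)

# Edit source code only: the dependency fingerprint should not move.
echo 'console.log("v2")' > "$project/src/index.js"
after_source_edit=$(deps_fingerprint)

# Edit the manifest: the dependency fingerprint should change.
echo '{"name":"demo","dependencies":{"express":"^4.18.0"}}' > "$project/package.json"
after_dep_edit=$(deps_fingerprint)

[ "$before" = "$after_source_edit" ] && echo "source edit: npm install layer reusable"
[ "$before" != "$after_dep_edit" ] && echo "dependency edit: npm install layer rebuilt"

rm -rf "$project"
```

A `COPY . .` step behaves like a fingerprint over the whole project, which is why any file edit invalidates it and everything after it.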
```dockerfile
# BAD Dockerfile: every code change rebuilds npm install
FROM node:20-alpine
WORKDIR /app
# Copies everything, including source code
COPY . .
# Cache busted every time ANY file changes
RUN npm install
RUN npm run build
EXPOSE 8080
CMD ["node", "dist/server.js"]
```

```dockerfile
# GOOD Dockerfile: npm install is cached unless package.json changes
FROM node:20-alpine
WORKDIR /app
# Copy ONLY the dependency files first
COPY package.json package-lock.json ./
# This layer is cached until package.json or package-lock.json changes
RUN npm install
# Copy source code AFTER installing deps
COPY . .
RUN npm run build
EXPOSE 8080
CMD ["node", "dist/server.js"]
```

Time difference on a typical Node.js app:
* BAD: every build = 3-5 minutes (npm install runs every time)
* GOOD: code-only changes = 20-30 seconds (npm install cached); dependency changes = 3-5 minutes (expected, because new packages must be installed)

Cache invalidation rules:
```shell
# Rule 1: If the instruction text changes, the cache is busted
#   "RUN npm install" -> "RUN npm install --verbose" = cache bust

# Rule 2: For COPY/ADD, if any copied file changes, the cache is busted
#   COPY package.json ./ -> if package.json changes, cache busted
#   COPY . .             -> if ANY file changes, cache busted

# Rule 3: Everything below a cache miss is also a miss
#   Layer 3 busts -> Layers 4, 5, 6 all rebuild regardless of their content

# Check your cache hits during builds
docker build --progress=plain .
# step 3/8 : COPY package.json package-lock.json ./
#  ---> Using cache                  <- cache HIT
# step 4/8 : RUN npm install
#  ---> Using cache                  <- cache HIT (deps unchanged)
# step 5/8 : COPY . .
#  ---> a84f9c2b1d3e                 <- cache MISS (source changed)
# step 6/8 : RUN npm run build
#  ---> Running in b72c8a9f4e1d      <- REBUILDING (cache miss cascade)
# (This sample is in the classic builder's format; BuildKit marks reused steps with CACHED instead.)
```

Milestone 3: The .dockerignore File
Without .dockerignore, docker build sends your entire project directory to the Docker daemon as the build context, and COPY . . then copies all of it into the image: node_modules (1GB+), the .git directory (hundreds of MB), build artifacts, and secrets.
```
# .dockerignore: put this in the same directory as your Dockerfile
# Comments must be on their own line. Patterns are matched relative to the
# context root, so use a leading **/ to match at any depth.

# Node.js
# Never copy node_modules; install inside the container
node_modules/
npm-debug.log
.npm

# Build output (usually copied from a build stage instead)
dist/
build/
.next/
out/

# Version control
.git/
.gitignore

# Environment files: NEVER copy into images
.env
.env.*
# Allow the example file (it has no real secrets)
!.env.example

# Development tools
.vscode/
.idea/
**/*.swp

# Testing
coverage/
.nyc_output
**/__tests__/
**/*.test.ts
**/*.spec.ts

# Documentation
docs/
*.md
# Allow README if your image serves docs
!README.md

# Docker files themselves
Dockerfile
Dockerfile.*
docker-compose.yml
docker-compose.*.yml

# macOS
**/.DS_Store
```

```shell
# Measure the impact of .dockerignore

# Before adding .dockerignore:
docker build .
# Sending build context to Docker daemon  890.5MB   <- 890MB sent

# After adding .dockerignore:
docker build .
# Sending build context to Docker daemon  2.3MB     <- 2.3MB sent

# ~390x reduction in build context = dramatically faster builds
```

Milestone 4: Choosing the Right Base Image
The base image is the single biggest factor in image size and security.
```shell
# Compare sizes for Node.js base images
# (figures are approximate; exact sizes and CVE counts vary by release):
docker pull node:20          # ~1.1GB: full Debian with build tools
docker pull node:20-slim     # ~220MB: Debian without build tools
docker pull node:20-alpine   # ~55MB:  Alpine Linux (musl libc)

docker images | grep node
# node   20          sha256:...   1.1GB
# node   20-slim     sha256:...   220MB
# node   20-alpine   sha256:...   55MB

# Compare CVE counts (run with Trivy):
trivy image node:20          # typically 200+ CVEs
trivy image node:20-alpine   # typically 10-20 CVEs
```

Choosing which to use:
Use node:20-alpine when:
* Building final production images (smallest, fewest CVEs)
* Your application has no native module dependencies
* You can test that Alpine's musl libc works with your dependencies

Use node:20-slim when:
* Your app uses native modules that require glibc (not compatible with Alpine musl)
* You need Debian tools but want a smaller image than full node:20

Use node:20 (full Debian) when:
* You need native compilation tools (node-gyp, Python, gcc)
* Usually only in the BUILD STAGE of a multi-stage build, not the final stage

Use distroless (gcr.io/distroless/nodejs20-debian12) when:
* Maximum security: no shell, no package manager, no OS utilities
* Only the runtime and your app binary
* Smallest attack surface possible

Milestone 5: CMD vs ENTRYPOINT - Getting It Right
This is one of the most misunderstood parts of Dockerfiles:
```dockerfile
# ENTRYPOINT: the fixed executable, always runs
# CMD: the default arguments to ENTRYPOINT, can be overridden

# Pattern 1: CMD only (most common for apps)
CMD ["node", "dist/server.js"]
# docker run myimage       -> node dist/server.js
# docker run myimage bash  -> bash (CMD overridden)

# Pattern 2: ENTRYPOINT + CMD (good for tools)
ENTRYPOINT ["node"]
CMD ["dist/server.js"]
# docker run myimage                    -> node dist/server.js
# docker run myimage dist/other.js      -> node dist/other.js (CMD overridden)
# docker run --entrypoint bash myimage  -> bash (ENTRYPOINT overridden)

# Pattern 3: ENTRYPOINT only (strict tools)
ENTRYPOINT ["node", "dist/server.js"]
# docker run myimage -> node dist/server.js
# Arguments passed to docker run are APPENDED (not a common use case)

# ALWAYS use exec form (JSON array), not shell form:
# Shell form: CMD node dist/server.js
#   -> runs as: /bin/sh -c "node dist/server.js"
#   -> sh is PID 1 and node is its child
#   -> SIGTERM goes to sh, not to node, so graceful shutdown breaks

# Exec form: CMD ["node", "dist/server.js"]
#   -> runs as: node dist/server.js directly
#   -> node is PID 1 inside the container
#   -> SIGTERM goes directly to node, so graceful shutdown works
```

Milestone 6: A Complete Production-Ready Dockerfile
Combining everything:
```dockerfile
# ---- Base Stage ----
FROM node:20-alpine AS base
WORKDIR /app
# Install only production OS dependencies
# tini is a minimal init system: it reaps zombie processes and forwards signals
RUN apk add --no-cache tini

# ---- Dependencies Stage ----
FROM base AS deps
# Copy only what is needed for npm install
COPY package.json package-lock.json ./
# Install production dependencies only
# npm ci is faster and more deterministic than npm install in CI/production
RUN npm ci --omit=dev

# ---- Build Stage ----
FROM base AS builder
# Install all dependencies (including devDependencies for build tools)
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# ---- Production Stage ----
FROM base AS production
# Set production environment
ENV NODE_ENV=production
ENV PORT=8080

# Create a non-root user that owns /app
RUN addgroup -S appgroup && \
    adduser -S appuser -G appgroup && \
    chown appuser:appgroup /app

# Copy only the production node_modules (no devDependencies)
# --chown sets ownership at copy time, so no chown -R layer duplicates these files
COPY --from=deps --chown=appuser:appgroup /app/node_modules ./node_modules

# Copy only the build output (not source code)
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist

# Switch to the non-root user
USER appuser

# Document the port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

# Use tini as the init system, then start the app
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/server.js"]

# Labels for traceability
LABEL maintainer="platform@swiggy.com"
LABEL version="production"
```

Common Mistakes
| Mistake | Cost | Fix |
|---|---|---|
| COPY . . before RUN npm install | Every code change rebuilds dependencies | Copy package.json first, then run install, then copy source |
| No .dockerignore file | Sends gigabytes to daemon on every build | Always create .dockerignore as the first file in a new project |
| Using latest tag for base image | Builds break randomly when base image updates | Pin to specific version: node:20.10.0-alpine3.18 |
| Running as root | Security vulnerability | Always add USER instruction before CMD |
| Shell form for CMD | Graceful shutdown with SIGTERM breaks | Use exec form: CMD ["node", "server.js"] |
| Installing dev dependencies in production | Much larger image | Use npm ci --omit=dev or multi-stage builds |
| Cleaning apt cache in separate RUN | Cache still in previous layer | Clean in same RUN: apt-get install && rm -rf /var/lib/apt/lists/* |
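The shell-form row above has a close cousin: an entrypoint wrapper script that starts the app as a child process instead of replacing itself. The sketch below, which assumes a hypothetical `entrypoint.sh` written just for this demo, shows that ending the wrapper with `exec "$@"` makes the final command take over the wrapper's PID. Inside a container that PID would be 1, so the app receives SIGTERM directly. It runs outside Docker purely to show the PID behaviour.

```shell
#!/bin/sh
# Demonstrates why wrapper scripts should end with `exec "$@"`.
# ASSUMPTION: entrypoint.sh is a hypothetical wrapper created for this demo.
workdir=$(mktemp -d)

cat > "$workdir/entrypoint.sh" <<'EOF'
#!/bin/sh
# Record the wrapper's own PID, then replace this shell with the command.
echo "$$" > "$WRAPPER_PID_FILE"
exec "$@"
EOF

WRAPPER_PID_FILE="$workdir/wrapper.pid"
export WRAPPER_PID_FILE

# The "app" here is just a shell that prints its own PID.
app_pid=$(sh "$workdir/entrypoint.sh" sh -c 'echo $$')
wrapper_pid=$(cat "$workdir/wrapper.pid")

echo "wrapper PID: $wrapper_pid, app PID: $app_pid"
rm -rf "$workdir"
```

The two PIDs match because `exec` replaces the wrapper process rather than forking a child. Without `exec`, the app would get a new PID and the wrapper shell would sit between it and any signals.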
Troubleshooting Reference
| Problem | Symptom | Fix |
|---|---|---|
| Slow builds even with unchanged code | npm install running every build | Move COPY package.json before COPY . . |
| Large image size | Image over 1GB for a simple app | Check docker history image for large layers, use multi-stage builds |
| Container crashes immediately | Exit code 127 | CMD binary does not exist in the image — check the path |
| SIGTERM not handled | Container takes 10 seconds to stop (timeout) | Use exec form for CMD, add signal handler in your application |
| Build context too large | Sending build context... 890MB | Add .dockerignore file to exclude node_modules, .git, build artifacts |
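For the build-context row above, a quick pre-build check can catch a missing or incomplete .dockerignore before the context is ever sent. This is a rough sketch: `check_dockerignore` is a hypothetical helper that only looks for exact lines such as `node_modules` or `node_modules/`, and it does not understand globs or `!` negation.

```shell
#!/bin/sh
# Sketch of a pre-build check: warn when .dockerignore is missing or does not
# list the usual heavyweights. check_dockerignore is a hypothetical helper.
check_dockerignore() {
    dir=$1
    file="$dir/.dockerignore"
    if [ ! -f "$file" ]; then
        echo "missing: .dockerignore"
        return 1
    fi
    status=0
    for entry in node_modules .git .env; do
        # Accept the entry with or without a trailing slash, as a whole line.
        if ! grep -qxF -e "$entry" -e "$entry/" "$file"; then
            echo "not ignored: $entry"
            status=1
        fi
    done
    return $status
}

# Demo on a throwaway directory whose .dockerignore forgets .env
demo=$(mktemp -d)
printf 'node_modules/\n.git/\n' > "$demo/.dockerignore"
report=$(check_dockerignore "$demo") || true
echo "$report"
rm -rf "$demo"
```

Run against a real project as `check_dockerignore .`; a non-zero exit status could be used to fail a CI step before `docker build` starts.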
**Tip:** Use `docker build --progress=plain .` to see the detailed build output, including exactly which layers are cache hits and which are rebuilding. This is the fastest way to understand why your build is slower than expected.
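To turn that output into a number, a saved build log can be scanned for cache hits. The sketch below is a rough illustration: `count_cache_hits` is a hypothetical helper, and the sample lines are taken from the Milestone 2 output in this guide. It matches both `Using cache` (the classic builder's wording) and `CACHED` (BuildKit's wording).

```shell
#!/bin/sh
# Count cache hits in a build log read from standard input.
# count_cache_hits is a hypothetical helper, not a Docker command.
count_cache_hits() {
    grep -c -e 'Using cache' -e 'CACHED'
}

# Sample lines taken from the Milestone 2 output in this guide.
sample_log='step 3/8 : COPY package.json package-lock.json ./
 ---> Using cache
step 4/8 : RUN npm install
 ---> Using cache
step 5/8 : COPY . .
 ---> a84f9c2b1d3e
step 6/8 : RUN npm run build
 ---> Running in b72c8a9f4e1d'

hits=$(printf '%s\n' "$sample_log" | count_cache_hits)
echo "cache hits: $hits"
```

On a real build, something like `docker build --progress=plain . 2>&1 | tee build.log` followed by `count_cache_hits < build.log` would give the same count for your own Dockerfile.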
**Remember:** Every `RUN apt-get install` must clean the package manager cache in the same instruction: `RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*`. If you clean in a separate RUN instruction, the cache bytes are still stored in the previous layer and the cleanup has no effect on image size.
**Common Mistake:** Using `ADD` instead of `COPY` for copying local files. `ADD` has two extra behaviours (extracting local tar archives and fetching URLs) that make it unpredictable when you just want to copy files. Always use `COPY` for local files, and save `ADD` for the rare case where you genuinely want auto-extraction.
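A simple scan can surface this mistake in an existing Dockerfile. The sketch below is only a heuristic: `find_local_add` is a hypothetical helper that handles single-line `ADD` instructions and treats anything that is not a URL or a tar archive as a candidate for `COPY`. The sample Dockerfile contents are invented for the demo.

```shell
#!/bin/sh
# Flag ADD instructions whose source is a plain local path, where COPY would
# be the more predictable choice. find_local_add is a hypothetical helper.
find_local_add() {
    grep -n '^[[:space:]]*ADD[[:space:]]' "$1" |
        grep -v -e 'https\{0,1\}://' -e '\.tar' -e '\.tgz'
}

# Demo on a throwaway Dockerfile (contents invented for illustration).
demo=$(mktemp)
cat > "$demo" <<'EOF'
FROM node:20-alpine
ADD package.json /app/
ADD https://example.com/file.tar.gz /tmp/
ADD vendor.tar.gz /opt/
COPY src/ /app/src/
EOF

flagged=$(find_local_add "$demo") || true
echo "$flagged"
rm -f "$demo"
```

Only the `ADD package.json /app/` line is reported, together with its line number, since the other two `ADD` lines use the URL and tar behaviours that `COPY` does not offer.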
**Security:** Never use `ENV` to set secrets in a Dockerfile. Environment variables set with `ENV` are baked into the image and visible to anyone who runs `docker inspect` or `docker history` on the image. Use BuildKit secret mounts (`--mount=type=secret`) for build-time secrets, and pass runtime secrets through environment variables at `docker run` time from a secure secret store like AWS Secrets Manager.
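A minimal sketch of the secret-mount approach follows. It writes a small Dockerfile and prints the matching build command instead of running it, so it works on machines without Docker. The secret id `npmrc`, the `~/.npmrc` source path, and the `myapp:dev` tag are placeholders chosen for this example, and the generated directory has no package.json, so it illustrates the syntax rather than a buildable project.

```shell
#!/bin/sh
# Sketch: pass an npm credentials file to the build with a BuildKit secret
# mount instead of ENV or ARG.
# ASSUMPTIONS: secret id "npmrc", source ~/.npmrc, and tag myapp:dev are placeholders.
build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted only for this RUN step and is not stored in a layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci --omit=dev
EOF

# Print the build command rather than running it.
build_cmd="docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp:dev $build_dir"
echo "$build_cmd"
```

The `id` in `--secret` on the command line must match the `id` in the `RUN --mount` instruction; the file is available at the `target` path only while that step runs.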