Overview and What You Will Learn
One of the most common problems with Docker images in production is size. A Node.js application with 2MB of actual code can end up in a 1.2GB Docker image because the image also carries the full Node.js build toolchain, the entire npm cache, TypeScript, Jest, ESLint, and every other development tool the developer installed. None of those tools are needed in production. They only make the image slower to pull, larger to store, and riskier to run, because every extra package is a potential source of CVEs.
Multi-stage builds address this directly. You write one Dockerfile with at least two phases: a build phase that uses all the development tools to compile your application, and a final phase that copies only the compiled output into a clean, minimal image. The build tools never reach the final image.
By the end of this guide you will be able to write multi-stage Dockerfiles for Node.js, Go, Python, and Java applications that produce production images that are 80-90% smaller than single-stage equivalents.
Why This Matters in Production
At Hotstar, images are pulled to new instances every time the autoscaler adds capacity. A 1.2GB image takes about 90 seconds to pull on a typical EC2 instance; a 120MB image takes about 9 seconds at the same throughput. During a traffic spike, when you need new instances fast, that 81-second difference means users can see errors for 90 seconds instead of 9. Image size is a reliability metric.
Core Principles
A multi-stage Dockerfile uses multiple FROM instructions. Each FROM starts a fresh build environment called a stage. You name stages with AS and copy files between them with COPY --from=.
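The mechanics can be seen in isolation with a toy example. This is a minimal sketch, not taken from a real project: the `alpine:3.20` tag and the `/artifact.txt` file are illustrative assumptions standing in for a real toolchain and real build output.

```dockerfile
# First stage, named "builder": produces a file that stands in for build output.
FROM alpine:3.20 AS builder
RUN echo "built artifact" > /artifact.txt

# Final stage: starts from a fresh filesystem and takes only the artifact.
# Nothing else created in "builder" is carried over.
FROM alpine:3.20
COPY --from=builder /artifact.txt /artifact.txt
CMD ["cat", "/artifact.txt"]
```

Because the last `FROM` defines the final image, only the layers of the second stage are shipped; the first stage exists only as a source for `COPY --from=builder`.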
```text
+------------------------------------------+
| Stage 1: builder                         |
| FROM node:20 AS builder                  |
|                                          |
| Has: Node.js, npm, TypeScript, Jest      |
| Does: installs deps, compiles TypeScript |
| Output: dist/ folder with JS files       |
|                                          |
| This stage is DISCARDED in final image   |
+------------------------------------------+
                     |
    COPY --from=builder /app/dist ./dist
                     |
                     v
+------------------------------------------+
| Stage 2: production (final image)        |
| FROM node:20-alpine                      |
|                                          |
| Has: Node.js runtime only                |
| Gets: only the dist/ from builder        |
| Does NOT have: TypeScript, Jest, devDeps |
|                                          |
| This stage BECOMES the final image       |
+------------------------------------------+

Single-stage image: 1.2GB (has everything)
Multi-stage image:  120MB (has only runtime + output)
Size reduction:     90%
```
Detailed Step-by-Step Practical Lab
Milestone 1: Node.js Multi-Stage Build
The most common use case — compile TypeScript in the builder, run plain JavaScript in production.
```dockerfile
# ---- Stage 1: Install and Build ----
FROM node:20 AS builder
WORKDIR /app

# Install ALL dependencies including devDependencies (TypeScript, etc.)
COPY package.json package-lock.json ./
RUN npm ci

# Copy source and compile TypeScript to JavaScript
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
# Output: /app/dist/ contains compiled JavaScript

# ---- Stage 2: Production Runtime ----
FROM node:20-alpine AS production
WORKDIR /app

ENV NODE_ENV=production

# Copy ONLY production dependencies (no devDependencies)
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# This installs only what is listed under "dependencies" not "devDependencies"

# Copy ONLY the compiled output from the builder stage
COPY --from=builder /app/dist ./dist

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

EXPOSE 8080
CMD ["node", "dist/server.js"]
```

```bash
# Build and compare sizes
docker build --target builder -t payment-api:builder .
docker build -t payment-api:production .

docker images | grep payment-api
# REPOSITORY    TAG          SIZE
# payment-api   builder      1.18GB   <- has TypeScript, Jest, all devDeps
# payment-api   production   118MB    <- Node.js runtime + compiled output only
# 10x size reduction
```
Milestone 2: Go Multi-Stage Build (The Best Case)
Go is where multi-stage builds shine brightest. With cgo disabled, the Go compiler produces a statically linked binary that needs no runtime at all, so the final image can be scratch (completely empty) or distroless.
```dockerfile
# ---- Stage 1: Build ----
FROM golang:1.21-alpine AS builder
WORKDIR /app

# Download dependencies first (cached separately)
COPY go.mod go.sum ./
RUN go mod download

# Copy source and build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-w -s" \
    -o /app/payment-service \
    ./cmd/payment-service
# CGO_ENABLED=0 = statically linked (no external C libraries needed)
# -ldflags="-w -s" = strip debug info (smaller binary)

# ---- Stage 2: Minimal Production Image ----
FROM scratch AS production
# scratch = completely empty image
# No OS, no shell, no package manager, no attack surface

# Copy only the binary
COPY --from=builder /app/payment-service /payment-service

# Copy SSL certificates (needed for HTTPS calls)
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

EXPOSE 8080
ENTRYPOINT ["/payment-service"]
```

```bash
docker build -t payment-service:go .
docker images | grep payment-service
# REPOSITORY        TAG   SIZE
# payment-service   go    12MB   <- entire image is 12MB including the binary!

# Compare to original single-stage:
# golang:1.21-alpine base alone = 270MB
# With source code and tools    = 450MB+
```
Milestone 3: Python Multi-Stage Build
Python needs a bit more care because it has a runtime and compiled wheels:
```dockerfile
# ---- Stage 1: Build Python packages ----
FROM python:3.11 AS builder
WORKDIR /app

# Install build dependencies (gcc, headers for native packages)
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
# Install into a virtual environment so the whole tree can be copied as one
# directory. (Installing with --user into /root/.local would leave the packages
# unreadable once the final stage switches to a non-root user.)
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# ---- Stage 2: Production Runtime ----
FROM python:3.11-slim AS production
WORKDIR /app

# Copy the pre-built Python packages from builder
COPY --from=builder /opt/venv /opt/venv
# Note: packages that link against system libraries (for example libpq)
# also need the matching runtime library installed in this stage.

# Copy application source
COPY . .

ENV PATH=/opt/venv/bin:$PATH
ENV PYTHONPATH=/app

# Non-root user
RUN useradd -r -s /bin/false appuser
USER appuser

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

# Size comparison
# python:3.11 base + gcc + build tools    = 800MB+
# python:3.11-slim + pre-built packages   = 180MB
```
Milestone 4: Java/Spring Boot Multi-Stage Build
```dockerfile
# ---- Stage 1: Build with Maven ----
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app

# Download dependencies first (cache layer)
COPY pom.xml ./
RUN mvn dependency:go-offline -B

# Build the application
COPY src/ ./src/
RUN mvn package -DskipTests -B
# Output: target/payment-service-1.0.0.jar

# ---- Stage 2: Extract layers for better caching ----
FROM eclipse-temurin:21-jre AS extractor
WORKDIR /app
COPY --from=builder /app/target/payment-service-1.0.0.jar app.jar
# Spring Boot layertools extracts the jar into layers
# Dependencies layer rarely changes = better Docker caching
RUN java -Djarmode=layertools -jar app.jar extract

# ---- Stage 3: Production ----
FROM eclipse-temurin:21-jre AS production
WORKDIR /app

# Copy layers in order of change frequency (least changed first)
COPY --from=extractor /app/dependencies/ ./
COPY --from=extractor /app/spring-boot-loader/ ./
COPY --from=extractor /app/snapshot-dependencies/ ./
COPY --from=extractor /app/application/ ./

# Non-root user
RUN useradd -r -s /bin/false javauser
USER javauser

EXPOSE 8080
# This launcher class name applies to Spring Boot 3.2 and later.
# Earlier versions use org.springframework.boot.loader.JarLauncher.
ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]

# Without multi-stage: maven:3.9 base alone                = 750MB
# With multi-stage: eclipse-temurin:21-jre + app layers    = 280MB
```
Milestone 5: Naming Stages and Selective Targeting
You can name any number of stages and build only a specific one:
```dockerfile
FROM node:20 AS base
WORKDIR /app
COPY package.json package-lock.json ./

FROM base AS development
RUN npm ci
COPY . .
# Hot reloading for development
# (Dockerfile comments must sit on their own line, not after an instruction.)
CMD ["npm", "run", "dev"]

FROM base AS test
RUN npm ci
COPY . .
# Run test suite
CMD ["npm", "test"]

FROM base AS builder
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS production
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
CMD ["node", "dist/server.js"]
```

```bash
# Build only to the test stage (run tests in CI)
docker build --target test -t payment-api:test .
docker run --rm payment-api:test

# Build the full production image
docker build --target production -t payment-api:prod .

# Build the development image for local work
docker build --target development -t payment-api:dev .
# The second -v keeps the image's node_modules from being hidden by the bind mount
docker run -v "$(pwd)":/app -v /app/node_modules payment-api:dev
# Hot reloading: source changes reflect immediately
```
Milestone 6: COPY --from a Registry Image
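COPY --from does not have to reference a stage in the same Dockerfile. It can also name any image that Docker can pull from a registry:
```dockerfile
FROM alpine:3.18

# Copy the curl binary directly from the official curl image
# instead of running apk add curl and pulling in its dependencies.
# Caution: this only works if the binary is statically linked or the
# shared libraries it needs are also present in this image.
COPY --from=curlimages/curl:latest /usr/bin/curl /usr/bin/curl

# Copy the AWS CLI from its official image.
# Caution: verify this path first. AWS CLI v2 is a glibc-based bundle, so a
# single copied file may not run on Alpine (musl) without the rest of the install.
COPY --from=amazon/aws-cli:latest /usr/local/bin/aws /usr/local/bin/aws

CMD ["sh"]
```
This is useful for pulling specific tools into minimal images without adding a full package manager. Prefer a pinned version tag over latest so that builds stay reproducible.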
Common Mistakes
| Mistake | Result | Fix |
|---|---|---|
| Copying the whole `node_modules` from builder instead of reinstalling in production stage | Production image includes devDependencies | Reinstall with `npm ci --omit=dev` in production stage |
| Not naming stages with `AS` | Cannot reference stages by name | Always name every stage: `FROM node:20 AS builder` |
| Forgetting `CGO_ENABLED=0` in Go builds | Binary may link dynamically against the builder's C library and fail to start in scratch or on a base image with a different libc | Always set `CGO_ENABLED=0` for Go images targeting scratch or alpine |
| Copying `.` instead of specific build output | Source code ends up in production image | Only copy what production needs: `COPY --from=builder /app/dist ./dist` |
| Not cleaning package caches in builder stage | Builder stage is larger than needed, but the final image is unaffected | No fix needed: the builder stage is discarded, so clean caches only in the final stage |
Troubleshooting Reference
| Problem | Cause | Fix |
|---|---|---|
| `COPY --from=builder`: not found | Stage name typo | Check the exact name: `FROM node:20 AS builder` must match `--from=builder` |
| Binary crashes with `exec format error` | Built for the wrong architecture | Build for the target platform, for example `FROM --platform=linux/amd64 ...` on the builder stage or `docker build --platform linux/amd64` |
| SSL errors in scratch image | No CA certificates | `COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/` |
| Production image missing files | Forgot to copy from builder | Check every file your app needs and add a `COPY --from=builder` for each |
| Slow builds despite multi-stage | `COPY . .` invalidates all cache | Move `COPY . .` after dependency installation in builder stage |
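For the `exec format error` row, one possible approach with BuildKit is to let the builder run on the build machine's native architecture and cross-compile for the requested target. The sketch below reuses the Go service layout from Milestone 2 and relies on BuildKit's automatic platform arguments (`BUILDPLATFORM`, `TARGETOS`, `TARGETARCH`). It assumes the build is invoked with `docker build --platform ...` and that the code needs no cgo.

```dockerfile
# syntax=docker/dockerfile:1
# Run the compiler on the build machine's native architecture.
FROM --platform=$BUILDPLATFORM golang:1.21-alpine AS builder
# Filled in by BuildKit from the platform requested at build time.
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Cross-compile for the target platform instead of the build machine's.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-w -s" -o /app/payment-service ./cmd/payment-service

FROM scratch
COPY --from=builder /app/payment-service /payment-service
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
ENTRYPOINT ["/payment-service"]
```

With this layout, `docker build --platform linux/amd64 -t payment-service:amd64 .` should produce an amd64 binary even when the build runs on an arm64 laptop.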
**Tip:** Use `docker build --target builder -t myapp:debug .` to build only up to the builder stage and inspect it. Run `docker run -it --rm myapp:debug sh` to explore what is inside the builder before deciding what to copy to the final stage.
**Remember:** The `--from` in `COPY --from=builder` refers to the stage name, not a directory name. If you do not name a stage with `AS`, Docker numbers stages starting from 0 and you can use `COPY --from=0`, but named stages are far more readable and maintainable.
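For comparison, here is a sketch of the index-based form, assuming the same project layout as Milestone 1 (`src/`, `tsconfig.json`, and a `build` script that writes to `dist/`):

```dockerfile
# Unnamed stage: Docker refers to it by its index, 0.
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# The index works, but it points at a different stage
# as soon as someone inserts a new stage above this one.
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY --from=0 /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Replacing the first line with `FROM node:20 AS builder` and the copy with `COPY --from=builder` gives the same image while surviving later reordering.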
**Common Mistake:** Running `npm install` in the production stage to get production dependencies, and then also copying `node_modules` from the builder. Pick one approach: either copy `node_modules` from a dedicated deps stage, or run `npm ci --omit=dev` in the production stage. Doing both installs dependencies twice and can produce inconsistent results.
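The dedicated deps stage option could look roughly like the sketch below. The stage name `deps` is an illustrative choice, and the project layout is assumed to match Milestone 1. The deps stage deliberately uses the same base image as the final stage so that any native modules are built against the same libc.

```dockerfile
# Stage "deps": production dependencies only, installed exactly once.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Stage "builder": full dependency set, used only to compile.
FROM node:20 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# Final stage: no npm command runs here; both inputs come from earlier stages.
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./
# The official node images ship with an unprivileged "node" user.
USER node
CMD ["node", "dist/server.js"]
```

With BuildKit, the `deps` and `builder` stages do not depend on each other, so they can be built in parallel.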
**Security:** Multi-stage builds significantly reduce your attack surface by removing build tools from the final image. A `FROM scratch` image has zero OS packages, so there is almost nothing to exploit except your application binary. This makes the Trivy vulnerability scan report much cleaner and your security team much happier.
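When a service needs CA certificates or a non-root user without copying them in by hand, the distroless option mentioned in Milestone 2 is a middle ground between scratch and a full base image. The following is a sketch only, assuming the same Go service layout as Milestone 2 and the `gcr.io/distroless/static-debian12:nonroot` image, which is not used elsewhere in this guide.

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" \
    -o /app/payment-service ./cmd/payment-service

# Distroless "static" ships CA certificates, timezone data, and a non-root
# user, but no shell and no package manager.
FROM gcr.io/distroless/static-debian12:nonroot AS production
COPY --from=builder /app/payment-service /payment-service
EXPOSE 8080
ENTRYPOINT ["/payment-service"]
```

The image is a little larger than one built from scratch, in exchange for not having to copy certificates or set up a user yourself.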