A 1.2GB Node.js Docker image became 180MB with three changes. Here is exactly what was changed, why it worked, and how to apply the same fixes to any production image.
A Swiggy backend engineer pushed a Node.js service to production. The Docker image was 1.2GB. ECR storage costs were climbing. Every new EKS node took 90 seconds to pull the image before a pod could start. Deployments during traffic spikes were visibly slow.
Three changes later, the image was 180MB. The same application. The same code. Deployments went from 90-second pull times to under 12 seconds.
This is what changed and why.
Before fixing anything, understand what is actually inside a large image. Most engineers assume the application code is the problem. It is almost never the application code.
```shell
# Inspect what is taking up space in an image
docker run --rm \
  node:18 \
  du -sh /usr/local/lib/node_modules /usr/local/bin \
  /usr/lib /usr/share
```

A default node:18 base image contains the full Node.js runtime, npm, yarn, the Debian package manager, curl, git, a C compiler, Python, and hundreds of system libraries. Almost none of this belongs in a production container. The application code is typically 5-50MB. The base image is 900-1100MB.
Image size breakdown (typical Node.js app):

| Component | Approximate size |
|---|---|
| Base image (node:18-debian) | ~950MB |
| Build tools (npm ci, dev deps) | ~180MB |
| Node.js runtime (needed) | ~120MB |
| Application code | ~10MB |
| Total | ~1.2GB |

The fix is not smaller application code. The fix is building a minimal image that contains only what the running application needs.
A multi-stage build uses separate Docker stages for building and running. The build stage installs all build tools and compiles the application. The runtime stage starts fresh and copies only the built output. Build tools, dev dependencies, test frameworks — none of it reaches the final image.
```dockerfile
# WRONG: Single stage, everything ends up in the image
FROM node:18
WORKDIR /app
COPY package*.json ./
# Installs ALL deps, including devDependencies
RUN npm ci
COPY . .
RUN npm run build
CMD ["node", "dist/index.js"]
# Result: ~1.1GB image
```

```dockerfile
# CORRECT: Multi-stage build

# Stage 1: Builder, installs deps and compiles
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# All deps are needed to build
RUN npm ci
COPY . .
# Compile TypeScript, bundle, etc.
RUN npm run build

# Stage 2: Runtime, only what is needed to run
FROM node:18-alpine AS runtime
WORKDIR /app
# Install only production dependencies, no devDependencies
COPY package*.json ./
RUN npm ci --omit=dev
# Copy only the built output from the builder stage
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
# Result: ~180MB image
```

The COPY --from=builder instruction copies files from the builder stage into the runtime stage. The builder stage is discarded entirely. Every npm package in devDependencies, every TypeScript compiler file, every test framework is gone.
Parameter Breakdown:

* AS builder: names the stage so it can be referenced in later stages
* --from=builder: copies files from the named stage, not from the build context
* --omit=dev: npm flag to skip devDependencies in the production stage
* npm ci: installs exactly from package-lock.json, which is faster and deterministic

Node.js has official Alpine variants. Alpine Linux uses musl-libc instead of glibc and busybox instead of GNU coreutils, resulting in a base image of about 5MB versus more than 100MB for a full Debian base.
```dockerfile
# Base image size comparison
# node:18               ~960MB  (Debian Bullseye)
# node:18-slim          ~240MB  (Debian, stripped)
# node:18-alpine        ~130MB  (Alpine with Node.js runtime)
# node:18-alpine + app  ~180MB  typical production image

FROM node:18-alpine AS runtime
```

When Alpine causes problems:
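Some npm packages with native bindings ship pre-compiled binaries linked against glibc. They will fail to load under Alpine's musl-libc, typically with an error such as Error loading shared library ld-linux-x86-64.so.2: No such file or directory, or a symbol not found relocation error.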
```shell
# Test Alpine compatibility before committing to it
docker build -t payment-api:alpine-test .
docker run --rm payment-api:alpine-test node -e "require('./dist/index')"

# If this fails, try node:18-slim instead of full Alpine
# node:18-slim is Debian-based but strips most system packages
```

The packages most commonly affected are database drivers (node-gyp compiled binaries), image processing libraries (sharp, canvas), and cryptography modules with native implementations. For the Swiggy order API, a pure TypeScript service with PostgreSQL, Alpine worked without issues.
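Before switching base images, it can help to see which installed packages ship compiled addons at all, since those are the ones that may depend on glibc. The following is a minimal sketch, assuming dependencies are already installed locally in node_modules. The function name list_native_addons is illustrative and not part of npm or Docker.

```shell
#!/bin/sh
# Hypothetical helper: print every compiled Node.js addon (*.node file)
# under the given directory (defaults to ./node_modules).
list_native_addons() {
  dir="${1:-node_modules}"
  # Compiled addons use the .node extension by convention.
  find "$dir" -type f -name '*.node' 2>/dev/null
}

# Usage (run from the project root after npm ci):
#   list_native_addons node_modules
```

An empty result suggests a pure JavaScript dependency tree, which is the easy case for Alpine. Packages that fetch helper binaries under a different extension will not show up here, so running the built image remains the real test.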
Without a .dockerignore, docker build sends the entire project directory to the builder as the build context, and COPY . . copies all of it into the image: node_modules, test fixtures, .git, documentation, and local environment files. These inflate the build context and end up in the image unless explicitly excluded.
```text
# .dockerignore: exclude everything that should not be in the image
node_modules/
dist/
.git/
.gitignore
*.md
*.log
.env
.env.*
coverage/
__tests__/
test/
.nyc_output/
.DS_Store
docker-compose*.yml
Makefile
```

Why this matters beyond image size:
A missing .dockerignore also causes cache invalidation problems. If the build context contains files that change frequently (logs, coverage reports), Docker invalidates the cache for every COPY . . layer on every build — even when the source code has not changed. With a proper .dockerignore, only meaningful source changes invalidate the cache.
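A quick check in CI can catch a .dockerignore that has lost one of its important entries. This is a sketch under stated assumptions: the four-entry must-have list and the function name missing_ignores are illustrative choices, and the match is on exact lines only, so an equivalent pattern written differently (for example node_modules without the trailing slash) would be reported as missing.

```shell
#!/bin/sh
# Hypothetical helper: print the entries from a short must-have list that
# are missing from a .dockerignore file (defaults to ./.dockerignore).
missing_ignores() {
  file="${1:-.dockerignore}"
  for pattern in node_modules/ .git/ .env coverage/; do
    # -x matches the whole line, -F treats the pattern as a literal string
    grep -qxF -e "$pattern" "$file" 2>/dev/null || echo "$pattern"
  done
}

# Usage:
#   missing_ignores .dockerignore
```

If the file does not exist at all, every entry in the list is printed, which makes the missing-file case visible as well.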
The next two changes, running as a non-root user and cleaning the npm cache, are security and performance improvements that also reduce image size slightly.
```dockerfile
FROM node:18-alpine AS runtime
WORKDIR /app

# Set production mode before npm ci
# This skips devDependencies even without --omit=dev
ENV NODE_ENV=production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY --from=builder /app/dist ./dist

# Run as non-root user
# node:18-alpine includes a 'node' user for exactly this purpose
USER node

# Health check for Docker itself. Kubernetes ignores HEALTHCHECK,
# so define liveness and readiness probes in the pod spec as well.
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
  CMD wget -qO- http://localhost:4000/health/live || exit 1

EXPOSE 4000
CMD ["node", "dist/index.js"]
```

Running npm cache clean --force in the same RUN instruction as npm ci removes the npm package cache before the layer is committed. Cleaning it in a later instruction would not shrink the image, because the cache would already be stored in the earlier layer. The cache is not needed at runtime and can add 50-200MB depending on the dependency tree.
```shell
# Measure the impact of each fix
docker images | grep payment-api

# REPOSITORY    TAG            SIZE
# payment-api   single-stage   1.18GB
# payment-api   multi-stage    380MB
# payment-api   alpine         182MB
# payment-api   final          175MB
```

Size reduction breakdown:
| Change | Size reduction | Reason |
|---|---|---|
| Multi-stage build | 1.18GB → 380MB | Removes build tools and devDependencies |
| Alpine base image | 380MB → 182MB | Smaller base OS and runtime |
| .dockerignore | Negligible on size | Prevents cache invalidation |
| npm cache clean | 182MB → 175MB | Removes post-install cache |
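To compare the steps on a common scale, the sizes in the table can be turned into percentage reductions. The snippet below is a small sketch of that arithmetic; percent_reduction is an illustrative name, and 1.18GB is entered as 1180MB.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage saved going from BEFORE to AFTER
# (both in MB). Shell arithmetic truncates toward zero.
percent_reduction() {
  before=$1
  after=$2
  echo $(( (before - after) * 100 / before ))
}

percent_reduction 1180 380   # multi-stage build
percent_reduction 380 182    # Alpine base image
percent_reduction 182 175    # npm cache clean
percent_reduction 1180 175   # all changes combined
```

With the figures from the table this prints 67, 52, 3, and 85: the multi-stage build and the Alpine base do nearly all of the work, and the combined image is about 85% smaller than the single-stage one.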
In a CI/CD pipeline, layer caching determines whether your pipeline takes 3 minutes or 12 minutes. Dockerfile layer order is critical — most-stable layers first, most-frequently-changing layers last.
```dockerfile
FROM node:18-alpine AS builder
WORKDIR /app

# COPY package files FIRST: they only change when dependencies change
# Docker caches this layer until package.json or package-lock.json changes
COPY package*.json ./
# Cached unless deps changed
RUN npm ci

# COPY source code LAST: it changes on every commit
# Only this layer and the ones after it are invalidated on code changes
COPY . .
RUN npm run build
```

If you reverse the order (COPY . . before RUN npm ci), every single commit invalidates the npm ci cache and re-downloads all packages. A 3-minute build becomes a 12-minute build.
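The ordering rule is easy to break during a refactor, so a rough lint step can be useful. The sketch below is a heuristic only: it assumes the Dockerfile uses the literal forms RUN npm ci and COPY . . and compares just the first occurrence of each. The function names first_line and deps_before_source are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: line number of the first line in FILE matching REGEX
first_line() {
  grep -nE -e "$1" "$2" | head -n 1 | cut -d: -f1
}

# Succeeds only if the first "RUN npm ci" comes before the first "COPY . ."
deps_before_source() {
  ci=$(first_line '^[[:space:]]*RUN[[:space:]]+npm[[:space:]]+ci' "$1")
  src=$(first_line '^[[:space:]]*COPY[[:space:]]+\.[[:space:]]+\.[[:space:]]*$' "$1")
  [ -n "$ci" ] && [ -n "$src" ] && [ "$ci" -lt "$src" ]
}

# Usage:
#   deps_before_source Dockerfile || echo "COPY . . runs before npm ci"
```

A Dockerfile that installs dependencies some other way (yarn, pnpm, or a script) would need a different pattern, so treat a failure as a prompt to look at the file rather than as a verdict.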
In GitHub Actions with the docker/build-push-action:
```yaml
# Use GitHub Actions cache for Docker layers
steps:
  - uses: docker/setup-buildx-action@v3
  - uses: docker/build-push-action@v5
    with:
      context: .
      push: true
      tags: 123456789.dkr.ecr.ap-south-1.amazonaws.com/payment-api:${{ github.sha }}
      cache-from: type=gha           # read from GitHub Actions cache
      cache-to: type=gha,mode=max    # write layers to cache
```

With layer caching correctly configured, rebuilds that change only application code (not dependencies) complete in 45-90 seconds instead of 4-8 minutes.
Run docker scout cves (the replacement for the retired docker scan command) or trivy image on every image before pushing to production. Alpine images often have fewer CVEs than Debian images because they ship fewer packages, but this is not guaranteed, so verify with scanning.
```shell
# Scan for vulnerabilities before pushing
trivy image \
  --exit-code 1 \
  --severity HIGH,CRITICAL \
  payment-api:latest

# Check actual image contents
docker run --rm payment-api:latest sh -c "apk list --installed"
docker history --no-trunc payment-api:latest
```

For teams on AWS ap-south-1, smaller images also reduce ECR cross-AZ transfer costs. A 175MB image pulled 100 times per day across three availability zones costs approximately $1.50/month in data transfer. The same workload with a 1.2GB image costs approximately $10.50/month, a difference that compounds with scale.
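Transfer cost scales linearly with image size, so the monthly volume is the number worth estimating for your own cluster. The sketch below computes only the volume and assumes no price. It takes a 30-day month, decimal units (1GB = 1000MB), and a full-size transfer on every pull, which makes it an upper bound because registries transfer compressed layers and nodes reuse cached ones. The name monthly_transfer_gb is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: approximate data pulled per month, in decimal GB.
# Assumes every pull transfers the full image size (no layer reuse).
monthly_transfer_gb() {
  size_mb=$1
  pulls_per_day=$2
  days=${3:-30}
  echo $(( size_mb * pulls_per_day * days / 1000 ))
}

monthly_transfer_gb 175 100    # 175MB image, 100 pulls per day
monthly_transfer_gb 1200 100   # 1.2GB image, 100 pulls per day
```

This prints 525 and 3600. Multiplying each figure by the per-GB rate that applies to your network path gives a cost estimate for your setup.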
**References and Further Reading**

* [Docker Official Multi-Stage Build Documentation](https://docs.docker.com/build/building/multi-stage/): Official reference for multi-stage build syntax
* [node:alpine vs node:slim Comparison](https://hub.docker.com/_/node): Official Node.js Docker image variants and their trade-offs
* [Trivy Container Scanner](https://aquasecurity.github.io/trivy/): Container vulnerability scanning before deployment
Kubernetes pulls images on every new node that schedules a pod. A 1.2GB image takes 45-90 seconds to pull on a cold node versus 8-12 seconds for a 180MB image. During a rollout, if a pod lands on a node that has not cached the image, the pull time directly adds to your deployment latency. In EKS clusters with auto-scaling, new nodes are cold by definition — every scale-out event pays the full pull cost. Smaller images also reduce the ECR egress charges when pulling across availability zones.
Multi-stage builds and Alpine base images solve different problems and are most effective combined. A multi-stage build removes build-time dependencies (compilers, test frameworks, dev packages) from the final image by copying only the built artifact into a clean runtime image. An Alpine base image replaces a Debian base of more than 100MB with a musl-libc base of about 5MB. Multi-stage builds typically save 200-800MB by dropping build tools. Alpine saves roughly another 150-200MB on the base layer. Used together the savings stack: a Node.js image that would be 1.2GB ends up at around 180MB. The trade-off with Alpine is musl-libc compatibility: some npm packages with native bindings behave differently under musl than glibc. Test Alpine builds thoroughly before deploying to production.