Docker Compose Production Patterns — Restart Policies, Resource Limits, and Env Config | DevOps Network

Q: What is the career path for learning Docker Compose Production Patterns — Restart Policies, Resource Limits, and Env Config?

Mastering Docker Compose Production Patterns — Restart Policies, Resource Limits, and Env Config enables engineering opportunities in DevOps, SRE, and cloud platform automation.

Q: How long does it take to learn Docker Compose Production Patterns — Restart Policies, Resource Limits, and Env Config?

Most students gain core proficiency in Docker Compose Production Patterns — Restart Policies, Resource Limits, and Env Config in 2–3 weeks of active hands-on labs.

Overview and What You Will Learn

Compose is not just for local development. On single-server deployments — a DigitalOcean droplet, an EC2 instance, a bare-metal server — Compose is often the right tool for production. It is simpler than Kubernetes, has zero external dependencies, and when configured correctly, it handles service restarts, resource isolation, and config management reliably.

The difference between a toy Compose setup and a production one comes down to four things: restart policies, resource limits, environment separation, and avoiding secrets in compose files. This lab covers all four.

By the end of this lab you will:

Configure restart policies that match your service's failure behaviour
Set CPU and memory limits to prevent one container from starving others
Separate environment config across dev, staging, and production using override files
Use .env files and environment variable substitution correctly
Apply logging config so containers do not fill the host disk
Use depends_on with health checks to enforce correct startup ordering

Why This Matters in Production

A small fintech ran its Compose stack on a single Ubuntu server with no restart policies and no memory limits. A memory leak in a background job container consumed all available RAM, and the kernel OOM-killed the main API container. Because there was no restart policy, the API stayed down. The on-call engineer woke up to a full outage that would have been a non-event with two lines of Compose config.

Proper resource limits also prevent noisy-neighbour problems. A reporting service doing a heavy CSV export should not be able to spike CPU and add latency to the live payment API running on the same host.

Core Principles

Restart policy decision tree:

Bash

+--------------------------------------------+
| Is this a long-running service?            |
| (API, worker, database)                    |
+--------------------------------------------+
              |
       yes    |    no
              |         +-----------------------------------+
              |-------> | Is it a one-shot init job?        |
              |         | Use restart: "no" or "on-failure" |
              |         +-----------------------------------+
              v
+--------------------------------------------+
| Can it tolerate brief restarts?            |
| Use restart: "unless-stopped"              | <- best default for most services
+--------------------------------------------+
              |
              v
+--------------------------------------------+
| Must come back after a daemon restart even |
| if it was stopped by hand?                 |
| Use restart: "always"                      | <- databases, critical APIs
+--------------------------------------------+

Environment separation with override files:

Bash

+------------------------+          +------------------------------+
| docker-compose.yml     |          | docker-compose.prod.yml      |
|                        |          |                              |
| base service defs      | <------> | resource limits, prod env,   |
| ports, volumes,        |          | restart policies, log config |
| healthchecks           |          | no dev mounts or debug ports |
+------------------------+          +------------------------------+

Detailed Step-by-Step Practical Lab

Milestone 1 — Configure restart policies correctly

YAML

## docker-compose.yml
services:
  api:
    image: razorpay-api:latest
    ## unless-stopped: restarts after crashes and after a host or daemon restart
    ## Stays down if you stopped it yourself (e.g. `docker compose stop`)
    restart: unless-stopped
 
  postgres:
    image: postgres:15
    ## always: restarts after any exit. A manually stopped container stays
    ## stopped until the Docker daemon restarts or you start it again
    ## Use for services that must come back after every daemon restart
    restart: always
 
  db-migration:
    image: razorpay-api:latest
    command: npm run migrate
    ## one-shot jobs should never restart on success
    ## on-failure restarts if exit code is non-zero (migration failed)
    restart: on-failure
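One way to confirm which policy a running container actually received is to read it back with `docker inspect`. The helper below is a minimal sketch: the function name `restart_policy_of` and the container name in the usage comment are illustrative assumptions, not part of the lab.

```shell
# Print the restart policy name Docker recorded for a container.
# $1: container name or ID (hypothetical example: myproject-api-1)
restart_policy_of() {
  docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Usage (requires a running Docker daemon):
#   restart_policy_of myproject-api-1
```

If the printed value differs from what the Compose file declares, the container was probably created before the policy was added and needs to be recreated.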

Milestone 2 — Set CPU and memory resource limits

YAML

services:
  api:
    image: razorpay-api:latest
    deploy:
      resources:
        limits:
          ## Container cannot use more than 1 full CPU core
          cpus: "1.0"
          ## Container is OOM-killed if it exceeds 512MB
          memory: 512M
        reservations:
          ## Reservations are a soft floor, not a hard guarantee: Swarm uses them
          ## for placement; on a single host the memory value acts as a soft limit
          cpus: "0.25"
          memory: 128M
 
  reporting-worker:
    image: razorpay-reporting:latest
    deploy:
      resources:
        limits:
          ## Reporting is batch work — throttle it so it cannot spike the API
          cpus: "0.5"
          memory: 1G

REMEMBER THIS
**Remember:** Compose V2 (`docker compose`) applies `deploy.resources.limits` on a single host without Swarm mode. The legacy `docker-compose` v1 binary ignores the `deploy` key unless you run it with `--compatibility`. Verify limits are applied with `docker stats`.
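Besides `docker stats`, the limits can be read back from `docker inspect`, which reports memory in bytes and CPU in billionths of a core. The following is a sketch under that assumption. The helper names are hypothetical, and a value of 0 in either field means no limit was applied.

```shell
# Convert a byte count (as reported by docker inspect) to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Convert NanoCpus (billionths of a CPU) to a decimal CPU count.
nanocpus_to_cpus() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 1000000000 }'
}

# Print the memory and CPU limits Docker recorded for a container.
# $1: container name or ID
show_limits() {
  mem=$(docker inspect --format '{{.HostConfig.Memory}}' "$1")
  cpu=$(docker inspect --format '{{.HostConfig.NanoCpus}}' "$1")
  echo "memory=$(bytes_to_mib "$mem")MiB cpus=$(nanocpus_to_cpus "$cpu")"
}
```

For the `api` service above, a correctly applied limit would correspond to 536870912 bytes and 1000000000 NanoCpus.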

Milestone 3 — Separate dev and prod config with override files

YAML

## docker-compose.yml (base, committed to git)
services:
  api:
    image: razorpay-api:${IMAGE_TAG:-latest}
    environment:
      - NODE_ENV=${NODE_ENV}
      - DATABASE_URL=${DATABASE_URL}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3

YAML

## docker-compose.prod.yml (prod overrides, committed to git)
services:
  api:
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
    ## No debug port mappings in prod
    ports:
      - "3000:3000"

YAML

## docker-compose.dev.yml (dev overrides, committed to git)
services:
  api:
    ## Mount source code for hot reload in dev only
    volumes:
      - ./src:/app/src
    ## Expose debug port in dev only
    ports:
      - "3000:3000"
      - "9229:9229"
    environment:
      - NODE_ENV=development
      - DEBUG=true

Bash

## Local dev
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
 
## Production server
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
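Typing both `-f` flags on every command is easy to get wrong, so a small wrapper can help. This is a minimal sketch that assumes the override files follow the `docker-compose.<env>.yml` naming used above; the function name `compose_env` is hypothetical.

```shell
# Run docker compose with the base file plus one environment override.
# $1: environment name (dev or prod); remaining args go to docker compose.
compose_env() {
  env_name=$1
  shift
  docker compose -f docker-compose.yml -f "docker-compose.${env_name}.yml" "$@"
}

# Usage:
#   compose_env dev up
#   compose_env prod up -d
```

The order of the `-f` flags matters: later files override or extend values from earlier ones, so the base file always comes first.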

Milestone 4 — Use .env files for environment config

Bash

## .env.production (never commit this file — add to .gitignore)
NODE_ENV=production
IMAGE_TAG=v2.4.1
DATABASE_URL=postgres://api_user:xK9mP2@10.0.1.50:5432/razorpay_prod
REDIS_URL=redis://10.0.1.51:6379

Bash

## Load a specific .env file (not the default .env)
docker compose --env-file .env.production \
  -f docker-compose.yml \
  -f docker-compose.prod.yml \
  up -d
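The `${IMAGE_TAG:-latest}` form in the base file uses the same syntax as POSIX shell parameter expansion: the default applies when the variable is unset or empty. That makes it possible to preview the behaviour in a plain shell. The function name `image_ref` below is an illustrative assumption.

```shell
# Mirror the base file's image reference: razorpay-api:${IMAGE_TAG:-latest}
image_ref() {
  echo "razorpay-api:${IMAGE_TAG:-latest}"
}

# Usage:
#   unset IMAGE_TAG;  image_ref    # falls back to the "latest" tag
#   IMAGE_TAG=v2.4.1; image_ref    # uses the tag from the environment
```

In production it is usually safer to set `IMAGE_TAG` explicitly, as `.env.production` does, so that a deploy never silently falls back to `latest`.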

COMMON MISTAKE / WARNING
**Common Mistake:** Putting secrets directly in `docker-compose.yml` under `environment:` and committing the file. Anyone with repo access gets the credentials. Use `.env` files that are gitignored, or Docker Secrets for sensitive values.
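A rough pre-commit check can catch the most obvious hardcoded values. The sketch below is a heuristic only: the function name and the list of key words are assumptions, and it will miss credentials embedded in other variables such as a connection URL.

```shell
# List lines that assign a literal value to a secret-looking variable.
# Matches "- DB_PASSWORD=hunter2" but not "- DB_PASSWORD=${DB_PASSWORD}".
# $1: path to a compose file
find_hardcoded_secrets() {
  grep -nE '(PASSWORD|SECRET|TOKEN|API_KEY)[A-Z_]*=[^$[:space:]]' "$1"
}

# Usage: fail a pre-commit hook when anything is found.
#   if find_hardcoded_secrets docker-compose.yml; then exit 1; fi
```

`grep` exits non-zero when nothing matches, so an empty result means the file passed this particular check, not that it is free of secrets.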

Milestone 5 — Configure logging to prevent disk exhaustion

YAML

services:
  api:
    logging:
      ## json-file is the default driver — always set rotation limits
      driver: "json-file"
      options:
        ## Each log file capped at 50MB
        max-size: "50m"
        ## Keep only the last 5 rotated files = max 250MB per container
        max-file: "5"
 
  nginx:
    logging:
      driver: "json-file"
      options:
        max-size: "20m"
        max-file: "3"
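The worst-case disk usage for each container is simply `max-size` multiplied by `max-file`, which makes it easy to budget log space for the whole host. The arithmetic below is a sketch for the two services above, assuming one container per service; the function name is hypothetical.

```shell
# Worst-case log disk usage for one container, in MB: max-size x max-file.
log_budget_mb() {
  echo $(( $1 * $2 ))
}

# Totals for the two services above, one container each.
api_mb=$(log_budget_mb 50 5)
nginx_mb=$(log_budget_mb 20 3)
total_mb=$(( api_mb + nginx_mb ))
echo "api=${api_mb}MB nginx=${nginx_mb}MB total=${total_mb}MB"
# prints: api=250MB nginx=60MB total=310MB
```

Log options are fixed when a container is created, so containers that existed before the rotation settings were added must be recreated to pick them up.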

Milestone 6 — Enforce startup ordering with health checks

YAML

services:
  postgres:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U api_user"]
      interval: 5s
      timeout: 3s
      retries: 10
 
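  redis:
    image: redis:7
    ## redis must be defined with its own healthcheck, otherwise the
    ## service_healthy condition on the api service cannot be satisfied
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
 
  api:
    image: razorpay-api:latest
    depends_on:
      postgres:
        ## API container only starts after postgres reports healthy
        ## Without condition: service_healthy, depends_on only waits for
        ## the container to START, not for postgres to accept connections
        condition: service_healthy
      redis:
        condition: service_healthy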
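The same health status that `depends_on` waits for can be polled from a deploy script, for example before running smoke tests. The following is a minimal sketch: the function name `wait_healthy`, the one-second poll interval, and the default of 30 attempts are all assumptions chosen for illustration.

```shell
# Poll a container's health status until it reports "healthy".
# $1: container name or ID; $2: max attempts, one per second (default 30)
# Returns 0 once healthy, 1 if attempts run out.
wait_healthy() {
  container=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    [ "$status" = "healthy" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_healthy myproject-postgres-1 60 || echo "postgres never became healthy"
```

This only works for containers that define a `healthcheck`; for containers without one the status field is absent and the loop will time out.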

Production Best Practices and Common Pitfalls

Scenario	Wrong	Correct
Restart policy	No restart policy set	`restart: unless-stopped` for services, `on-failure` for jobs
Resource limits	No limits set	Always set `deploy.resources.limits` per service
Environment config	Secrets hardcoded in compose file	Gitignored `.env` files or Docker Secrets
Log rotation	Default logging with no limits	`json-file` driver with `max-size` and `max-file` set
Service ordering	`depends_on: [postgres]` alone	`depends_on` with `condition: service_healthy`, plus a `healthcheck` on the dependency
Dev vs prod	One compose file for everything	Base file plus environment-specific override files

Quick Reference and Troubleshooting Commands

Task	Command
Start with prod overrides	`docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d`
Load specific env file	`docker compose --env-file .env.production up -d`
Check resource usage live	`docker stats`
View merged config	`docker compose -f docker-compose.yml -f docker-compose.prod.yml config`
Check restart policy	`docker inspect <container> --format '{{.HostConfig.RestartPolicy}}'`
View log driver config	`docker inspect <container> --format '{{.HostConfig.LogConfig}}'`

PLACEMENT PRO TIP
**Tip:** Run `docker compose config` with all your `-f` flags before deploying to production. It merges and validates the full resolved config, catching variable substitution errors and typos before they cause a failed deployment.
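This check can be turned into a gate in a deploy script. The sketch below assumes the file names used in this lab and uses the `--quiet` option of `docker compose config`, which validates without printing the merged output; the function name is hypothetical.

```shell
# Validate the merged production config without printing it.
# Returns non-zero if interpolation or validation fails.
validate_prod_config() {
  docker compose --env-file .env.production \
    -f docker-compose.yml \
    -f docker-compose.prod.yml \
    config --quiet
}

# Usage: only deploy when validation passes.
#   validate_prod_config && echo "config OK, safe to deploy"
```

Dropping `--quiet` prints the fully resolved file, which is useful when you need to see which value a variable actually resolved to.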

COMMON MISTAKE / WARNING
**Warning:** Never use `restart: always` or `restart: unless-stopped` on a service that has a known startup bug. It will enter a crash-restart loop that hammers your database with connection attempts. Fix the bug first, then add the restart policy.
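A crash-restart loop shows up as a climbing restart count, which `docker inspect` exposes as `RestartCount`. The check below is a sketch: the function name and the default threshold of 5 are arbitrary assumptions, not recommended values.

```shell
# Succeed if a container's restart count has reached a threshold.
# $1: container name or ID; $2: threshold (default 5, an arbitrary choice)
# Returns 0 if looping, 1 if not, 2 if the container could not be inspected.
is_crash_looping() {
  count=$(docker inspect --format '{{.RestartCount}}' "$1") || return 2
  [ "$count" -ge "${2:-5}" ]
}

# Usage:
#   is_crash_looping myproject-api-1 && echo "api is restarting repeatedly"
```

When this fires, `docker logs` on the same container is usually the fastest way to see why each start attempt exits.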
