Overview and What You Will Learn
Compose is not just for local development. On single-server deployments (a DigitalOcean droplet, an EC2 instance, a bare-metal server), Compose is often the right tool for production. It is simpler than Kubernetes, needs nothing beyond Docker Engine and the Compose plugin, and, when configured correctly, handles service restarts, resource isolation, and config management reliably.
The difference between a toy Compose setup and a production one comes down to a handful of things: restart policies, resource limits, environment separation, keeping secrets out of compose files, log rotation, and health-checked startup ordering. This lab covers all of them.
By the end of this lab you will:
- Configure restart policies that match your service's failure behaviour
- Set CPU and memory limits to prevent one container from starving others
- Separate environment config across dev, staging, and production using override files
- Use `.env` files and environment variable substitution correctly
- Apply logging config so containers do not fill the host disk
- Use `depends_on` with health checks to enforce correct startup ordering
Why This Matters in Production
A small fintech ran its Compose stack on a single Ubuntu server with no restart policies and no memory limits. A memory leak in a background job container consumed all available RAM, and the kernel OOM-killed the main API container. With no restart policy, it stayed down. The on-call engineer woke up to a full outage that would have been a non-event with two lines of Compose config.
Proper resource limits also prevent noisy-neighbour problems. A reporting service doing a heavy CSV export should not be able to spike CPU and add latency to the live payment API running on the same host.
Core Principles
Restart policy decision tree:
```
+--------------------------------------------+
| Is this a long-running service?            |
| (API, worker, database)                    |
+--------------------------------------------+
        | yes                          | no
        |                              v
        |                  +-----------------------------------+
        |                  | Is it a one-shot init job?        |
        |                  | Use restart: "no" or "on-failure" |
        |                  +-----------------------------------+
        v
+--------------------------------------------+
| Can it tolerate brief restarts?            |
| Use restart: "unless-stopped"              | <- best default for most services
+--------------------------------------------+
        |
        v
+--------------------------------------------+
| Must it come back after a daemon restart   |
| even if it was stopped manually?           |
| Use restart: "always"                      | <- databases, critical APIs
+--------------------------------------------+
```

Environment separation with override files:

```
+------------------------+          +------------------------------+
| docker-compose.yml     |          | docker-compose.prod.yml      |
|                        |          |                              |
| base service defs      | <------> | resource limits, prod env,   |
| ports, volumes,        |          | restart policies, log config |
| healthchecks           |          | no dev mounts or debug ports |
+------------------------+          +------------------------------+
```

Detailed Step-by-Step Practical Lab
Milestone 1 — Configure restart policies correctly
```yaml
# docker-compose.yml
services:
  api:
    image: razorpay-api:latest
    # unless-stopped: restarts after crashes, host reboots, and daemon restarts
    # Does NOT come back after a daemon restart if you explicitly ran `docker compose stop`
    restart: unless-stopped

  postgres:
    image: postgres:15
    # always: restarts after crashes; a manually stopped container stays stopped
    # until the Docker daemon restarts, and then it is started again
    # Use for stateful services that must be up whenever the daemon is running
    restart: always

  db-migration:
    image: razorpay-api:latest
    command: npm run migrate
    # one-shot jobs should never restart on success
    # on-failure restarts only if the exit code is non-zero (migration failed)
    restart: on-failure
```

Milestone 2 — Set CPU and memory resource limits
```yaml
services:
  api:
    image: razorpay-api:latest
    deploy:
      resources:
        limits:
          # Container cannot use more than 1 full CPU core
          cpus: "1.0"
          # Container is OOM-killed if it exceeds 512MB
          memory: 512M
        reservations:
          # Resources the service should be able to count on; on a single
          # Docker host the memory value acts as a soft limit, not a hard guarantee
          cpus: "0.25"
          memory: 128M

  reporting-worker:
    image: razorpay-reporting:latest
    deploy:
      resources:
        limits:
          # Reporting is batch work: throttle it so it cannot spike the API
          cpus: "0.5"
          memory: 1G
```

**Remember:** `docker compose up` (Compose V2) applies `deploy.resources` limits without Swarm mode. The legacy `docker-compose` v1 binary ignores the `deploy` section unless it is run with `--compatibility`. Verify limits are applied with `docker stats`.
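For setups where the `deploy` section is not honoured, the Compose specification also defines service-level keys for the same settings. The fragment below is a sketch that assumes the same values as the `api` service above; it is not a tested configuration from this lab, so confirm the result with `docker stats`. Use one style per service rather than mixing both.

```yaml
# Sketch: service-level resource keys, assumed to mirror the deploy.resources values above
services:
  api:
    image: razorpay-api:latest
    cpus: 1.0              # hard CPU cap (counterpart of deploy.resources.limits.cpus)
    mem_limit: 512M        # hard memory cap (counterpart of deploy.resources.limits.memory)
    mem_reservation: 128M  # soft memory floor (counterpart of deploy.resources.reservations.memory)
```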
Milestone 3 — Separate dev and prod config with override files
```yaml
# docker-compose.yml (base, committed to git)
services:
  api:
    image: razorpay-api:${IMAGE_TAG:-latest}
    environment:
      - NODE_ENV=${NODE_ENV}
      - DATABASE_URL=${DATABASE_URL}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
```

```yaml
# docker-compose.prod.yml (prod overrides, committed to git)
services:
  api:
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
    # No debug port mappings in prod
    ports:
      - "3000:3000"
```

```yaml
# docker-compose.dev.yml (dev overrides, committed to git)
services:
  api:
    # Mount source code for hot reload in dev only
    volumes:
      - ./src:/app/src
    # Expose debug port in dev only
    ports:
      - "3000:3000"
      - "9229:9229"
    environment:
      - NODE_ENV=development
      - DEBUG=true
```

```bash
# Local dev
docker compose -f docker-compose.yml -f docker-compose.dev.yml up

# Production server
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

Milestone 4 — Use .env files for environment config
```bash
# .env.production (never commit this file: add it to .gitignore)
# These values feed ${VAR} substitution in the compose files; a variable only
# reaches a container if a service references it (for example under environment:)
NODE_ENV=production
IMAGE_TAG=v2.4.1
DATABASE_URL=postgres://api_user:xK9mP2@10.0.1.50:5432/razorpay_prod
REDIS_URL=redis://10.0.1.51:6379
```

```bash
# Load a specific .env file (not the default .env)
docker compose --env-file .env.production \
  -f docker-compose.yml \
  -f docker-compose.prod.yml \
  up -d
```

**Common Mistake:** Putting secrets directly in `docker-compose.yml` under `environment:` and committing the file. Anyone with repo access gets the credentials. Use `.env` files that are gitignored, or Docker Secrets for sensitive values.
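As a minimal sketch of the Docker Secrets option mentioned above, Compose can mount a gitignored file into the container under `/run/secrets/` instead of passing the value as an environment variable. The secret name `db_password` and the path `./secrets/db_password.txt` are hypothetical; `POSTGRES_PASSWORD_FILE` is the file-based variable supported by the official `postgres` image.

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      # The official postgres image reads the password from this file
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    # Hypothetical path: keep this file out of git, like .env.production
    file: ./secrets/db_password.txt
```

The password then never appears in the compose file or in the container's environment listing. Only services that list the secret can read it.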
Milestone 5 — Configure logging to prevent disk exhaustion
```yaml
services:
  api:
    logging:
      # json-file is the default driver: always set rotation limits
      driver: "json-file"
      options:
        # Each log file capped at 50MB
        max-size: "50m"
        # Keep only the last 5 rotated files = max 250MB per service
        max-file: "5"

  nginx:
    logging:
      driver: "json-file"
      options:
        max-size: "20m"
        max-file: "3"
```

Milestone 6 — Enforce startup ordering with health checks
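```yaml
services:
  postgres:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U api_user"]
      interval: 5s
      timeout: 3s
      retries: 10

  # redis must be defined with its own healthcheck, or the service_healthy
  # condition below cannot be satisfied (the image tag here is an assumption)
  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10

  api:
    image: razorpay-api:latest
    depends_on:
      postgres:
        # API container only starts after postgres reports healthy
        # Without condition: service_healthy, depends_on only waits for
        # the container to START, not for postgres to accept connections
        condition: service_healthy
      redis:
        condition: service_healthy
```

Production Best Practices and Common Pitfalls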
| Scenario | Wrong | Correct |
|---|---|---|
| Restart policy | No restart policy set | restart: unless-stopped for services, on-failure for jobs |
| Resource limits | No limits set | Always set deploy.resources.limits per service |
| Environment config | Secrets hardcoded in compose file | Gitignored .env files or Docker Secrets |
| Log rotation | Default logging with no limits | json-file driver with max-size and max-file set |
| Service ordering | depends_on: [postgres] alone | depends_on: postgres: condition: service_healthy |
| Dev vs prod | One compose file for everything | Base file plus environment-specific override files |
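To apply the log rotation row above to every service without repeating the block, one option is a YAML anchor on an extension field. This is a sketch: `x-logging` and `default-logging` are arbitrary labels, and the values are the ones used for the `api` service in Milestone 5.

```yaml
# Top-level x-* extension fields are ignored by Compose; the anchor makes the block reusable
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "50m"
    max-file: "5"

services:
  api:
    image: razorpay-api:latest
    logging: *default-logging

  reporting-worker:
    image: razorpay-reporting:latest
    logging: *default-logging
```

A service that needs different limits can still declare its own `logging` block instead of using the alias.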
Quick Reference and Troubleshooting Commands
| Task | Command |
|---|---|
| Start with prod overrides | docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d |
| Load specific env file | docker compose --env-file .env.production up -d |
| Check resource usage live | docker stats |
| View merged config | docker compose -f docker-compose.yml -f docker-compose.prod.yml config |
| Check restart policy | docker inspect <container> --format '{{.HostConfig.RestartPolicy}}' |
| View log driver config | docker inspect <container> --format '{{.HostConfig.LogConfig}}' |
**Tip:** Run `docker compose config` with all your `-f` flags before deploying to production. It merges and validates the full resolved config, catching variable substitution errors and typos before they cause a failed deployment.
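One way to make that pre-deploy check stricter is the `${VAR:?message}` form of variable substitution, which makes Compose stop with the given message when a variable is unset or empty. A sketch, assuming the `api` service and variable names from Milestone 3; the error messages are illustrative.

```yaml
services:
  api:
    # Falls back to "latest" when IMAGE_TAG is unset or empty
    image: razorpay-api:${IMAGE_TAG:-latest}
    environment:
      # `docker compose config` fails with this message if the variable is missing
      - NODE_ENV=${NODE_ENV:?NODE_ENV must be set}
      - DATABASE_URL=${DATABASE_URL:?DATABASE_URL must be set}
```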
**Warning:** Never use `restart: always` (or `unless-stopped`) on a service that has a known startup bug. It will enter a crash-restart loop that hammers your database with connection attempts. Fix the bug first, then add the restart policy.
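For a service whose startup is still being stabilised, a bounded policy limits the damage of such a loop. The sketch below uses the `on-failure[:max-retries]` form from the Compose specification; the retry count of 5 is an illustrative assumption, not a recommendation from this lab.

```yaml
services:
  api:
    image: razorpay-api:latest
    # Restart on a non-zero exit code, but give up after 5 attempts
    # instead of looping indefinitely
    restart: on-failure:5
```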