Docker Compose Health Checks and Dependency Ordering | DevOps Network

Overview and What You Will Learn

A Compose stack with plain depends_on solves only half the problem. By default (condition: service_started), Compose starts containers in dependency order, but "started" and "ready to accept connections" are not the same thing. PostgreSQL's container process can be running in under a second, while the database itself typically needs a few more seconds to finish initialising. It takes longer on first start, when it creates its data directory. An API container that starts the instant the Postgres container starts will fail its first connection attempt and, unless it retries, crash.

This lab closes that gap by combining health checks with condition: service_healthy.

By the end of this lab you will:

Understand why plain depends_on is not enough to guarantee readiness
Write health checks (Dockerfile HEALTHCHECK instructions and Compose healthcheck blocks) for HTTP, TCP, database, and Redis services
Configure depends_on with condition: service_healthy in Compose
Read and interpret container health states (starting, healthy, unhealthy)
Debug a failing health check step by step
Know when wait-for-it.sh style scripts are still useful

Why This Matters in Production

At PhonePe, a payments API container that starts before its Redis-backed rate limiter is ready will either crash on boot or silently skip rate limiting for its first few seconds of traffic — both are unacceptable outcomes for a payments system. The fix is not a sleep statement in the entrypoint script. It is a properly defined health check that Compose (or Kubernetes, later) can rely on as a contract: "this service does not get traffic until it reports healthy."

The same /health endpoint and timing thresholds later become the basis for Kubernetes readiness probes, so getting them right in Compose pays off twice. Kubernetes ignores the Dockerfile HEALTHCHECK itself and uses its own probe definitions.

Core Principles

Why depends_on alone is insufficient:

◈ DIAGRAM

+------------------------+          +------------------------------+
|   postgres container   |          |        api container         |
|                        |          |                              |
| process starts (0.3s)  | -------> | starts immediately (0.3s)    |
| still initialising     |          | tries to connect -> FAILS    |
+------------------------+          +------------------------------+

With a health check gating the dependency:

◈ DIAGRAM

+------------------------------------------+
| postgres container starts                |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| status: starting (health check not yet   | <- pg_isready fails,
| passed, retry in 5s)                     |    container stays 'starting'
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| pg_isready succeeds, status: healthy     |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| api container is released to start       | <- depends_on:
+------------------------------------------+    condition: service_healthy

Health check states:

◈ DIAGRAM

+------------------------+   +------------------------+   +------------------------+
|        starting        |   |        healthy         |   |       unhealthy        |
|                        |   |                        |   |                        |
| no check passed yet;   |   | last check passed      |   | `retries` consecutive  |
| failures inside        |   | (one success is enough)|   | checks failed after    |
| start_period ignored   |   |                        |   | start_period           |
+------------------------+   +------------------------+   +------------------------+
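The sketch below makes the three states concrete by replaying a list of probe results through a simplified model of these rules. It is not Docker's code. `health_status` is a hypothetical helper, and the model assumes probe k fires exactly k * interval seconds after the container starts, ignoring how long each probe runs. It also shows that an unhealthy container returns to healthy after a single passing probe.

```shell
# Hypothetical helper: replays probe results through a SIMPLIFIED model of
# Docker's health-state rules (assumption: probe k runs exactly k*interval
# seconds after the container starts; probe runtime is ignored).
# Usage: health_status INTERVAL START_PERIOD RETRIES RESULT...
#   INTERVAL and START_PERIOD are seconds; each RESULT is "pass" or "fail".
health_status() {
  interval=$1 start_period=$2 retries=$3
  shift 3
  state=starting
  streak=0
  t=0
  for result in "$@"; do
    t=$((t + interval))
    if [ "$result" = pass ]; then
      # A single passing probe marks the container healthy and resets the streak.
      streak=0
      state=healthy
    elif [ "$state" = starting ] && [ "$t" -lt "$start_period" ]; then
      # Never healthy yet and still inside start_period: failure is not counted.
      :
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$retries" ]; then
        state=unhealthy
      fi
    fi
  done
  echo "$state"
}

# health_status 10 15 3 fail fail fail            -> starting
# health_status 10 15 3 fail fail fail fail       -> unhealthy
# health_status 10 15 3 fail fail fail fail pass  -> healthy
```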

Detailed Step-by-Step Practical Lab

Milestone 1 — Write a HEALTHCHECK in the Dockerfile

Dockerfile

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
 
# Health check hits the app's own /health endpoint
# interval: how often to check
# timeout: how long to wait for a response
# start_period: grace time before failures count against the container
# retries: consecutive failures needed to mark unhealthy
HEALTHCHECK --interval=10s --timeout=3s --start-period=15s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:4000/health || exit 1
 
CMD ["node", "index.js"]

Bash

## Build and confirm the health check is registered
docker build -t phonepe-rate-limiter-api .
docker run -d --name rate-limiter-api -p 4000:4000 phonepe-rate-limiter-api
 
## Watch the health status transition from starting to healthy
watch -n 1 'docker inspect --format "{{.State.Health.Status}}" rate-limiter-api'
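A script that needs to block until the container is healthy can wrap the same `docker inspect` call in a polling loop instead of relying on someone watching the output. This is a minimal sketch. The function name `wait_for_healthy` and its exit codes are illustrative choices, not a Docker feature.

```shell
# Hypothetical helper: block until a container reports healthy.
# Assumes the Docker CLI is on PATH and the container defines a health check.
# Usage: wait_for_healthy CONTAINER [TIMEOUT_SECONDS]
# Returns 0 when healthy, 1 when unhealthy or timed out,
# 2 when docker inspect cannot read a health status.
wait_for_healthy() {
  container=$1
  limit=${2:-60}
  elapsed=0
  health=unknown
  while [ "$elapsed" -lt "$limit" ]; do
    # Fails if the container does not exist; a container without a health
    # check usually makes this template fail as well.
    if ! health=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null); then
      echo "$container: cannot read health status" >&2
      return 2
    fi
    case $health in
      healthy)   echo "$container healthy after ~${elapsed}s"; return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$container still '$health' after ${limit}s" >&2
  return 1
}
```

Example usage: `wait_for_healthy rate-limiter-api 60 && echo ready`. For a whole Compose stack, recent Compose v2 releases also offer `docker compose up --wait`, which starts services detached and waits until they are running or healthy.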

Milestone 2 — Define health checks for common dependency types in Compose

YAML

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: phonepe_user
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: ratelimits
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U phonepe_user -d ratelimits"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s
 
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
 
  rabbitmq:
    image: rabbitmq:3-management-alpine
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "check_port_connectivity"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s
 
  internal-api:
    image: phonepe-rate-limiter-api:latest
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:4000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s

Milestone 3 — Gate startup with condition: service_healthy

YAML

services:
  api:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://phonepe_user:secret@postgres:5432/ratelimits
      REDIS_URL: redis://redis:6379
 
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: phonepe_user
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: ratelimits
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U phonepe_user"]
      interval: 5s
      timeout: 3s
      retries: 5
 
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 5
 
  rabbitmq:
    image: rabbitmq:3-management-alpine
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "check_port_connectivity"]
      interval: 10s
      retries: 5

Bash

## Start the stack and watch dependency ordering in action
docker compose up
## api container will NOT start until postgres, redis, and
## rabbitmq all report status: healthy
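Compose derives this order from the depends_on graph. postgres, redis and rabbitmq have no dependencies of their own and can start concurrently, and only api has to wait. The sketch below illustrates that constraint with the POSIX `tsort` utility. Compose does its own graph walk and does not call tsort, and `start_order` is a hypothetical name.

```shell
# Hypothetical helper: prints one start order that satisfies the depends_on
# edges of the api service. Each input line "A B" means
# "A must be healthy before B starts". tsort is a standard POSIX utility.
start_order() {
  tsort <<'EOF'
postgres api
redis api
rabbitmq api
EOF
}

# start_order prints postgres, redis and rabbitmq in some order, then api.
```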

Milestone 4 — Read and debug health check status

Bash

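## See health status for all services (--all also lists containers that
## exist but have not started yet, like api below; columns trimmed)
docker compose ps --all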

TEXT

NAME                  STATUS
phonepe-postgres-1    Up 12 seconds (healthy)
phonepe-redis-1       Up 12 seconds (healthy)
phonepe-rabbitmq-1    Up 12 seconds (health: starting)
phonepe-api-1         Created

Bash

## Inspect detailed health check history for one container
docker inspect phonepe-rabbitmq-1 --format '{{json .State.Health}}' | python3 -m json.tool
 
## See the stored health check attempts and their output
## (Docker keeps only the five most recent results)
docker inspect phonepe-rabbitmq-1 --format '{{json .State.Health.Log}}' | python3 -m json.tool
 
## Manually run the exact health check command for debugging
docker exec phonepe-rabbitmq-1 rabbitmq-diagnostics check_port_connectivity
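When State.Health.Log holds several attempts, a compact one-line-per-probe view is easier to scan than pretty-printed JSON. This sketch uses docker inspect's Go templates (the `range` action, the `json` template function, and `println`). The helper names are hypothetical.

```shell
# Hypothetical helper: one line per stored probe, "<exit code> <output as JSON>".
# The json function keeps multi-line probe output on a single line.
probe_history() {
  docker inspect \
    --format '{{range .State.Health.Log}}{{.ExitCode}} {{json .Output}}{{println}}{{end}}' \
    "$1"
}

# Hypothetical helper: prints only failed probes (non-zero exit code) and
# returns 1 if there were any, so scripts can branch on the result.
# NF skips the blank trailing line that docker inspect adds.
probe_failures() {
  probe_history "$1" | awk 'NF && $1 != 0 { print; n++ } END { exit (n > 0) }'
}

# Example: probe_failures phonepe-rabbitmq-1 || echo "recent probes failed"
```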

Milestone 5 — Diagnose a stuck or failing health check

Bash

## If a container stays 'starting' past start_period, check:
 
## 1. Is the command available inside the container at all?
docker exec phonepe-api-1 which wget
 
## 2. Is the app actually listening on the expected port yet?
docker exec phonepe-api-1 netstat -tlnp
 
## 3. Run the health check command manually; dropping --no-verbose lets wget
##    report what happened, and the exit code is what Docker sees
docker exec phonepe-api-1 wget --tries=1 --spider http://localhost:4000/health; echo "exit code: $?"
 
## 4. Check application logs for startup errors
docker compose logs api --tail=50

Milestone 6 — When wait-for-it.sh is still useful

Health checks solve container-to-container ordering inside Compose. But sometimes you need to wait for a dependency from outside Compose entirely, for example in a CI script before running tests.

Bash

## Download the script once into your repo
curl -o wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh
chmod +x wait-for-it.sh
 
## Use it in a CI step before running integration tests;
## --strict skips the command (and exits non-zero) if the port never opens
./wait-for-it.sh localhost:5432 --timeout=30 --strict -- npm run test:integration

REMEMBER THIS
**Remember:** `wait-for-it.sh` checks if a TCP port is open. A `HEALTHCHECK` checks if the application is actually functional. A database port can be open while the database itself is still recovering from a crash — prefer real health checks over plain port checks whenever possible.
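The port check described in that note takes only a few lines. This sketch shows a bare TCP port wait of the same kind wait-for-it.sh performs. It assumes bash (for its /dev/tcp redirection) and the coreutils `timeout` command are installed, and the helper names are hypothetical. A successful result proves only that something is listening on the port.

```shell
# Hypothetical helper: succeeds if a TCP connection to HOST PORT can be opened.
# bash opens a TCP connection when fd 3 is redirected to /dev/tcp/HOST/PORT;
# timeout caps each attempt at 2 seconds. Host and port are passed as
# positional arguments ($0, $1 inside bash -c) rather than spliced into the script.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage: wait_for_port HOST PORT [ATTEMPTS]   (roughly one attempt per second)
wait_for_port() {
  attempts=${3:-30}
  n=1
  until port_open "$1" "$2"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "$1:$2 still closed after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}
```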

Production Best Practices and Common Pitfalls

Scenario	Wrong	Correct
API needs DB ready	depends_on with no condition	depends_on with condition: service_healthy
Slow-starting service	retries set too low, marked unhealthy too early	Use start_period to give grace time before counting failures
Health check tool missing	CMD curl in an alpine image without curl installed	Use wget (preinstalled) or install curl explicitly
Health check too expensive	Running a full DB query every 2 seconds	Use a lightweight check like pg_isready or redis-cli ping
Debugging a stuck check	Restarting the whole stack repeatedly	docker inspect State.Health.Log to see actual failure output

Quick Reference and Troubleshooting Commands

Task	Command
View health status of all services	`docker compose ps`
Inspect full health detail	`docker inspect name --format '{{json .State.Health}}'`
See health check failure history	`docker inspect name --format '{{json .State.Health.Log}}'`
Run health check command manually	`docker exec name <healthcheck command>`
View startup logs	`docker compose logs -f service_name`
Wait for a TCP port before running a command	`./wait-for-it.sh host:port -- command`

PLACEMENT PRO TIP
**Tip:** Set `start_period` generously for databases (10-20 seconds) and tightly for simple HTTP services (3-5 seconds). A start_period that is too short causes false unhealthy marks during normal slow startup; one too long delays detection of a genuinely broken container.
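The sketch below puts numbers on that trade-off. It estimates how long a container whose check always fails stays in `starting` before it is marked unhealthy. It assumes a simplified timing model (probe k runs k * interval seconds after start, with probe runtime ignored), so treat the result as a rough figure rather than Docker's exact behaviour. `time_to_unhealthy` is a hypothetical name.

```shell
# Hypothetical helper: rough seconds until a container whose check ALWAYS
# fails is marked unhealthy. Simplified model (assumption): probe k runs
# k*interval seconds after start, probe runtime is ignored, and failures
# before start_period has elapsed are not counted.
# Usage: time_to_unhealthy INTERVAL START_PERIOD RETRIES   (all in seconds)
time_to_unhealthy() {
  interval=$1 start_period=$2 retries=$3
  [ "$interval" -gt 0 ] || return 1   # avoid an endless loop on interval 0
  t=0
  counted=0
  while [ "$counted" -lt "$retries" ]; do
    t=$((t + interval))
    if [ "$t" -ge "$start_period" ]; then
      counted=$((counted + 1))
    fi
  done
  echo "$t"
}
```

With the Milestone 1 Dockerfile values (interval 10s, start_period 15s, retries 3), `time_to_unhealthy 10 15 3` gives roughly 40 seconds before a broken API is flagged. The Milestone 2 Postgres values (5s, 10s, 5) give roughly 30 seconds.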

COMMON MISTAKE / WARNING
**Common Mistake:** Using `curl` in a health check inside an alpine-based image that does not include curl by default, causing the health check itself to fail with "command not found" rather than reporting the actual application status. Use `wget --spider`, which ships in alpine, or install curl explicitly in the Dockerfile.

COMMON MISTAKE / WARNING
**Security:** Health check endpoints like `/health` are sometimes left unauthenticated and exposed on the same port as the main application, accidentally leaking internal status details (DB connection strings, version numbers) to anyone who can reach the container. Keep health endpoints minimal — return only `200 OK` or `503`, nothing else.
