Debugging Docker Containers — Logs, Exec, and Inspect | DevOps Network

Overview and What You Will Learn

A container that is not working is not the same as a container that is broken. The difference is what you do next. Engineers who know Docker's debugging tools can usually narrow down the cause of a container failure within minutes. Engineers who do not know these tools restart the container and hope for the best.

In this guide you will learn a systematic debugging workflow — starting with logs, moving to inspect when logs are not enough, using exec to investigate a running container from the inside, and using exit codes to understand exactly why a container died. You will also learn how to debug a container that has already stopped, which is often the hardest case.

Why This Matters in Production

At Hotstar during an IPL match, a container serving video streams starts failing. Thousands of users are seeing errors. You have minutes to diagnose and fix it. The engineer who can read docker logs with the right filters, spot a memory limit being hit in docker stats, and verify a config file inside the container with docker exec resolves it quickly. The engineer who only knows to restart the container makes it worse by losing the state needed for diagnosis.

Core Principles

The debugging process follows a consistent order. Each step gives you more information than the last.

Bash

+------------------------------------------+
| Step 1: docker ps -a                     |
| Is the container running or stopped?     |
| What is the exit code?                   |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Step 2: docker logs                      |
| What did the application print?          |
| Is there an obvious error message?       |
+------------------------------------------+
                    |
          If logs not enough
                    v
+------------------------------------------+
| Step 3: docker inspect                   |
| What config was the container started    |
| with? What env vars? What mounts?        |
+------------------------------------------+
                    |
          If container is running
                    v
+------------------------------------------+
| Step 4: docker exec -it container bash   |
| Investigate from inside the container    |
| Check files, network, processes          |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Step 5: docker stats                     |
| Is the container hitting resource limits?|
| Memory near limit? CPU throttled?        |
+------------------------------------------+
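The same order can be scripted as a first-pass triage. The sketch below is one possible arrangement, not an official tool: the function name `triage` is hypothetical, and `docker top` stands in for an interactive `docker exec` session because a script cannot drive a shell for you.

```bash
#!/usr/bin/env bash
# triage NAME: run the workflow steps in order for one container.
# "triage" is a hypothetical helper name used only for this sketch.
triage() {
  local name="$1" running

  echo "== 1. status and exit code =="
  # Note: the name filter matches substrings, so similar names may also appear.
  docker ps -a --filter "name=${name}" --format '{{.Names}}\t{{.Status}}'

  echo "== 2. last 20 log lines =="
  docker logs --tail 20 "$name" 2>&1

  echo "== 3. state from inspect =="
  docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$name"

  running="$(docker inspect --format '{{.State.Running}}' "$name")"
  if [ "$running" = "true" ]; then
    echo "== 4. processes (docker top, a non-interactive stand-in for exec) =="
    docker top "$name"
    echo "== 5. resource snapshot =="
    docker stats --no-stream "$name"
  else
    echo "container is not running; skipping steps 4 and 5"
  fi
}
```

Call it as `triage payment-api`. It only reads state, so it is safe to run before deciding whether to restart anything.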

Detailed Step-by-Step Practical Lab

Milestone 1: Reading Exit Codes

Every stopped container has an exit code. The exit code narrows down what happened before you read a single log line.

Bash

# Check exit codes for stopped containers
docker ps -a --format "table {{.Names}}\t{{.Status}}"
# NAMES           STATUS
# payment-api     Exited (1) 5 minutes ago
# db-migrator     Exited (0) 10 minutes ago
# order-worker    Exited (137) 2 minutes ago
# cache-warmer    Exited (139) 1 minute ago
 
# Exit code meanings:
# 0   = Clean exit — process finished successfully (batch jobs, migrations)
# 1   = Generic application error — check logs for details
# 2   = Misuse of shell builtin or invalid argument
# 126 = Command found but not executable (permission denied)
# 127 = Command not found (wrong PATH or typo in CMD)
# 128 = Invalid argument to exit; 128+N means the process was killed by signal N
# 137 = Killed by signal 9 (SIGKILL) — OOMKilled or docker kill
# 139 = Segmentation fault (signal 11)
# 143 = Killed by signal 15 (SIGTERM) — docker stop or clean shutdown
 
# Confirm OOMKill specifically
docker inspect --format '{{.State.OOMKilled}}' order-worker
# true — yes this container was killed because it exceeded memory limit
 
# Get the full state object
docker inspect --format '{{json .State}}' order-worker
# {"Status":"exited","Running":false,"Paused":false,"Restarting":false,
#  "OOMKilled":true,"Dead":false,"Pid":0,"ExitCode":137,...}
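The exit-code table can be turned into a small lookup. This is a minimal sketch, assuming a numeric code as input; `explain_exit_code` is a hypothetical helper name, and codes above 128 are decoded with the shell convention that 128+N means "terminated by signal N".

```bash
#!/usr/bin/env bash
# explain_exit_code CODE: print a one-line meaning for a container exit code.
# Hypothetical helper; assumes CODE is a non-negative integer.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "clean exit" ;;
    1)   echo "generic application error" ;;
    2)   echo "misuse of shell builtin or invalid argument" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        # 128+N: the process was terminated by signal N (137 -> 9, 143 -> 15).
        echo "killed by signal $((code - 128))"
      else
        echo "application-specific exit code"
      fi
      ;;
  esac
}

# Example (needs a real container):
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' order-worker)"
```

A result of "killed by signal 9" is the cue to check `.State.OOMKilled` next, since SIGKILL alone does not say who sent it.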

Milestone 2: Advanced Log Reading

Bash

# Basic log reading
docker logs payment-api
 
# Real-world scenario: container crashes with restart policy
# It restarts 3 times before you notice
# docker logs shows logs from ALL runs concatenated together
# Use --since to isolate the most recent failure
 
docker logs --since 5m payment-api
# Shows only logs from the last 5 minutes; shorten the window if several restarts fall inside it
 
# Get the last 50 lines plus follow for new output
docker logs --tail 50 -f payment-api
 
# Search for errors in logs (pipe to grep)
docker logs payment-api 2>&1 | grep -iE "error|fatal|exception"
# 2>&1 redirects stderr to stdout so grep sees both
 
# Save logs to a file for sharing with your team
docker logs payment-api > /tmp/payment-api-logs.txt 2>&1
 
# Count occurrences of an error
docker logs payment-api 2>&1 | grep -c "connection refused"
# 47 — database connection is being refused 47 times
 
# See the exact moment the container started having issues
docker logs --timestamps payment-api 2>&1 | grep "ERROR"
# 2024-01-15T14:23:01.123456789Z ERROR: DB connection failed
# 2024-01-15T14:23:02.234567890Z ERROR: DB connection failed
# Pattern: errors started at 14:23 — check what changed at that time
 
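# For a container that has crashed and been restarted:
# docker logs has no --previous flag (that flag belongs to kubectl logs).
# Docker keeps the output of earlier runs of the same container, so cut the
# log off at the moment the current run started:
started_at=$(docker inspect --format '{{.State.StartedAt}}' payment-api)
docker logs --until "$started_at" payment-api 2>&1 | tail -n 50
# Shows the last 50 lines written before the current run began
# Critical for diagnosing containers that restart immediately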
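To find the minute when errors began, as in the timestamp example in the block above, the log can be bucketed per minute. The sketch below is a minimal illustration: `errors_per_minute` is a hypothetical helper, and it assumes the application writes the literal string `ERROR` and that the log was produced with `--timestamps`, so the first field is an RFC 3339 timestamp.

```bash
#!/usr/bin/env bash
# errors_per_minute: read timestamped log lines on stdin and print
# "<YYYY-MM-DDTHH:MM> <count>" for every minute containing ERROR lines.
# Hypothetical helper; assumes the first field is the --timestamps value.
errors_per_minute() {
  awk '/ERROR/ { count[substr($1, 1, 16)]++ }   # first 16 chars = date + HH:MM
       END { for (minute in count) print minute, count[minute] }' | sort
}

# Example (needs a real container):
#   docker logs --timestamps payment-api 2>&1 | errors_per_minute
```

The first minute in the output is the point to compare against deploys, config changes, or dependency outages.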

Milestone 3: Deep Inspection with docker inspect

Bash

# Full inspection — everything Docker knows about the container
docker inspect payment-api
 
# This returns a 200+ line JSON object. Learn to extract exactly what you need:
 
# Check what environment variables the container was started with
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' payment-api
# NODE_ENV=production
# DB_HOST=10.0.1.50
# DB_PORT=5432
# DB_PASSWORD=<redacted here; docker inspect prints the real value in plain text>
 
# Check what image the container is running
docker inspect --format '{{.Config.Image}}' payment-api
# registry.razorpay.in/payment-api:v3.1.0
 
# Check what ports are published
docker inspect --format '{{json .NetworkSettings.Ports}}' payment-api
# {"8080/tcp":[{"HostIp":"0.0.0.0","HostPort":"8080"}]}
 
# Check what volumes are mounted
docker inspect --format '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}}{{println}}{{end}}' payment-api
# volume /var/lib/docker/volumes/payment-data/_data -> /app/data
 
# Check the restart policy
docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' payment-api
# unless-stopped
 
# Check memory and CPU limits
docker inspect --format 'Memory: {{.HostConfig.Memory}} CPU: {{.HostConfig.NanoCpus}}' payment-api
# Memory: 536870912 CPU: 1000000000
# 536870912 bytes = 512MB, 1000000000 nanocpus = 1 CPU
 
# Check the container network and IP
docker inspect --format '{{range $net, $config := .NetworkSettings.Networks}}{{$net}}: {{$config.IPAddress}}{{println}}{{end}}' payment-api
# bridge: 172.17.0.3
# payment-network: 10.0.1.25
 
# Check when the container was created and last started
docker inspect --format 'Created: {{.Created}} Started: {{.State.StartedAt}}' payment-api
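The raw byte and nanocpu values are easy to misread, so a small conversion helps. This is a sketch under stated assumptions: `humanize_limits` is a hypothetical helper, it expects the two integers printed by `.HostConfig.Memory` and `.HostConfig.NanoCpus`, and it treats 0 as "no limit set through this field".

```bash
#!/usr/bin/env bash
# humanize_limits MEM_BYTES NANO_CPUS: print limits in MiB and whole CPUs.
# Hypothetical helper; 0 means the field was not set.
humanize_limits() {
  local mem_bytes="$1" nano_cpus="$2" mem cpus

  if [ "$mem_bytes" -eq 0 ]; then
    mem="unlimited"
  else
    mem="$((mem_bytes / 1048576))MiB"          # 1 MiB = 1048576 bytes
  fi

  if [ "$nano_cpus" -eq 0 ]; then
    cpus="unlimited"
  else
    # 1 CPU = 1000000000 nanocpus; keep two decimal places with integer math.
    cpus="$(printf '%d.%02d' \
      "$((nano_cpus / 1000000000))" \
      "$(((nano_cpus % 1000000000) / 10000000))")"
  fi

  echo "Memory: ${mem} CPUs: ${cpus}"
}

# Example (needs a real container):
#   humanize_limits $(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' payment-api)
```

With the values shown above, `humanize_limits 536870912 1000000000` prints `Memory: 512MiB CPUs: 1.00`.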

Milestone 4: Investigating from Inside with docker exec

Bash

# Get an interactive shell inside a running container
docker exec -it payment-api bash
# If bash is not available (minimal images):
docker exec -it payment-api sh
 
# Once inside the container, you can investigate:
# Check what processes are running
ps aux
 
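# Check network connectivity
curl http://postgres:5432  # Try to reach the database by service name
# PostgreSQL does not speak HTTP, so an "empty reply" error still proves the port is open;
# "connection refused" or a timeout means the container cannot reach the database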
 
# Check DNS resolution
nslookup postgres
# Does the container resolve service names correctly?
 
# Check which ports are listening
ss -tulpn
# or
netstat -tulpn
 
# Read a config file
cat /app/config/database.yml
 
# Check disk space inside the container
df -h
 
# Check environment variables as seen by the process
env | sort
 
# Check file permissions that might be causing issues
ls -la /app/data/
 
# Exit the container shell
exit

When the container is running but behaving incorrectly, exec is your most powerful tool. You are seeing the exact environment the application sees.
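Minimal images often ship without `curl`, `nslookup`, or `netstat`. If the image does include bash, its built-in `/dev/tcp` redirection can test a TCP port without extra tools. The sketch below assumes bash exists inside the container; `tcp_check` is a hypothetical helper name, and the container, host, and port values in the usage line are examples only.

```bash
#!/usr/bin/env bash
# tcp_check CONTAINER HOST PORT: test from inside CONTAINER whether
# HOST:PORT accepts TCP connections, using bash's /dev/tcp redirection.
# Hypothetical helper; requires bash inside the container.
tcp_check() {
  local container="$1" host="$2" port="$3"
  if docker exec "$container" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable from ${container}"
  else
    # Also reached when bash is missing or the container is not running.
    echo "${host}:${port} NOT reachable from ${container}"
    return 1
  fi
}

# Example (needs a running container):
#   tcp_check payment-api postgres 5432
```

A failure here is ambiguous on its own, so confirm the container is running and has bash before concluding that the network is at fault. A firewalled port can also make the check hang until the TCP connect timeout expires.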

Milestone 5: Debugging a Stopped Container

Exec only works on running containers. When a container crashes and stays stopped, you need a different approach.

Bash

# Method 1: Read the logs from the stopped container
# Logs are preserved until docker rm is run
docker logs --tail 100 stopped-payment-api
 
# Method 2: Create a new container from the same image with a shell override
# This starts the same image but runs bash instead of the normal command
# Lets you inspect the filesystem and config without the app crash happening
docker run -it --rm \
  --entrypoint bash \
  registry.razorpay.in/payment-api:v3.1.0
# Now you are inside the image environment
# Check config files, verify binaries exist, check permissions
 
# Method 3: Commit the stopped container to a new image and inspect it
docker commit stopped-payment-api debug-payment-api
# --entrypoint makes sure bash runs even if the image defines its own ENTRYPOINT
docker run -it --rm --entrypoint bash debug-payment-api
# This preserves any files that were written during the container's run
# Useful when the crash modified files (created a lockfile, corrupted a db, etc.)
 
# Method 4: Use docker diff to see what files changed during the container run
docker diff stopped-payment-api
# A /app/logs/error.log   <- A = Added
# C /app/config/db.yml    <- C = Changed
# D /app/tmp/lock.pid     <- D = Deleted
# Shows every file the container modified from its image
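On a long-running container `docker diff` can print hundreds of lines. A quick count per change type shows whether the container mostly added files (logs, caches) or modified existing ones (config drift). The helper name `summarize_diff` is hypothetical; it only relies on the A/C/D prefix shown above.

```bash
#!/usr/bin/env bash
# summarize_diff: read `docker diff` output on stdin and count entries
# by change type (A = added, C = changed, D = deleted).
# Hypothetical helper for illustration.
summarize_diff() {
  awk '{ count[$1]++ }
       END { printf "added=%d changed=%d deleted=%d\n",
                    count["A"], count["C"], count["D"] }'
}

# Example (needs a real container):
#   docker diff stopped-payment-api | summarize_diff
```

A high "changed" count under a config directory is usually worth a closer look than a high "added" count under a log directory.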

Milestone 6: Diagnosing Resource Problems with docker stats

Bash

# Live monitoring of all running containers
docker stats
 
# What each column means:
# CONTAINER ID  — short container ID
# NAME          — container name
# CPU %         — CPU usage as a percentage of one host core (can exceed 100% on multi-core hosts)
# MEM USAGE     — current memory used vs the container's memory limit
# MEM %         — memory usage as a percentage of the limit
# NET I/O       — network bytes received / transmitted
# BLOCK I/O     — disk bytes read / written
# PIDS          — number of processes inside the container
 
# Warning signs:
# CPU % consistently above 80% — app is CPU-bound or stuck in a loop
# MEM % above 85% — approaching the limit, OOMKill risk
# PIDS growing over time — process leak, not cleaning up child processes
 
# One-time snapshot for all containers
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
# NAME           CPU %    MEM USAGE / LIMIT    MEM %
# payment-api    45.2%    420MiB / 512MiB      82.0%   <- getting close to limit!
# order-api      2.1%     128MiB / 512MiB      25.0%
# postgres       0.8%     256MiB / 1GiB        25.0%
 
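# If memory is above 85%:
# Either increase the container memory limit
# (if Docker rejects the update because of the swap limit, raise --memory-swap in the same command):
docker update --memory 1g payment-api
# Or investigate the memory leak in the application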
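The 85% warning threshold can be checked automatically instead of by eye. This is a minimal sketch: `flag_high_memory` is a hypothetical helper that reads `name percent` pairs on stdin, and the default threshold of 85 simply mirrors the warning sign listed above.

```bash
#!/usr/bin/env bash
# flag_high_memory [THRESHOLD]: read "<name> <mem%>" lines on stdin and
# print only the containers at or above THRESHOLD percent (default 85).
# Hypothetical helper for illustration.
flag_high_memory() {
  local threshold="${1:-85}"
  awk -v limit="$threshold" '{
    pct = $2
    sub(/%/, "", pct)                      # "82.00%" -> "82.00"
    if (pct + 0 >= limit + 0) print $1, $2
  }'
}

# Example (needs running containers):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_memory 85
```

For containers started without a memory limit, the percentage is relative to host memory, so a low value there does not rule out a leak.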

Common Mistakes

Mistake	What Goes Wrong	Fix
Reading logs without `--since` on a restarting container	Sees thousands of lines from many restarts	Always use `--since 5m` or `--tail 100` to scope it
Not checking OOMKilled before investigating logs	Misses that container was killed by the kernel	Check `docker inspect --format '{{.State.OOMKilled}}'` first
Running `docker exec` on the wrong container	Investigating the healthy replica instead of the crashed one	Always use `docker ps -a` to find the exact container ID first
Removing the crashed container with `docker rm` before saving logs	Evidence gone permanently	Save logs to a file before cleaning up: `docker logs name > /tmp/logs.txt 2>&1`
Using `docker exec` to make permanent fixes	Changes are lost when container restarts	Fix the Dockerfile or environment config, rebuild the image

Troubleshooting Reference

Exit Code	Meaning	First Step
0	Clean exit	Check if this is a batch job that should exit, not a server
1	Application error	`docker logs --tail 50 name`
127	Command not found	Check CMD/ENTRYPOINT in Dockerfile — binary might not exist in image
137	OOMKilled or docker kill	`docker inspect --format '{{.State.OOMKilled}}'` — if true, increase memory limit
139	Segfault	Application crash — check logs and report to app developer
143	SIGTERM received	Normal stop via `docker stop` — not an error

PLACEMENT PRO TIP
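**Tip:** When a container is crashing in a restart loop, note that `docker logs` has no `--previous` flag (that flag belongs to `kubectl logs`). Docker keeps the output of earlier runs of the same container, so the error that caused the crash is still in the log. Use `docker logs --until` with the container's current `.State.StartedAt` timestamp, or `--since` with a short window, to separate the crash from the current run's startup sequence.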

REMEMBER THIS
**Remember:** `docker inspect` is the single most comprehensive source of information about a container. Before Googling a problem, try `docker inspect container-name` and read the State, HostConfig, and NetworkSettings sections. The answer to most configuration problems is in there.

COMMON MISTAKE / WARNING
**Common Mistake:** Running `docker exec` to fix a problem inside a running container. Any change you make inside a container via exec is temporary — it disappears the next time the container restarts. The correct fix is always to change the Dockerfile, rebuild the image, and redeploy. Exec is for investigation, never for making production fixes.

COMMON MISTAKE / WARNING
**Security:** On production hosts, all `docker exec` sessions should be logged for audit purposes. An engineer who can exec into a production container can read secrets from environment variables, read files, and exfiltrate data. Consider restricting exec access to debugging sessions only and requiring a written justification in your incident management system before granting access to production containers.
