What is CrashLoopBackOff? | DevOps Dictionary

CrashLoopBackOff - Why Your Pod Keeps Dying

What Does CrashLoopBackOff Mean in Simple Terms?

Your container starts, crashes, Kubernetes restarts it, and it crashes again. After a few attempts Kubernetes says: "I will keep trying but I will wait longer each time so I don't waste resources." That waiting period is the backoff. The pod is not stuck; it is actively retrying on a growing timer.

Backoff Timer Progression


+-----------------------------+
| Restart 1  ->  wait 10s     |
+-----------------------------+
            |
            v
+-----------------------------+
| Restart 2  ->  wait 20s     |
+-----------------------------+
            |
            v
+-----------------------------+
| Restart 3  ->  wait 40s     |
+-----------------------------+
            |
            v
+-----------------------------+
| Restart 4  ->  wait 80s     |
+-----------------------------+
            |
            v
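+-----------------------------+
| Restart 5  ->  wait 160s    |
+-----------------------------+
            |
            v
+-----------------------------+
| Restart 6+ ->  wait 300s    | <- capped at 5 minutes; resets after the container runs cleanly for 10 minutes
+-----------------------------+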
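
Kubernetes documents the restart delay as starting at 10s, doubling after each crash, and capping at 300s, so under that rule a 160s step comes before the cap. The sketch below reproduces that schedule with plain shell arithmetic. `backoff_delay` is a hypothetical helper written for illustration, not a kubectl or kubelet command, and the exact restart at which each delay applies may differ slightly on a real node.

```shell
#!/bin/sh
# Hypothetical helper: delay before restart N, assuming a 10s base,
# doubling on every crash, and a 300s cap.
backoff_delay() {
  restart=$1
  delay=10
  cap=300
  i=1
  while [ "$i" -lt "$restart" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$cap" ]; then
      delay=$cap
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Print the schedule for the first seven restarts.
for n in 1 2 3 4 5 6 7; do
  printf 'Restart %d -> wait %ss\n' "$n" "$(backoff_delay "$n")"
done
```

Once the cap is reached, every further restart waits the same 300s until the container manages to stay up long enough for the timer to reset.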

How to Diagnose CrashLoopBackOff

Bash

# Step 1 — Get logs from the PREVIOUS crashed container
# The current container may have no logs yet — always use --previous
kubectl logs api-server-7d9f8b-xkp2q -n production --previous
 
# Step 2 — Check the exact exit code of the last crash
kubectl get pod api-server-7d9f8b-xkp2q -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
 
# Step 3 — Read the full event history for the pod
kubectl describe pod api-server-7d9f8b-xkp2q -n production
 
# Step 4 — Watch restart count climb in real time
kubectl get pod api-server-7d9f8b-xkp2q -n production -w
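
The first three steps can be bundled into one function so they run in the same order every time. This is only a sketch: `diagnose_pod` is a hypothetical wrapper name, it assumes the crashing container is the first one in the pod (index `0`), and it uses nothing beyond the kubectl commands already shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the diagnosis steps above.
# Usage: diagnose_pod <pod-name> [namespace]
diagnose_pod() {
  pod=$1
  ns=${2:-default}

  echo "== Logs from the previous (crashed) container =="
  kubectl logs "$pod" -n "$ns" --previous

  echo "== Exit code of the last crash =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
  echo

  echo "== Pod description and events =="
  kubectl describe pod "$pod" -n "$ns"
}

# Example call (requires cluster access):
#   diagnose_pod api-server-7d9f8b-xkp2q production
```

The watch command from Step 4 is left out on purpose because it blocks until interrupted.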

Exit Code Reference

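Exit Code	Meaning	Typical Fix
`1`	Application error: exception or panic on startup	Check logs for a stack trace or missing config
`2`	Shell or script error in ENTRYPOINT (by convention, misuse of a shell builtin)	Check Dockerfile ENTRYPOINT syntax
`126`	Entrypoint found but not executable (permission denied)	Verify file permissions in the image
`127`	Entrypoint binary not found	Verify the binary path in the image
`137`	Killed by SIGKILL (128 + 9), most often OOMKilled	Confirm the `OOMKilled` reason, then increase `resources.limits.memory` or fix the leak
`139`	Segmentation fault (SIGSEGV, 128 + 11)	Application bug; check core dump
`255`	Generic fatal exit, or an exit status outside the 0-255 range	Check logs and events; the code alone does not identify the cause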
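
Exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number equal to the code minus 128. That is where 137 (signal 9, SIGKILL) and 139 (signal 11, SIGSEGV) come from. A minimal sketch of that rule, where `describe_exit_code` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: classify a container exit code using the
# "128 + signal number" convention.
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application exited with status $code"
  fi
}

describe_exit_code 137   # killed by signal 9
describe_exit_code 139   # killed by signal 11
describe_exit_code 1     # application exited with status 1
```

A signal-based code says how the process died, not why, so a 137 still needs the termination reason to confirm an OOM kill.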

Most Common Causes and Fixes

1. Missing environment variable — app panics on startup

YAML

# deployment.yaml — ensure all required env vars are provided
spec:
  containers:
    - name: api-server
      image: registry.razorpay.in/api-server:v2.4.1
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: connection-string   # Must exist; if the Secret or key is missing, the container is not created (CreateContainerConfigError)
        - name: REDIS_HOST
          value: "10.0.1.50"
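
When a variable is simply left out of the manifest, the app is what crashes, and the stack trace is not always clear about why. One option is to check the environment in the entrypoint script and exit with an explicit message, so the cause is the first thing in the `--previous` logs. The sketch below assumes a shell entrypoint; `require_env` is a hypothetical helper, and `/app/api-server` is an assumed binary path used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report every required variable that is unset or empty.
# Returns 0 when all are present, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In an entrypoint script, before starting the app:
#   require_env DATABASE_URL REDIS_HOST || exit 1
#   exec /app/api-server   # assumed path
```

The check treats an empty value the same as an unset one, which is usually what a connection string needs.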

2. Liveness probe too aggressive — kills a slow-starting pod

YAML

# Increase initialDelaySeconds for apps that take time to boot
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30    # Give the app 30s to start before first check
  periodSeconds: 10
  failureThreshold: 3
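
To judge whether the probe settings leave enough time, it helps to estimate how long the app has before repeated failures trigger a restart. A rough upper bound is `initialDelaySeconds + periodSeconds * failureThreshold`. This ignores `timeoutSeconds` and kubelet scheduling jitter, so treat it as an approximation; `liveness_budget` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: approximate seconds after container start before
# consecutive liveness failures cause a restart.
# Arguments: initialDelaySeconds periodSeconds failureThreshold
liveness_budget() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $((initial_delay + period * failure_threshold))
}

liveness_budget 30 10 3   # prints 60 (the settings above)
liveness_budget 0 10 3    # prints 30 (same probe with no initial delay)
```

If the app regularly needs longer than this estimate to answer `/health`, it will be restarted on every boot and end up in CrashLoopBackOff even though nothing is wrong with the code.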

3. OOMKilled — container hitting memory limit

Bash

# Confirm OOMKill from the termination reason
kubectl get pod <pod> -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Output: OOMKilled

YAML

# Fix: increase memory limit in deployment spec
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"     # Raise this if app legitimately needs more

4. Image entrypoint does not exist

Bash

# Run the image locally to confirm the binary is present
# (--entrypoint bypasses the image's own, possibly broken, entrypoint)
docker run --rm --entrypoint ls registry.razorpay.in/api-server:v2.4.1 /app/
 
# Override entrypoint to debug inside the container
# (-n must come before "--"; everything after "--" is passed to the container)
kubectl run debug-shell \
  --image=registry.razorpay.in/api-server:v2.4.1 \
  -n production \
  --command -- sleep 3600
kubectl exec -it debug-shell -n production -- sh

Quick Troubleshooting Checklist

Check	Command
Get crash logs	`kubectl logs <pod> -n production --previous`
Read exit code	`kubectl get pod <pod> -n production -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`
Check events	`kubectl describe pod <pod> -n production`
Verify Secrets exist	`kubectl get secret db-credentials -n production`
Check resource limits	`kubectl describe pod

PLACEMENT PRO TIP
**Tip:** Use `--previous` when pulling logs for a CrashLoopBackOff pod. The current container may have only just restarted and often has little or no log output yet. The crash output you need is usually in the previous container's run.

COMMON MISTAKE / WARNING
**Common Mistake:** Running `kubectl logs` without `--previous`, seeing no output, and assuming there are no logs. This is an easy way to lose 30 minutes debugging the wrong thing. If the container wrote anything before it died, that output is in the previous container instance.

REMEMBER THIS
**Remember:** CrashLoopBackOff is a symptom, not a root cause. Exit code `137` means the container was killed with SIGKILL, which usually points to memory; exit code `1` points to application errors; exit codes `126` and `127` point to an entrypoint that cannot be executed or cannot be found. Read the exit code first: it tells you which direction to debug.

COMMON MISTAKE / WARNING
**Security:** If a pod fails to start because a referenced Secret or key is missing, Kubernetes records the failed reference, including the Secret name and key, in the pod's events. Anyone with permission to run `kubectl describe` on pods in that namespace can read it. Avoid Secret names and key names that reveal internal infrastructure topology in production namespaces.