CrashLoopBackOff - Why Your Pod Keeps Dying
What Does CrashLoopBackOff Mean in Simple Terms?
Your container starts, crashes, Kubernetes restarts it, it crashes again. After a few attempts Kubernetes says: "I will keep trying but I will wait longer each time so I don't waste resources." That waiting period is the backoff. The pod is not stuck — it is actively retrying on a growing timer.
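The growing timer can be sketched in a few lines of shell. This is a minimal sketch, assuming the kubelet defaults of a 10-second initial delay that doubles after each crash and is capped at 300 seconds. The function name `backoff_delay` is illustrative and not part of any Kubernetes tooling.

```shell
# Sketch of the restart back-off: start at 10s, double per crash, cap at 300s.
# backoff_delay is a hypothetical helper, not a kubectl or kubelet command.
backoff_delay() {
  restart=$1   # 1-based restart attempt
  delay=10     # assumed initial delay in seconds
  cap=300      # assumed maximum delay in seconds
  i=1
  while [ "$i" -lt "$restart" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$cap" ]; then
      delay=$cap
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Print the wait time for the first few restarts
for n in 1 2 3 4 5 6 7; do
  echo "restart $n -> wait $(backoff_delay "$n")s"
done
```

The cap is why a pod that has been crashing for an hour still retries roughly every five minutes rather than giving up.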
Backoff Timer Progression
```text
+-----------------------------+
| Restart 1 -> wait 10s       |
+-----------------------------+
              |
              v
+-----------------------------+
| Restart 2 -> wait 20s       |
+-----------------------------+
              |
              v
+-----------------------------+
| Restart 3 -> wait 40s       |
+-----------------------------+
              |
              v
+-----------------------------+
| Restart 4 -> wait 80s       |
+-----------------------------+
              |
              v
+-----------------------------+
| Restart 5 -> wait 160s      |
+-----------------------------+
              |
              v
+-----------------------------+
| Restart 6+ -> wait 300s     | <- capped at 5 minutes
+-----------------------------+
```

The delay doubles after every crash until it reaches the 5-minute cap. The timer resets once a container has run for 10 minutes without problems.

How to Diagnose CrashLoopBackOff
```shell
# Step 1: Get logs from the PREVIOUS crashed container.
# The current container may have no logs yet, so use --previous.
kubectl logs api-server-7d9f8b-xkp2q -n production --previous

# Step 2: Check the exact exit code of the last crash
kubectl get pod api-server-7d9f8b-xkp2q -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

# Step 3: Read the full event history for the pod
kubectl describe pod api-server-7d9f8b-xkp2q -n production

# Step 4: Watch the restart count climb in real time
kubectl get pod api-server-7d9f8b-xkp2q -n production -w
```

Exit Code Reference
| Exit Code | Meaning | Typical Fix |
|---|---|---|
| 1 | Application error: exception or panic on startup | Check logs for stack trace, missing config |
| 2 | Shell or script error in ENTRYPOINT | Check Dockerfile ENTRYPOINT syntax |
| 137 | Killed by SIGKILL (128 + 9), most often OOMKilled because the memory limit was exceeded | Increase resources.limits.memory |
| 139 | Segmentation fault (SIGSEGV, 128 + 11) | Application bug; check core dump |
| 255 | Entrypoint binary not found or permission denied (some runtimes report 127 or 126 instead) | Verify binary path and file permissions in image |
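Codes such as 137 and 139 follow the common convention that a process killed by a signal exits with 128 plus the signal number. The sketch below decodes that arithmetic; the function name `explain_exit_code` and its message wording are illustrative assumptions, not output from kubectl.

```shell
# Sketch: classify a container exit code.
# Codes between 129 and 254 conventionally mean "killed by signal (code - 128)".
# explain_exit_code is a hypothetical helper, not part of kubectl.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exit 0: clean exit"
  elif [ "$code" -gt 128 ] && [ "$code" -lt 255 ]; then
    echo "exit $code: killed by signal $((code - 128))"
  else
    echo "exit $code: application-defined error"
  fi
}

explain_exit_code 137   # signal 9 is SIGKILL
explain_exit_code 139   # signal 11 is SIGSEGV
explain_exit_code 1
```

A signal number alone does not say who sent it, so an exit code of 137 is worth confirming against the container's termination reason before raising memory limits.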
Most Common Causes and Fixes
1. Missing environment variable: app panics on startup

```yaml
# deployment.yaml: ensure all required env vars are provided
spec:
  containers:
    - name: api-server
      image: registry.razorpay.in/api-server:v2.4.1
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: connection-string  # Must exist: a missing Secret or key blocks container start (CreateContainerConfigError)
        - name: REDIS_HOST
          value: "10.0.1.50"
```

2. Liveness probe too aggressive: kills a slow-starting pod
```yaml
# Increase initialDelaySeconds for apps that take time to boot
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Give the app 30s to start before the first check
  periodSeconds: 10
  failureThreshold: 3
```

3. OOMKilled: container hitting memory limit
```shell
# Confirm the OOMKill from the termination reason
kubectl get pod <pod> -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Output: OOMKilled
```

```yaml
# Fix: increase the memory limit in the deployment spec
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Raise this if the app legitimately needs more
```

4. Image entrypoint does not exist
```shell
# Run the image locally to confirm the binary is present.
# --entrypoint bypasses the image's own (possibly broken) entrypoint.
docker run --rm --entrypoint ls registry.razorpay.in/api-server:v2.4.1 /app/

# Override the entrypoint to debug inside the container.
# -n must come before "--"; everything after "--" is passed to the container.
kubectl run debug-shell \
  --image=registry.razorpay.in/api-server:v2.4.1 \
  -n production \
  --command -- sleep 3600
kubectl exec -it debug-shell -n production -- sh
```

Quick Troubleshooting Checklist
| Check | Command |
|---|---|
| Get crash logs | `kubectl logs <pod> -n production --previous` |
| Read exit code | `kubectl get pod <pod> -n production -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'` |
| Check events | `kubectl describe pod <pod> -n production` |
| Verify Secrets exist | `kubectl get secret db-credentials -n production` |
| Check resource limits | `kubectl describe pod |
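
The first checks in this checklist can be chained into one helper so they always run in the same order. This is a sketch only: it assumes `kubectl` is already configured for the cluster, and the function name `crashloop_triage`, its argument order, and the default namespace are illustrative choices.

```shell
# Sketch: run the log, exit code, and event checks for one pod.
# crashloop_triage is a hypothetical helper; only the kubectl commands are real.
crashloop_triage() {
  pod=$1
  ns=${2:-production}   # namespace defaults to "production" as in the examples

  echo "== Logs from the previous container =="
  kubectl logs "$pod" -n "$ns" --previous

  echo "== Last exit code =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
  echo

  echo "== Events =="
  kubectl describe pod "$pod" -n "$ns"
}

# Usage: crashloop_triage api-server-7d9f8b-xkp2q production
```

Each step keeps going even if an earlier command fails, which is useful when the pod has no previous container yet.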
💡 Tip: Use `--previous` when pulling logs for a CrashLoopBackOff pod. The current container often restarts so fast that it has no log output yet. The logs you need are usually in the previous container's run.

🔴 Common Mistake: Running `kubectl logs` without `--previous`, seeing no output, and assuming there are no logs. This causes engineers at Swiggy and Razorpay to waste 30 minutes debugging the wrong thing. The crash logs are usually still there, in the previous container instance.

📌 Remember: CrashLoopBackOff is a symptom, not a root cause. Exit code `137` points to memory, exit code `1` points to application errors, and exit code `255` points to a broken image. Read the exit code first: it tells you which direction to debug.

⚠️ Security: If a pod fails because a referenced Secret or key is missing, Kubernetes records the secretKeyRef failure, including the Secret and key names, in pod events visible to anyone with `kubectl describe` access. In production namespaces, avoid Secret, key, and environment variable names that reveal internal infrastructure topology.