What Is a Rolling Update? | DevOps Dictionary

Rolling Update — The Default Zero-Downtime Release Strategy

What is a Rolling Update in Simple Terms?

A rolling update replaces the pods running your old version with pods running the new version a few at a time (one at a time in the example below), while always keeping enough pods running to serve traffic. When readiness probes and capacity limits are configured correctly, users see no downtime, because healthy pods remain available throughout the entire update.

The simplest way to deploy a new version is to stop the old version and then start the new one, which causes downtime (Kubernetes offers this as the Recreate strategy). A rolling update avoids this by overlapping the old and new versions: while a new pod starts up and passes its health checks, the old pods keep serving traffic. With maxUnavailable set to 0, an old pod is terminated only after a new pod is confirmed Ready. RollingUpdate is the default strategy for a Kubernetes Deployment.

◈ DIAGRAM

Before update — 3 pods on v1:
+----------+  +----------+  +----------+
| Pod 1    |  | Pod 2    |  | Pod 3    |
| v1.0     |  | v1.0     |  | v1.0     |
| Serving  |  | Serving  |  | Serving  |
+----------+  +----------+  +----------+
 
Step 1 — maxSurge=1 creates a new pod:
+----------+  +----------+  +----------+  +----------+
| Pod 1    |  | Pod 2    |  | Pod 3    |  | Pod 4    |
| v1.0     |  | v1.0     |  | v1.0     |  | v2.0     |
| Serving  |  | Serving  |  | Serving  |  | Starting |
+----------+  +----------+  +----------+  +----------+
 
Step 2 — Pod 4 passes readinessProbe, Pod 1 terminated:
+----------+  +----------+  +----------+
| Pod 2    |  | Pod 3    |  | Pod 4    |
| v1.0     |  | v1.0     |  | v2.0     |
| Serving  |  | Serving  |  | Serving  |
+----------+  +----------+  +----------+
 
Step 3 — Pod 5 (v2.0) created and passes readinessProbe, Pod 2 terminated:
+----------+  +----------+  +----------+
| Pod 3    |  | Pod 4    |  | Pod 5    |
| v1.0     |  | v2.0     |  | v2.0     |
| Serving  |  | Serving  |  | Serving  |
+----------+  +----------+  +----------+
 
Step 4 — Pod 6 (v2.0) created and passes readinessProbe, Pod 3 terminated:
+----------+  +----------+  +----------+
| Pod 4    |  | Pod 5    |  | Pod 6    |
| v2.0     |  | v2.0     |  | v2.0     |
| Serving  |  | Serving  |  | Serving  |
+----------+  +----------+  +----------+
 
Update complete — 3 pods on v2.0, and 3 pods were serving traffic at every step (maxUnavailable=0).
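The same pattern generalizes to any replica count. The sketch below is a simplified model, assuming maxUnavailable=0 and that each batch of new pods becomes Ready before the matching old pods are terminated (the real Deployment controller reacts to pod events rather than stepping in lockstep). `simulate_rollout` is a hypothetical helper name; it prints each step and tracks the lowest number of Ready pods and the highest total pod count.

```bash
# simulate_rollout (hypothetical helper): simplified model of a rolling update
# with maxUnavailable=0. Each step surges up to SURGE new pods, waits for them
# to become Ready, then terminates the same number of old pods.
simulate_rollout() {
  local replicas="$1" surge="$2"
  if (( surge < 1 )); then
    echo "with maxUnavailable=0, maxSurge must be at least 1" >&2
    return 1
  fi
  local old="$replicas" new=0 steps=0
  local min_ready="$replicas" max_pods="$replicas"
  local batch total ready
  while (( old > 0 )); do
    batch=$(( surge < old ? surge : old ))
    # Surge: the new pods exist but are still Starting (not Ready)
    total=$(( old + new + batch ))
    ready=$(( old + new ))
    if (( total > max_pods )); then max_pods=$total; fi
    if (( ready < min_ready )); then min_ready=$ready; fi
    # New pods pass readinessProbe, then the same number of old pods is terminated
    new=$(( new + batch ))
    old=$(( old - batch ))
    steps=$(( steps + 1 ))
    echo "step ${steps}: v1=${old} v2=${new}"
  done
  echo "steps=${steps} min_ready=${min_ready} max_pods=${max_pods}"
}
```

Running `simulate_rollout 3 1` reproduces the diagram: three steps, never fewer than 3 Ready pods, and at most 4 pods at once.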

Configuring Rolling Update Strategy

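The rolling update configuration lives under `spec.strategy` in the Deployment manifest, and the readiness probe in the pod template decides when each new pod counts as ready: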

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: production
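spec:
  replicas: 6
  selector:
    matchLabels:
      app: payment-api   # Required in apps/v1; must match the pod template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Create up to 2 extra pods above desired count
                         # During update: up to 8 pods exist (6 desired + 2 surge)
                         # Speeds up the update (2 pods replaced at once)
                         # Absolute number or percentage ("33%"); percentages round up

      maxUnavailable: 0  # Never reduce below desired count
                         # Keeps 6 Ready pods serving traffic throughout
                         # Keep at 0 for services that cannot tolerate reduced capacity
                         # (the Kubernetes default is 25%)
                         # Absolute number or percentage; percentages round down
                         # maxSurge and maxUnavailable cannot both be 0
  template:
    metadata:
      labels:
        app: payment-api # Matched by the selector and by kubectl -l app=payment-api
    spec:
      containers:
        - name: payment-api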
          image: registry.razorpay.in/payment-api:v3.1.0
          readinessProbe:       # CRITICAL: controls when new pods receive traffic
            httpGet:            # Rolling update WAITS for this to pass
              path: /health     # before proceeding to the next pod
              port: 8080
            initialDelaySeconds: 15   # Wait 15s before first health check
            periodSeconds: 5          # Check every 5 seconds
            successThreshold: 1       # One success = ready
            failureThreshold: 3       # Three failures = remove from Service
          livenessProbe:        # Controls when Kubernetes restarts a container
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
            failureThreshold: 3
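The readiness probe also sets a floor on how long a rollout takes. As a rough worked example, assuming every new pod turns Ready at its first probe (initialDelaySeconds: 15) and ignoring scheduling, image pulls and old-pod termination, the 6-replica Deployment above with maxSurge: 2 and maxUnavailable: 0 needs about ceil(6 / 2) = 3 batches, so at least 3 × 15 = 45 seconds. `min_rollout_seconds` is a hypothetical helper for that arithmetic, not a Kubernetes tool, and treating surge plus unavailable as the batch size is a simplification.

```bash
# min_rollout_seconds (hypothetical helper): rough lower bound on rollout time.
# Assumes each batch of new pods becomes Ready exactly at its first readiness
# probe; ignores scheduling, image pulls and termination of old pods.
# Usage: min_rollout_seconds REPLICAS MAX_SURGE MAX_UNAVAILABLE INITIAL_DELAY_SECONDS
min_rollout_seconds() {
  local replicas="$1" max_surge="$2" max_unavailable="$3" initial_delay="$4"
  local per_batch=$(( max_surge + max_unavailable ))
  if (( per_batch == 0 )); then
    # Kubernetes rejects a strategy where both values are 0
    echo "maxSurge and maxUnavailable cannot both be 0" >&2
    return 1
  fi
  # Integer ceiling division: number of replacement batches
  local batches=$(( (replicas + per_batch - 1) / per_batch ))
  echo $(( batches * initial_delay ))
}
```

For the manifest above, `min_rollout_seconds 6 2 0 15` prints `45`. Real rollouts are slower, so treat the result only as a sanity check when choosing rollout timeouts.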

Understanding maxSurge and maxUnavailable

◈ DIAGRAM

| replicas: 10 | maxSurge: 2 | maxUnavailable: 0
|
| During update:
| Minimum pods: 10 - 0 = 10 (maxUnavailable=0 means never go below 10)
| Maximum pods: 10 + 2 = 12 (maxSurge=2 allows 2 extra)
|
| Update pace: up to 2 new pods start at a time (maxSurge=2)
| Batches: about 5 batches of 2, vs 10 batches with maxSurge=1
| Safety: never below 10 pods serving traffic
 
| replicas: 10 | maxSurge: 1 | maxUnavailable: 1
|
| During update:
| Minimum pods: 10 - 1 = 9 (can drop to 9 pods serving traffic)
| Maximum pods: 10 + 1 = 11
|
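| Capacity can dip by one pod (10%) during the update, but up to
| 2 pods are in transition at once (1 surge + 1 unavailable), so the rollout is faster
| Acceptable only if the remaining 9 pods have enough headroom for peak traffic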
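The percentage forms round differently: Kubernetes rounds maxSurge percentages up and maxUnavailable percentages down. The sketch below applies those two rules to compute the minimum and maximum pod counts shown above; `resolve_value` and `pod_bounds` are hypothetical helper names for checking numbers, not part of kubectl.

```bash
# resolve_value (hypothetical helper): turn an absolute number or a percentage
# into a pod count. Usage: resolve_value VALUE REPLICAS up|down
resolve_value() {
  local value="$1" replicas="$2" rounding="$3"
  if [[ "$value" == *% ]]; then
    local pct="${value%\%}"   # strip the trailing %
    if [[ "$rounding" == up ]]; then
      echo $(( (pct * replicas + 99) / 100 ))   # ceiling, used for maxSurge
    else
      echo $(( pct * replicas / 100 ))          # floor, used for maxUnavailable
    fi
  else
    echo "$value"
  fi
}

# pod_bounds (hypothetical helper): Usage: pod_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
pod_bounds() {
  local replicas="$1" surge unavailable
  surge="$(resolve_value "$2" "$replicas" up)"
  unavailable="$(resolve_value "$3" "$replicas" down)"
  echo "min_available=$(( replicas - unavailable )) max_pods=$(( replicas + surge ))"
}
```

For example, `pod_bounds 10 2 0` prints `min_available=10 max_pods=12`, matching the first box, while `pod_bounds 10 25% 25%` (the Kubernetes defaults) prints `min_available=8 max_pods=13`.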

Triggering and Monitoring Rolling Updates

Bash

# Trigger a rolling update by changing the image tag
kubectl set image deployment/payment-api \
  payment-api=registry.razorpay.in/payment-api:v3.2.0 \
  -n production
 
# OR update the deployment YAML and apply it
# kubectl apply -f deployment.yaml
 
# Watch the rollout in real time
kubectl rollout status deployment/payment-api -n production
# Waiting for deployment "payment-api" rollout to finish: 2 out of 6 new replicas have been updated...
# Waiting for deployment "payment-api" rollout to finish: 4 out of 6 new replicas have been updated...
# Waiting for deployment "payment-api" rollout to finish: 2 old replicas are pending termination...
# deployment "payment-api" successfully rolled out
 
# Watch individual pods being replaced
kubectl get pods -n production -l app=payment-api --watch
# NAME                        READY   STATUS
# payment-api-7d9f8c-xkp2q   1/1     Running    <- old pod
# payment-api-7d9f8c-ab1cd   1/1     Running    <- old pod
# payment-api-6b8d4a-mnp7r   0/1     Pending    <- new pod starting
# payment-api-6b8d4a-mnp7r   1/1     Running    <- new pod ready
# payment-api-7d9f8c-xkp2q   1/1     Terminating <- old pod dying
 
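# Check the revision history of all rollouts
kubectl rollout history deployment/payment-api -n production
# (CHANGE-CAUSE comes from the kubernetes.io/change-cause annotation;
#  revisions without it show <none>)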
# REVISION  CHANGE-CAUSE
# 1         Initial deployment v3.0.0
# 2         Updated to v3.1.0
# 3         Updated to v3.2.0
 
# Roll back to the previous revision (this runs another rolling update,
# so it is quick but not instant)
kubectl rollout undo deployment/payment-api -n production
 
# Roll back to a specific revision
kubectl rollout undo deployment/payment-api \
  --to-revision=2 \
  -n production
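In a CI pipeline, the trigger, watch and rollback steps above are often chained so that a rollout that never completes is reverted automatically. This is a minimal sketch: `deploy_or_rollback` is a hypothetical function name, and the 300s default timeout is an assumption to tune against your own startup times.

```bash
# deploy_or_rollback (hypothetical helper, not part of kubectl):
# update the image, wait for the rollout, and roll back automatically
# if the rollout does not finish within the timeout.
# Usage: deploy_or_rollback DEPLOYMENT CONTAINER IMAGE NAMESPACE [TIMEOUT]
deploy_or_rollback() {
  local deployment="$1" container="$2" image="$3" namespace="$4"
  local timeout="${5:-300s}"

  # Changing the pod template starts a new rolling update
  kubectl set image "deployment/${deployment}" "${container}=${image}" -n "${namespace}" || return 1

  # rollout status exits non-zero on timeout or when the progress deadline is exceeded
  if kubectl rollout status "deployment/${deployment}" -n "${namespace}" --timeout="${timeout}"; then
    echo "rollout of ${image} succeeded"
    return 0
  fi

  echo "rollout of ${image} did not complete; rolling back" >&2
  kubectl rollout undo "deployment/${deployment}" -n "${namespace}"
  # Wait for the rollback itself to finish before reporting failure
  kubectl rollout status "deployment/${deployment}" -n "${namespace}" --timeout="${timeout}"
  return 1
}
```

Usage: `deploy_or_rollback payment-api payment-api registry.razorpay.in/payment-api:v3.2.0 production 5m`. The function returns 1 after a rollback so the pipeline step is still marked as failed.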

Pausing a Rolling Update Mid-Way for Canary Testing

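You can pause a rolling update after a few pods have been replaced to test the new version before completing the rollout. Treat this as a rough canary: the number of pods updated before the pause takes effect depends on timing and on maxSurge/maxUnavailable, and traffic is split only in proportion to pod count.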

Bash

# Start the update
kubectl set image deployment/payment-api \
  payment-api=registry.razorpay.in/payment-api:v3.2.0 \
  -n production
 
# Pause right away. How many pods are updated first depends on timing;
# with maxSurge: 2, expect roughly the first batch of up to 2 pods
kubectl rollout pause deployment/payment-api -n production
 
# A few pods now run v3.2.0 and the rest still run v3.1.0
# Monitor the new pods: check logs, check Grafana error rates
# Show which image each pod is running
kubectl get pods -n production -l app=payment-api \
  -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[0].image'
 
# If the new pods look good, resume the rollout
kubectl rollout resume deployment/payment-api -n production
 
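# If the new pods have problems, roll back before more pods are updated.
# kubectl rollout undo refuses to run on a paused Deployment, so either resume
# and then undo immediately, or point the image back at the old tag while
# paused and then resume:
kubectl set image deployment/payment-api \
  payment-api=registry.razorpay.in/payment-api:v3.1.0 \
  -n production
kubectl rollout resume deployment/payment-api -n production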
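While the rollout is paused, it helps to count how many pods run each image. The sketch below assumes the two-column NAME/IMAGE table printed by `kubectl get pods -n production -l app=payment-api -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[0].image'`; `image_summary` is a hypothetical helper name.

```bash
# image_summary (hypothetical helper): reads a NAME/IMAGE table on stdin
# (header line first) and prints one "IMAGE COUNT" line per image, sorted.
image_summary() {
  awk 'NR > 1 { count[$2]++ } END { for (img in count) print img, count[img] }' | sort
}
```

Usage: pipe the `custom-columns` query above into `image_summary`. Terminating pods are still listed by `kubectl get pods`, so counts can briefly include pods that are shutting down.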

Force Restart All Pods Without Changing the Image

Bash

# Useful after updating a ConfigMap or Secret consumed as environment variables,
# which running containers do not pick up on their own
# Replaces all pods using the rolling update strategy (no downtime)
kubectl rollout restart deployment/payment-api -n production
 
# Watch the restart happen
kubectl rollout status deployment/payment-api -n production
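Under the hood, `kubectl rollout restart` does not delete pods directly; it sets a `kubectl.kubernetes.io/restartedAt` timestamp annotation on the pod template, and that template change triggers an ordinary rolling update. The sketch below builds a roughly equivalent patch by hand. `restart_patch` is a hypothetical helper, and using a UTC timestamp is an arbitrary choice (any new value changes the template).

```bash
# restart_patch (hypothetical helper): prints a JSON patch that stamps the
# pod template with a new restartedAt annotation. Any pod template change
# makes the Deployment controller start a new rolling update.
restart_patch() {
  local now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"%s"}}}}}\n' "$now"
}
```

Usage: `kubectl patch deployment/payment-api -n production -p "$(restart_patch)"`. In practice `kubectl rollout restart` is simpler; the patch form mainly shows why a restart follows the same maxSurge/maxUnavailable rules as an image change.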

Troubleshooting a Stuck Rolling Update

Bash

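# Symptom: kubectl rollout status hangs, new pods sit in Pending, or new pods
# never become Ready. After progressDeadlineSeconds (default 600s), rollout
# status reports that the deployment exceeded its progress deadline.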
 
# Step 1 — Check why new pods are not becoming Ready
kubectl get pods -n production -l app=payment-api
# Look for pods showing 0/1 Ready for more than 2 minutes
 
# Step 2 — Describe the stuck pod for events
kubectl describe pod payment-api-6b8d4a-mnp7r -n production
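# The Events section may show, for example:
# Readiness probe failed: HTTP probe failed with statuscode: 500
# <- The readinessProbe is failing: the new version likely has a bug or missing config
# (For a Pending pod, look for FailedScheduling events such as insufficient CPU or memory)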
 
# Step 3 — Check the new pod logs
kubectl logs payment-api-6b8d4a-mnp7r -n production
# Look for startup errors or missing configuration
 
# Step 4 — Roll back so the Deployment does not stay stuck half-way
# (with maxUnavailable: 0 the old pods keep serving, but the rollout cannot finish)
kubectl rollout undo deployment/payment-api -n production
 
# Step 5 — Verify rollback completed
kubectl rollout status deployment/payment-api -n production
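Step 1 asks you to spot pods whose READY column shows fewer ready containers than total (such as `0/1`). A small filter can do that for you; `not_ready_pods` is a hypothetical helper, and it assumes the default `kubectl get pods --no-headers` layout where READY is the second column.

```bash
# not_ready_pods (hypothetical helper): reads `kubectl get pods --no-headers`
# output on stdin and prints the name of every pod whose READY column
# (e.g. 0/1) shows fewer ready containers than total containers.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}
```

Usage: `kubectl get pods -n production -l app=payment-api --no-headers | not_ready_pods`, then run `kubectl describe pod` and `kubectl logs` on each name it prints.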

REMEMBER THIS
**Remember:** The `readinessProbe` is the most important safety mechanism in a rolling update. With `maxUnavailable: 0`, Kubernetes does NOT terminate another old pod until a new pod passes its readiness probe. Without a readiness probe, Kubernetes treats the container as ready as soon as it starts, which may be 30 seconds before your application has finished connecting to the database and warming up its cache. Always define a readiness probe that truly reflects whether the application can handle requests.

PLACEMENT PRO TIP
**Tip:** Annotate your deployments with the change reason using `kubernetes.io/change-cause` as part of each rollout. This makes `kubectl rollout history` show meaningful messages like "v3.2.0: added UPI retry logic" instead of `<none>`. During an incident, when you need to roll back to a specific revision, knowing which revision number corresponds to which release is critical.
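One way to follow this tip is to record the reason in the same step that triggers the rollout. The sketch below annotates right after the image change, which is the order the Kubernetes documentation uses in its rollout history example; `release` is a hypothetical helper name.

```bash
# release (hypothetical helper): trigger a rollout and record why it happened.
# Usage: release DEPLOYMENT CONTAINER IMAGE NAMESPACE "REASON"
release() {
  local deployment="$1" container="$2" image="$3" namespace="$4" reason="$5"
  # Start the rolling update
  kubectl set image "deployment/${deployment}" "${container}=${image}" -n "${namespace}" || return 1
  # Record the reason; --overwrite replaces the previous release's message
  kubectl annotate "deployment/${deployment}" "kubernetes.io/change-cause=${reason}" --overwrite -n "${namespace}"
}
```

Usage: `release payment-api payment-api registry.razorpay.in/payment-api:v3.2.0 production "v3.2.0: added UPI retry logic"`, after which `kubectl rollout history` lists that message against the new revision.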

COMMON MISTAKE / WARNING
**Common Mistake:** Setting `maxUnavailable: 25%` (the Kubernetes default) or any non-zero value for production services that cannot tolerate reduced capacity. With 10 replicas and `maxUnavailable: 25%`, Kubernetes rounds 2.5 down to 2 and scales the old pods down by 2 before any new pods are Ready, so you briefly serve with 80% of your normal capacity. During a traffic spike this can cause request failures. Always set `maxUnavailable: 0` (with a non-zero `maxSurge`) for any production service with strict availability requirements.

COMMON MISTAKE / WARNING
**Warning:** Rolling updates work best when your new application version is backward compatible with the running version: both reading from the same database schema, accepting the same request formats, and publishing to the same message queue topics. During the update, both v1 and v2 pods run simultaneously and handle requests. If v2 changes a database schema that v1 cannot read, the v1 pods still serving traffic can fail or corrupt data. Always use backward-compatible changes (for example, add a new column first and drop the old one only after every pod runs the new version), or coordinate database migrations separately from application deployments.