Cluster Autoscaler — Automatic Node Scaling for Kubernetes
What is Cluster Autoscaler in Simple Terms?
The Horizontal Pod Autoscaler (HPA) scales your pods when traffic increases. But if your nodes do not have enough free capacity, the new pods cannot be scheduled and stay stuck in Pending. Cluster Autoscaler (CA) is what adds the nodes they need.
CA watches for pods that cannot be scheduled due to insufficient resources and automatically provisions new EC2 (or GCP/Azure) instances to accommodate them. On AWS it does this by raising the desired capacity of the node group's Auto Scaling Group (ASG). When traffic drops and nodes sit underutilised, CA drains and removes them to reduce your cloud bill.
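How CA arrives at "how many nodes do I need" can be illustrated with a small bin-packing sketch. This is a simplified model, not CA's actual code: the real autoscaler simulates the scheduler and also honours affinity rules, taints, and node group limits. The `Resources` type, the function names, and the pod and node sizes are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millis: int
    memory_mib: int


def fits(pod: Resources, free: Resources) -> bool:
    """True if the pod's requests fit into the remaining free capacity."""
    return pod.cpu_millis <= free.cpu_millis and pod.memory_mib <= free.memory_mib


def estimate_new_nodes(pending_pods: list[Resources], node_allocatable: Resources) -> int:
    """First-fit-decreasing packing of pending pods onto fresh, empty nodes."""
    free_per_node: list[Resources] = []
    biggest_first = sorted(
        pending_pods, key=lambda p: (p.cpu_millis, p.memory_mib), reverse=True
    )
    for pod in biggest_first:
        if not fits(pod, node_allocatable):
            raise ValueError("pod is larger than an empty node; adding nodes cannot help")
        for i, free in enumerate(free_per_node):
            if fits(pod, free):
                # Place the pod on the first simulated node that has room.
                free_per_node[i] = Resources(
                    free.cpu_millis - pod.cpu_millis, free.memory_mib - pod.memory_mib
                )
                break
        else:
            # No simulated node had room, so one more node is needed.
            free_per_node.append(
                Resources(
                    node_allocatable.cpu_millis - pod.cpu_millis,
                    node_allocatable.memory_mib - pod.memory_mib,
                )
            )
    return len(free_per_node)


# Hypothetical numbers: 10 Pending pods requesting 450m CPU / 1024Mi each,
# and nodes with roughly 1900m CPU / 7000Mi allocatable.
pending = [Resources(450, 1024)] * 10
node = Resources(1900, 7000)
print(estimate_new_nodes(pending, node))  # 3
```

The estimate is driven entirely by pod resource requests, which is why pods without requests give CA very little to work with.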
    +------------------------------------------+
    | Traffic spike at 7pm: Swiggy dinner rush |
    +------------------------------------------+
                        |
                        v
    +------------------------------------------+
    | HPA scales order-api from 5 to 15 pods   |
    +------------------------------------------+
                        |
                        v
    +------------------------------------------+
    | 10 new pods stuck in Pending             |
    | Nodes are full: no room to schedule      |
    +------------------------------------------+
                        |
                        v
    +------------------------------------------+
    | Cluster Autoscaler detects Pending pods  |
    | Calculates: need 3 more nodes            |
    | Calls AWS API: increase ASG desired=6    |
    +------------------------------------------+
                        |
                        v
    +------------------------------------------+
    | 3 new EC2 nodes join cluster in ~90s     |
    | All 10 pending pods get scheduled        |
    | Traffic served successfully              |
    +------------------------------------------+
                        |
                        |  Traffic drops at 11pm
                        v
    +------------------------------------------+
    | Nodes idle for 10+ minutes               |
    | CA drains and terminates 3 extra nodes   |
    | Cloud bill reduced                       |
    +------------------------------------------+

HPA vs Cluster Autoscaler: The Two-Layer Scaling System
    +------------------------------------------+
    | LAYER 1: HPA (Horizontal Pod Autoscaler) |
    |                                          |
    | Scales PODS inside existing nodes        |
    | Reacts to CPU, memory, custom metrics    |
    | Response time: seconds                   |
    | Works within current node capacity       |
    +------------------------------------------+
                        |
                        |  When nodes are full
                        v
    +------------------------------------------+
    | LAYER 2: Cluster Autoscaler              |
    |                                          |
    | Scales NODES (provisions new VMs)        |
    | Reacts to Pending pods                   |
    | Response time: 60-120 seconds            |
    | Calls cloud provider API                 |
    +------------------------------------------+

Setting Up Cluster Autoscaler on EKS
Step 1: Create the IAM Policy for CA
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeScalingActivities",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
            "ec2:DescribeImages",
            "ec2:DescribeInstanceTypes",
            "ec2:DescribeLaunchTemplateVersions",
            "eks:DescribeNodegroup"
          ],
          "Resource": "*"
        }
      ]
    }

Step 2: Create IAM Role for Service Account (IRSA)
    eksctl create iamserviceaccount \
      --cluster=mumbai-prod-cluster \
      --namespace=kube-system \
      --name=cluster-autoscaler \
      --attach-policy-arn=arn:aws:iam::905418385260:policy/ClusterAutoscalerPolicy \
      --override-existing-serviceaccounts \
      --approve \
      --region=ap-south-1

Step 3: Tag Your Node Group ASG
    # CA uses these tags to discover which ASGs it can manage.
    # Each tag is one quoted argument so the shell passes it through unbroken.
    aws autoscaling create-or-update-tags \
      --tags \
      "ResourceId=eks-production-nodes-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
      "ResourceId=eks-production-nodes-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/mumbai-prod-cluster,Value=owned,PropagateAtLaunch=true"

Step 4: Deploy Cluster Autoscaler
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels:
        app: cluster-autoscaler
    spec:
      replicas: 1
      selector:                  # apps/v1 Deployments require a selector
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          serviceAccountName: cluster-autoscaler
          containers:
            # Use the CA release that matches your cluster's Kubernetes minor version
            - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
              name: cluster-autoscaler
              command:
                - ./cluster-autoscaler
                - --cloud-provider=aws
                - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/mumbai-prod-cluster=owned
                - --expander=least-waste                  # Pick the node group that leaves the least idle CPU/memory after scale-up
                - --scale-down-enabled=true
                - --scale-down-delay-after-add=10m        # Wait 10min after scale-up before scaling down
                - --scale-down-unneeded-time=10m          # Node must be unneeded for 10min before removal
                - --scale-down-utilization-threshold=0.5  # Removal candidate if pod requests use under 50% of the node
                - --max-node-provision-time=15m           # Give up on a node that is not ready in 15min
                - --balance-similar-node-groups           # Keep similar node groups balanced across AZs
              resources:
                requests:
                  cpu: 100m
                  memory: 600Mi
                limits:
                  cpu: 100m
                  memory: 600Mi

Testing That Cluster Autoscaler Works
    # Step 1: Create a deployment that needs more nodes than you have
    kubectl create deployment ca-test \
      --image=nginx \
      --replicas=50 \
      -n production

    # CA sizes scale-ups from pod resource requests. Without requests,
    # all 50 pods may simply fit on the existing nodes.
    kubectl set resources deployment ca-test \
      --requests=cpu=500m,memory=256Mi \
      -n production

    # Step 2: Watch pods go Pending (nodes are full)
    kubectl get pods -n production | grep Pending

    # Step 3: Watch CA logs to see it detecting Pending pods
    # (the pattern matches both "scale up" and "Scale-up")
    kubectl logs -f deployment/cluster-autoscaler \
      -n kube-system | grep -iE "scale[- ]?up"
    # scale up: setting group eks-prod-nodes size to 6

    # Step 4: Watch new nodes joining the cluster
    kubectl get nodes -w
    # NAME              STATUS   AGE
    # mumbai-worker-1   Ready    12d
    # mumbai-worker-2   Ready    12d
    # mumbai-worker-3   Ready    12d
    # mumbai-worker-4   Ready    90s   <- CA provisioned this
    # mumbai-worker-5   Ready    85s   <- CA provisioned this
    # mumbai-worker-6   Ready    80s   <- CA provisioned this

    # Step 5: Clean up and watch CA scale down
    kubectl delete deployment ca-test -n production
    # After 10 minutes (scale-down-unneeded-time), CA removes extra nodes
    kubectl get nodes -w
    # mumbai-worker-4   NotReady   <- being drained
    # mumbai-worker-4   removed    <- terminated

Protecting Workloads From CA Scale-Down
    # Method 1: Annotation to prevent eviction of a specific pod
    # (set it in the pod's metadata, for example in a Deployment's pod template)
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    # Use for: Prometheus, stateful pods, anything slow to restart

    ---
    # Method 2: PodDisruptionBudget to maintain minimum replicas
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: payment-api-pdb
      namespace: production
    spec:
      minAvailable: 3   # CA cannot drain a node if it would reduce below 3 pods
      selector:
        matchLabels:
          app: payment-api

Monitoring Cluster Autoscaler
    # View all CA scaling decisions in real time
    # (case-insensitive, matches both "scale up" and "Scale-up")
    kubectl logs -f deployment/cluster-autoscaler \
      -n kube-system | grep -iE "scale[- ]?up|scale[- ]?down|unneeded"

    # Check which nodes CA is removing
    kubectl logs deployment/cluster-autoscaler \
      -n kube-system | grep -i "removing node"

    # See CA events in the cluster
    kubectl get events -n kube-system | grep -i autoscal

    # Check current ASG limits and desired node count
    aws autoscaling describe-auto-scaling-groups \
      --region ap-south-1 \
      --query 'AutoScalingGroups[*].{
        Name:AutoScalingGroupName,
        Min:MinSize,
        Max:MaxSize,
        Desired:DesiredCapacity
      }'

**Warning:** Always set a deliberate `maxSize` on your node groups. CA never scales a group beyond its `maxSize`, so this value is your cost ceiling. If it is left far higher than you need, a traffic surge, a misconfigured HPA with `maxReplicas` set too high, or a runaway batch job can trigger CA to provision hundreds of nodes and generate an enormous unexpected cloud bill. At Zerodha, CA max node counts are reviewed and approved as part of every infrastructure change review.
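The effect of the ceiling can be sketched with a few lines of arithmetic. This is an illustrative model only: the function names are hypothetical, and the price of 10 cents per node-hour is a made-up figure, not a real EC2 price.

```python
def clamp_desired(current: int, extra_requested: int, min_size: int, max_size: int) -> int:
    """Desired node count after a scale-up request, bounded by the group limits."""
    return max(min_size, min(current + extra_requested, max_size))


def hourly_cost_cents(nodes: int, price_cents_per_node_hour: int) -> int:
    """Cost of running `nodes` nodes for one hour, in cents."""
    return nodes * price_cents_per_node_hour


PRICE_CENTS = 10          # hypothetical price per node-hour
current_nodes = 3
runaway_request = 200     # extra nodes a runaway batch job appears to need

for max_size in (10, 500):
    desired = clamp_desired(current_nodes, runaway_request, min_size=3, max_size=max_size)
    cost = hourly_cost_cents(desired, PRICE_CENTS)
    print(f"maxSize={max_size}: desired={desired}, cost={cost} cents/hour")
```

With `maxSize=10` the runaway request stops at 10 nodes; with `maxSize=500` the same request reaches 203 nodes, about twenty times the hourly cost in this example.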
**Remember:** CA scale-up takes 60-120 seconds: it must provision a real EC2 instance, wait for it to boot, join the cluster and pass health checks, and then pull the container image. If your traffic can spike faster than this (Hotstar at the start of an IPL match), use cluster overprovisioning: run low-priority placeholder pods that reserve spare capacity on your nodes. When real pods need scheduling, the scheduler preempts the placeholders and frees the space immediately, with no waiting for new nodes. The evicted placeholders go Pending, which prompts CA to add nodes in the background and restore the buffer.
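The placeholder mechanism relies on priority-based preemption, which the following sketch models for a single node. It is a deliberately simplified illustration, not the scheduler's real algorithm (which also weighs PodDisruptionBudgets, graceful termination, and more). The `Pod` type, the function name, and all sizes and priorities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_millis: int
    priority: int


def schedule_with_preemption(node_cpu_millis: int, running: list[Pod], incoming: Pod):
    """Place `incoming` on the node, evicting lower-priority pods if needed.

    Returns (pods running afterwards, pods evicted).
    """
    free = node_cpu_millis - sum(p.cpu_millis for p in running)
    kept = list(running)
    evicted = []
    # Only pods with strictly lower priority may be preempted, lowest first.
    victims = sorted(
        (p for p in running if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    for victim in victims:
        if free >= incoming.cpu_millis:
            break
        kept.remove(victim)
        evicted.append(victim)
        free += victim.cpu_millis
    if free < incoming.cpu_millis:
        raise RuntimeError("no room even after preemption; the pod stays Pending")
    return kept + [incoming], evicted


# A 2000m node that is full: one real pod plus two placeholder pods.
running = [
    Pod("order-api-1", 1000, priority=0),
    Pod("placeholder-1", 500, priority=-1),
    Pod("placeholder-2", 500, priority=-1),
]
after, evicted = schedule_with_preemption(2000, running, Pod("order-api-2", 500, priority=0))
print([p.name for p in after])    # ['order-api-1', 'placeholder-2', 'order-api-2']
print([p.name for p in evicted])  # ['placeholder-1']
```

The new `order-api-2` pod starts immediately in the space the placeholder gave up, while the evicted placeholder is the pod left waiting for a new node.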
**Common Mistake:** Setting `scale-down-delay-after-add` (and `scale-down-unneeded-time`) too short. Both default to 10 minutes, so the risk comes from overriding them with values like `1m`. With very short delays, CA may provision a new node at 7:05pm, see traffic dip slightly at 7:08pm, and remove the node within a minute or two, only to provision it again at 7:10pm when the next traffic wave arrives. This node churn wastes money and causes unnecessary pod disruption. Keep `scale-down-delay-after-add` at 10 minutes or more.
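How the scale-down timers interact can be sketched as a single eligibility check. This is a simplified model of the three settings used in this guide (`scale-down-delay-after-add`, `scale-down-unneeded-time`, `scale-down-utilization-threshold`), not CA's actual implementation; the function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def may_scale_down(
    now: datetime,
    last_scale_up: datetime,
    unneeded_since: Optional[datetime],
    utilization: float,
    *,
    delay_after_add: timedelta = timedelta(minutes=10),
    unneeded_time: timedelta = timedelta(minutes=10),
    utilization_threshold: float = 0.5,
) -> bool:
    """True if a node is eligible for removal under this simplified model."""
    if utilization >= utilization_threshold:
        return False  # node is still busy enough to keep
    if unneeded_since is None:
        return False  # node has not been marked unneeded
    if now - last_scale_up < delay_after_add:
        return False  # too soon after the last scale-up
    return now - unneeded_since >= unneeded_time


scale_up_at = datetime(2024, 1, 1, 19, 5)   # node added at 7:05pm
idle_since = datetime(2024, 1, 1, 19, 7)    # traffic dips at 7:07pm
at_1908 = datetime(2024, 1, 1, 19, 8)
at_1917 = datetime(2024, 1, 1, 19, 17)

# 10-minute settings: the node survives the brief dip.
print(may_scale_down(at_1908, scale_up_at, idle_since, 0.2))  # False
# Aggressive 1-minute settings: the node is removed at 7:08pm (churn).
print(may_scale_down(at_1908, scale_up_at, idle_since, 0.2,
                     delay_after_add=timedelta(minutes=1),
                     unneeded_time=timedelta(minutes=1)))      # True
# 10-minute settings, still idle at 7:17pm: removal is now allowed.
print(may_scale_down(at_1917, scale_up_at, idle_since, 0.2))  # True
```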
**Tip:** Use multiple instance types in your node group rather than a single type, and spread node groups across availability zones with `--balance-similar-node-groups` enabled in CA. Pick types with the same vCPU and memory (for example `t3.large`, `t3a.large`, and `m5.large`), because CA assumes every node in a group has the same capacity when it plans a scale-up. If AWS runs out of `t3.large` in `ap-south-1a` during a traffic spike, the ASG can launch `m5.large` instead, and CA can also scale up a similar node group in `ap-south-1b`. Single instance type node groups get stuck when AWS has limited capacity for that specific type in your availability zone, which happens regularly with popular instance types during regional demand spikes.
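One caveat when mixing instance types: Cluster Autoscaler assumes every instance in a node group has the same CPU and memory, so the types in one group should share the same shape. A small pre-flight check along these lines can catch a mismatch before it reaches production. The function name is hypothetical and the spec table is hard-coded for illustration; in practice you would look the values up, for example with `aws ec2 describe-instance-types`.

```python
# (vCPU, memory in GiB) for a few instance types, hard-coded for this sketch.
INSTANCE_SPECS = {
    "t3.large": (2, 8),
    "t3a.large": (2, 8),
    "m5.large": (2, 8),
    "t3.xlarge": (4, 16),
}


def mismatched_types(instance_types: list[str]) -> list[str]:
    """Return the types whose vCPU/memory differ from the first type in the list."""
    reference = INSTANCE_SPECS[instance_types[0]]
    return [t for t in instance_types if INSTANCE_SPECS[t] != reference]


print(mismatched_types(["t3.large", "t3a.large", "m5.large"]))  # []
print(mismatched_types(["t3.large", "m5.large", "t3.xlarge"]))  # ['t3.xlarge']
```

An empty list means the group is safe to mix; any type that is returned belongs in a separate node group of its own size.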