What is Cluster Autoscaler? | DevOps Dictionary

Cluster Autoscaler — Automatic Node Scaling for Kubernetes

What is Cluster Autoscaler in Simple Terms?

The Horizontal Pod Autoscaler (HPA) scales your pods when traffic increases. But if your nodes do not have enough free capacity to schedule those new pods, they stay stuck in Pending. Cluster Autoscaler (CA) is what adds the actual nodes.

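CA watches for pods that cannot be scheduled because no node has enough free resources. It then increases the size of the matching node group (an AWS Auto Scaling group, a GCP managed instance group, or an Azure VM scale set), which launches new instances to host them. When traffic drops and nodes stay underutilised, CA evicts their pods so they reschedule onto the remaining nodes, then removes the empty nodes to reduce your cloud bill.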

◈ DIAGRAM

+------------------------------------------+
| Traffic spike at 7pm — Swiggy dinner rush|
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| HPA scales order-api from 5 to 15 pods   |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| 10 new pods stuck in Pending             |
| Nodes are full — no room to schedule     |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Cluster Autoscaler detects Pending pods  |
| Calculates: need 3 more nodes            |
| Calls AWS API: increase ASG desired=6    |
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| 3 new EC2 nodes join cluster in ~90s     |
| All 10 pending pods get scheduled        |
| Traffic served successfully              |
+------------------------------------------+
                    |
          Traffic drops at 11pm
                    v
+------------------------------------------+
| Nodes idle for 10+ minutes               |
| CA drains and terminates 3 extra nodes   |
| Cloud bill reduced                       |
+------------------------------------------+
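The "need 3 more nodes" step in the diagram boils down to asking how many new nodes the Pending pods would fill. The sketch below approximates that with ceiling division. It is deliberately simplified: it assumes every pending pod requests the same CPU, ignores memory, DaemonSets and scheduling constraints, and uses hypothetical figures (500m per pod, 2000m allocatable per node). The real CA runs a bin-packing simulation of the scheduler against a template node from the node group.

```shell
# nodes_needed PENDING_PODS POD_CPU_M NODE_ALLOCATABLE_CPU_M
# Simplified: CPU only, identical pods, empty new nodes (hypothetical inputs).
nodes_needed() {
  local pending_pods=$1 pod_cpu_m=$2 node_alloc_cpu_m=$3
  local pods_per_node=$(( node_alloc_cpu_m / pod_cpu_m ))
  if [ "$pods_per_node" -eq 0 ]; then
    echo "pod does not fit on this node type" >&2
    return 1
  fi
  # Ceiling division: a partly filled node still has to be added
  echo $(( (pending_pods + pods_per_node - 1) / pods_per_node ))
}

current_nodes=3
extra=$(nodes_needed 10 500 2000)   # 10 Pending pods, 500m CPU each, 2000m per node
echo "scale up: need $extra more nodes, ASG desired=$(( current_nodes + extra ))"
# prints: scale up: need 3 more nodes, ASG desired=6
```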

HPA vs Cluster Autoscaler — The Two-Layer Scaling System

◈ DIAGRAM

+------------------------------------------+
| LAYER 1: HPA (Horizontal Pod Autoscaler) |
|                                          |
| Scales PODS inside existing nodes        |
| Reacts to CPU, memory, custom metrics    |
| Response time: seconds                   |
| Works within current node capacity       |
+------------------------------------------+
                    |
                    | When nodes are full
                    v
+------------------------------------------+
| LAYER 2: Cluster Autoscaler              |
|                                          |
| Scales NODES (provisions new VMs)        |
| Reacts to Pending pods                   |
| Response time: 60-120 seconds            |
| Calls cloud provider API                 |
+------------------------------------------+
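Layer 1's decision is a simple ratio. The Kubernetes documentation gives the HPA formula as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies it with integer percentages. It leaves out the HPA's default 10% tolerance and its `minReplicas`/`maxReplicas` clamping, and the 90% and 30% figures are hypothetical, chosen to reproduce the 5-to-15 jump from the dinner-rush example.

```shell
# hpa_desired_replicas CURRENT_REPLICAS METRIC_PCT TARGET_PCT
# ceil(current * metric / target) in integer arithmetic; tolerance and clamping omitted.
hpa_desired_replicas() {
  local current=$1 metric_pct=$2 target_pct=$3
  echo $(( (current * metric_pct + target_pct - 1) / target_pct ))
}

# Hypothetical: 5 pods averaging 90% CPU against a 30% target
desired=$(hpa_desired_replicas 5 90 30)
echo "HPA wants $desired replicas; $(( desired - 5 )) new pods may go Pending if nodes are full"
# prints: HPA wants 15 replicas; 10 new pods may go Pending if nodes are full
```

Whatever Layer 1 cannot place becomes the Pending input that Layer 2 reacts to.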

Setting Up Cluster Autoscaler on EKS

Step 1: Create the IAM Policy for CA

JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}

Step 2: Create IAM Role for Service Account (IRSA)

Bash

eksctl create iamserviceaccount \
  --cluster=mumbai-prod-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=arn:aws:iam::905418385260:policy/ClusterAutoscalerPolicy \
  --override-existing-serviceaccounts \
  --approve \
  --region=ap-south-1

Step 3: Tag Your Node Group ASG

Bash

# CA uses these tags to discover which ASGs it can manage
aws autoscaling create-or-update-tags \
  --tags \
    ResourceId=eks-production-nodes-asg,\
ResourceType=auto-scaling-group,\
Key=k8s.io/cluster-autoscaler/enabled,\
Value=true,PropagateAtLaunch=true \
    ResourceId=eks-production-nodes-asg,\
ResourceType=auto-scaling-group,\
Key=k8s.io/cluster-autoscaler/mumbai-prod-cluster,\
Value=owned,PropagateAtLaunch=true
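A single missing or misspelled tag hides the ASG from CA's auto-discovery, so it is worth checking the tags before deploying CA. The hypothetical `has_ca_tags` helper below reads `Key<TAB>Value` lines (the format `aws autoscaling describe-tags` prints for the query shown in the comment) and succeeds only if both tag pairs from Step 3 are present.

```shell
# has_ca_tags CLUSTER_NAME
# Reads "Key<TAB>Value" lines on stdin; succeeds only if both Step 3 tags are present.
has_ca_tags() {
  awk -F '\t' -v cluster_key="k8s.io/cluster-autoscaler/$1" '
    $1 == "k8s.io/cluster-autoscaler/enabled" && $2 == "true" { enabled = 1 }
    $1 == cluster_key && $2 == "owned"                         { owned = 1 }
    END { exit !(enabled && owned) }
  '
}

# Example against the real ASG (needs AWS credentials):
#   aws autoscaling describe-tags \
#     --filters Name=auto-scaling-group,Values=eks-production-nodes-asg \
#     --query 'Tags[].[Key,Value]' --output text \
#     | has_ca_tags mumbai-prod-cluster && echo "ASG is discoverable by CA"
```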

Step 4: Deploy Cluster Autoscaler

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      # The ClusterRole/Role and bindings from the upstream example manifest
      # are also required; they are not shown here.
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0   # Match your cluster's Kubernetes minor version
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled=true,k8s.io/cluster-autoscaler/mumbai-prod-cluster=owned
            - --expander=least-waste                 # Pick the node group that leaves the least idle CPU/memory after scale-up
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m       # Wait 10 min after any scale-up before evaluating scale-down
            - --scale-down-unneeded-time=10m         # Node must be unneeded for 10 min before removal
            - --scale-down-utilization-threshold=0.5 # Candidate only if CPU and memory requests are both below 50% of allocatable
            - --max-node-provision-time=15m          # Treat a node that has not registered within 15 min as failed
            - --balance-similar-node-groups          # Keep similar node groups (e.g. one per AZ) at similar sizes
          resources:
            requests:
              cpu: 100m
              memory: 600Mi
            limits:
              cpu: 100m
              memory: 600Mi
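The `--scale-down-utilization-threshold=0.5` flag works on resource requests, not live CPU usage: a node is only considered for removal when the sum of its pods' CPU requests and the sum of their memory requests are each below 50% of the node's allocatable. A minimal sketch of that check follows, with hypothetical node sizes. The real CA additionally requires the node to stay below the threshold for `--scale-down-unneeded-time` and all of its pods to fit on other nodes.

```shell
# below_threshold REQUESTED ALLOCATABLE THRESHOLD_PCT
# Succeeds if REQUESTED / ALLOCATABLE < THRESHOLD_PCT / 100 (exact integer comparison).
below_threshold() {
  [ $(( $1 * 100 )) -lt $(( $3 * $2 )) ]
}

# scale_down_candidate CPU_REQ_M CPU_ALLOC_M MEM_REQ_MI MEM_ALLOC_MI [THRESHOLD_PCT]
# Both CPU and memory requests must be below the threshold (default 50, i.e. 0.5).
scale_down_candidate() {
  local threshold=${5:-50}
  below_threshold "$1" "$2" "$threshold" && below_threshold "$3" "$4" "$threshold"
}

# Hypothetical node: 4000m CPU and 16384Mi memory allocatable
scale_down_candidate 1000 4000 6144 16384 && echo "25% CPU, 37.5% memory: candidate"
scale_down_candidate 1000 4000 10240 16384 || echo "25% CPU, 62.5% memory: not a candidate"
```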

Testing That Cluster Autoscaler Works

Bash

# Step 1 — Create a deployment that needs more nodes than you have
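kubectl create deployment ca-test \
  --image=nginx \
  --replicas=50 \
  -n production
# nginx pods have no resource requests by default, so give each pod one;
# otherwise the scheduler may pack all 50 onto the existing nodes
kubectl set resources deployment ca-test \
  --requests=cpu=500m,memory=512Mi \
  -n production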
 
# Step 2 — Watch pods go Pending (nodes are full)
kubectl get pods -n production | grep Pending
 
# Step 3 — Watch CA logs to see it detecting Pending pods
kubectl logs -f deployment/cluster-autoscaler \
  -n kube-system | grep -iE "scale[- ]up"
# Scale-up: setting group eks-production-nodes-asg size to 6
 
# Step 4 — Watch new nodes joining the cluster
kubectl get nodes -w
# NAME                STATUS     AGE
# mumbai-worker-1     Ready      12d
# mumbai-worker-2     Ready      12d
# mumbai-worker-3     Ready      12d
# mumbai-worker-4     Ready      90s  <- CA provisioned this
# mumbai-worker-5     Ready      85s  <- CA provisioned this
# mumbai-worker-6     Ready      80s  <- CA provisioned this
 
# Step 5 — Clean up and watch CA scale down
kubectl delete deployment ca-test -n production
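# After about 10 minutes (scale-down-unneeded-time), CA removes the extra nodes
kubectl get nodes -w
# mumbai-worker-4   Ready      <- tainted ToBeDeletedByClusterAutoscaler, pods being evicted
# mumbai-worker-4   NotReady   <- EC2 instance terminating; the node then leaves the cluster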

Protecting Workloads From CA Scale-Down

YAML

# Method 1: Annotation that stops CA from evicting a specific pod.
# Put it on the pod itself, i.e. under spec.template.metadata in a Deployment or StatefulSet.
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
# Use for: Prometheus, stateful pods, anything slow to restart
---
# Method 2: PodDisruptionBudget to maintain minimum replicas
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: production
spec:
  minAvailable: 3   # CA will not drain a node if evicting its payment-api pods would leave fewer than 3 available
  selector:
    matchLabels:
      app: payment-api
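A PodDisruptionBudget permits `currentHealthy - minAvailable` voluntary disruptions at a time (never fewer than zero), and CA will not remove a node whose matching pods cannot all be evicted within that budget. A minimal sketch of the arithmetic for the `payment-api` budget, with hypothetical pod counts; in a real cluster the Eviction API enforces the budget pod by pod.

```shell
# pdb_allows_drain HEALTHY MIN_AVAILABLE PODS_ON_NODE
# Succeeds if evicting every matching pod on one node stays within the PDB budget.
pdb_allows_drain() {
  local healthy=$1 min_available=$2 pods_on_node=$3
  local allowed=$(( healthy - min_available ))
  [ "$allowed" -lt 0 ] && allowed=0   # the budget is never negative
  [ "$pods_on_node" -le "$allowed" ]
}

# 5 healthy payment-api pods, minAvailable 3, 2 of them on the candidate node
if pdb_allows_drain 5 3 2; then echo "node can be drained"; else echo "node is blocked by the PDB"; fi
```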

Monitoring Cluster Autoscaler

Bash

# View all CA scaling decisions in real time
kubectl logs -f deployment/cluster-autoscaler \
  -n kube-system | grep -iE "scale[- ]up|scale[- ]down|unneeded"

# Check which nodes CA has removed
kubectl logs deployment/cluster-autoscaler \
  -n kube-system | grep -E "removing (empty )?node"

# See CA events (scale-up events are recorded on the pods that triggered them)
kubectl get events -A | grep -E "TriggeredScaleUp|NotTriggerScaleUp|ScaleDown"

# Check current ASG desired vs actual instance count
aws autoscaling describe-auto-scaling-groups \
  --region ap-south-1 \
  --query 'AutoScalingGroups[*].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity,Instances:length(Instances)}' \
  --output table

COMMON MISTAKE / WARNING
**Cost control:** Always set a deliberate maximum size (`MaxSize`) on your node groups; CA never scales a group beyond it. With an overly generous ceiling, a traffic surge, a misconfigured HPA with `maxReplicas` set too high, or a runaway batch job can make CA provision hundreds of nodes and generate an enormous, unexpected cloud bill. At Zerodha, CA max node counts are reviewed and approved as part of every infrastructure change review.
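CA never grows a node group past its maximum size; pods that would need more nodes simply stay Pending. A minimal sketch of that clamping, with hypothetical numbers (the real CA can also enforce a cluster-wide cap through `--max-nodes-total`):

```shell
# asg_new_desired CURRENT NEEDED MAX_SIZE
# Prints the desired capacity CA can request, capped at the ASG's MaxSize.
asg_new_desired() {
  local current=$1 needed=$2 max_size=$3
  local want=$(( current + needed ))
  if [ "$want" -gt "$max_size" ]; then
    echo "capped at MaxSize=$max_size: $(( want - max_size )) node(s) short, pods stay Pending" >&2
    want=$max_size
  fi
  echo "$want"
}

asg_new_desired 3 3 10    # normal spike: prints 6
asg_new_desired 3 40 10   # runaway HPA: prints 10, plus a warning on stderr
```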

REMEMBER THIS
**Remember:** CA scale-up typically takes 60-120 seconds: the cloud provider must boot a real EC2 instance, the node must join the cluster and become Ready, and only then can the pending pods be scheduled and pull their container images. If your traffic can spike faster than this (Hotstar when an IPL match starts), use the overprovisioning pattern: run low-priority placeholder pods that reserve spare capacity on your nodes. When real pods need room, the scheduler preempts the placeholders immediately, so the real pods start without waiting for a new node. The evicted placeholders then go Pending and trigger CA to add a node in the background.
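A minimal sketch of the overprovisioning pattern described in the Cluster Autoscaler FAQ: a negative-priority PriorityClass plus a Deployment of `pause` containers whose requests reserve headroom. The names, replica count and requests are hypothetical; size them to the burst you need to absorb. Keep the priority above CA's `--expendable-pods-priority-cutoff` (default -10), because pods below the cutoff are treated as expendable and never trigger a scale-up, so evicted placeholders would not bring the headroom back. This needs a live cluster, so treat it as a configuration example.

```shell
# Hypothetical names: overprovisioning, overprovisioning-placeholder
kubectl apply -f - <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                 # lower than normal workloads (default priority 0), above CA's -10 cutoff
globalDefault: false
description: "Placeholder pods that real workloads may preempt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-placeholder
spec:
  replicas: 2             # how many "slots" of headroom to keep warm
  selector:
    matchLabels:
      app: overprovisioning-placeholder
  template:
    metadata:
      labels:
        app: overprovisioning-placeholder
    spec:
      priorityClassName: overprovisioning
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:     # each replica reserves this much schedulable capacity
              cpu: "1"
              memory: 1Gi
EOF
```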

COMMON MISTAKE / WARNING
**Common Mistake:** Setting `--scale-down-delay-after-add` too short, for example to `0s`. Omitting the flag is safe, because it defaults to 10 minutes. With a near-zero delay and a short `--scale-down-unneeded-time`, CA may provision a new node at 7:05pm, see traffic dip slightly at 7:08pm, and remove the node at 7:08pm, only to provision it again at 7:10pm when the next traffic wave arrives. This node churn wastes money and causes unnecessary pod disruption. Keep `--scale-down-delay-after-add` at 10 minutes or more.
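The two timers act together: after any scale-up, CA waits `--scale-down-delay-after-add` before evaluating scale-down at all, and a node must then have been unneeded for `--scale-down-unneeded-time` before it is removed. The sketch below models just those two checks for the 7:05pm example, with times in minutes since midnight. Real CA also re-checks utilisation and pod placement on every loop, and the unneeded timer resets as soon as the node is needed again.

```shell
DELAY_AFTER_ADD=10   # --scale-down-delay-after-add=10m
UNNEEDED_TIME=10     # --scale-down-unneeded-time=10m

# hm HOUR MINUTE -> minutes since midnight
hm() { echo $(( $1 * 60 + $2 )); }

# can_remove NOW LAST_SCALE_UP UNNEEDED_SINCE (all in minutes since midnight)
# Succeeds only if both timers have expired.
can_remove() {
  local now=$1 last_scale_up=$2 unneeded_since=$3
  [ $(( now - last_scale_up )) -ge "$DELAY_AFTER_ADD" ] &&
    [ $(( now - unneeded_since )) -ge "$UNNEEDED_TIME" ]
}

# Node added at 19:05, unneeded from 19:08
for now in "$(hm 19 8)" "$(hm 19 18)"; do
  if can_remove "$now" "$(hm 19 5)" "$(hm 19 8)"; then
    echo "minute $now: node may be removed"
  else
    echo "minute $now: node kept"
  fi
done
```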

PLACEMENT PRO TIP
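**Tip:** Avoid single-instance-type node groups. Use a mixed instances policy with several instance types of the same vCPU and memory (for example `t3.large`, `m5.large` and `m6i.large`, all 2 vCPU / 8 GiB), and spread capacity across one node group per availability zone with `--balance-similar-node-groups` enabled in CA so those groups stay at similar sizes. If AWS runs out of `t3.large` in `ap-south-1a` during a traffic spike, the ASG can launch `m5.large` instead, or CA can back off from the failing group and scale up the group in `ap-south-1b`. Do not mix sizes such as `t3.large` and `t3.xlarge` in one ASG, because CA's scaling calculations assume every node in a group has the same CPU and memory. Single-instance-type node groups get stuck when AWS has limited capacity for that specific type in your availability zone, which happens with popular instance types during regional demand spikes.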
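CA's scale-up calculations assume every node in a group has the same CPU and memory, so instance types mixed inside one ASG should match in size. The hypothetical `same_shape` helper below checks a list of types against a small hardcoded table (2 vCPU / 8 GiB for the `.large` types shown, 4 vCPU / 16 GiB for the `.xlarge` types shown). For types not in the table, look the values up with `aws ec2 describe-instance-types`.

```shell
# shape_of TYPE -> "vCPU/GiB" from a small hardcoded table; fails for unknown types
shape_of() {
  case $1 in
    t3.large|m5.large|m5a.large|m6i.large) echo "2/8" ;;
    t3.xlarge|m5.xlarge|m6i.xlarge)        echo "4/16" ;;
    *) return 1 ;;
  esac
}

# same_shape TYPE... -> succeeds if every listed type has the same vCPU and memory
same_shape() {
  local first
  first=$(shape_of "$1") || return 1
  shift
  for t in "$@"; do
    [ "$(shape_of "$t")" = "$first" ] || return 1
  done
}

same_shape t3.large m5.large m6i.large && echo "OK: safe to mix in one node group"
same_shape t3.large t3.xlarge || echo "Mismatch: put t3.xlarge in its own node group"
```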