What Is Infrastructure Drift?
Drift is the gap between what Terraform thinks exists and what actually exists in the cloud.
Terraform's state file is a snapshot taken at the last apply. The moment someone logs into the AWS console and changes a security group rule, adjusts an instance type, or deletes a bucket — without going through Terraform — the real world and the state file diverge. That gap is drift.
Drift happens at the best companies, on the best teams. An incident fires at 2am. The on-call engineer adds a temporary security group rule from the console to restore service. The incident is resolved and the rule is forgotten. Six months later, neither the configuration nor the state file mentions that rule. The next apply sees it as an unmanaged change and removes it, and the same incident fires again.
```text
+--------------------------------------------+
| Terraform state (what Terraform thinks)    |
|   security_group.web:                      |
|     ingress: 443 from 0.0.0.0/0            |
|     ingress: 80 from 0.0.0.0/0             |
+--------------------------------------------+

      <- someone adds port 22 in console ->

+--------------------------------------------+
| Real AWS resource (what actually exists)   |
|   security_group.web:                      |
|     ingress: 443 from 0.0.0.0/0            |
|     ingress: 80 from 0.0.0.0/0             |
|     ingress: 22 from 0.0.0.0/0  <- DRIFT   |
+--------------------------------------------+
```

Detecting Drift with terraform plan
Every terraform plan performs a refresh by default — it reads real infrastructure and compares to the state file and your configuration. Drift shows up as unexpected changes:
```shell
terraform plan

# ~ aws_security_group.web will be updated in-place
#   ~ ingress = [
#       - {
#           from_port   = 22            <- Terraform will REMOVE this
#           to_port     = 22            <- it is not in your .tf files
#           protocol    = "tcp"
#           cidr_blocks = ["0.0.0.0/0"]
#         },
#     ]
#
# Plan: 0 to add, 1 to change, 0 to destroy.
# This "change" is Terraform reverting the manual port 22 rule
```

Three Ways to Respond to Drift
Option 1: Revert the drift (apply normally). The manual change was wrong or temporary. Run terraform apply and let Terraform restore what the configuration declares.
```shell
terraform apply
# Reverts the manual change: removes port 22 from the security group
```

Option 2: Accept the drift (update your code). The manual change was intentional and correct. Update your .tf files to include the change, then apply so the config, state, and reality all match.
```hcl
# Add the port 22 rule to your security group in main.tf
ingress {
  # Plain hyphen only: AWS rejects rule descriptions containing characters such as a long dash
  description = "SSH from VPN - added after incident-247"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["10.0.0.0/8"] # lock it down to VPN CIDR, not 0.0.0.0/0
}
```

Option 3: Accept the drift without changing infrastructure (refresh-only). You want the state file to record the manual change, and you do not want Terraform to modify any real resources.
```shell
terraform apply -refresh-only
# Updates state to record the port 22 rule without changing real infrastructure.
# This does not touch your configuration: unless the rule is also added to the
# .tf files, the next normal plan will still propose removing it.
```

Automated Drift Detection in CI/CD
Schedule a terraform plan -refresh-only job to run on a regular cadence, such as every weekday morning. If the plan is non-empty, drift exists, so alert the team:
```yaml
# GitHub Actions: scheduled drift detection
name: Drift Detection

on:
  schedule:
    - cron: "0 9 * * 1-5" # 09:00 UTC on weekdays

# Required when role-to-assume is used with GitHub OIDC (no static access keys)
permissions:
  id-token: write
  contents: read

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # The default wrapper installed by this action exposes outputs.exitcode
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-drift-detector
          aws-region: ap-south-1

      - name: Terraform Init
        run: terraform init

      - name: Detect Drift
        id: plan
        run: |
          terraform plan -refresh-only -detailed-exitcode
          # Exit code 0 = no drift
          # Exit code 1 = error
          # Exit code 2 = drift detected (changes exist)
        continue-on-error: true

      # Without this step a failed plan would be swallowed by continue-on-error
      - name: Fail on Plan Error
        if: steps.plan.outputs.exitcode == '1'
        run: |
          echo "terraform plan failed; drift status is unknown"
          exit 1

      - name: Alert on Drift
        if: steps.plan.outputs.exitcode == '2'
        run: |
          echo "DRIFT DETECTED in production infrastructure"
          echo "Review the plan output above and reconcile the changes"
          exit 1 # fail the job to trigger a Slack/email alert
```

Preventing Drift with IAM Policies
The best drift prevention is preventing manual changes from happening in the first place. Use IAM policies to restrict who can modify Terraform-managed resources directly:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": [
            "arn:aws:iam::123456789012:role/terraform-apply-role",
            "arn:aws:iam::123456789012:role/break-glass-emergency-role"
          ]
        }
      }
    }
  ]
}
```

This denies the listed EC2 actions to every principal except the Terraform apply role and a break-glass emergency role, which makes this kind of drift unlikely outside of exceptional circumstances. It covers only the three actions shown, so other resource types need their own entries.
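One way to confirm that the deny statement behaves as intended is an EC2 dry run from a principal that is not on the exception list. The sketch below is illustrative rather than part of the original setup: the group ID sg-0123456789abcdef0 and both function names are placeholders, and it assumes the AWS CLI is installed and authenticated as the principal being tested. With --dry-run, EC2 reports a DryRunOperation error when the call would have been allowed and UnauthorizedOperation when it is denied, so the script classifies the error text instead of the exit code.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current principal may add a security group rule.
# Assumption: sg-0123456789abcdef0 is a placeholder; substitute a real group ID.

# Map the AWS CLI error text from a --dry-run call to a short verdict.
classify_dry_run() {
  case "$1" in
    *UnauthorizedOperation*) echo "denied" ;;
    *DryRunOperation*)       echo "allowed" ;;
    *)                       echo "unknown" ;;
  esac
}

# Attempt a dry-run rule change and print the verdict. Nothing is modified.
check_sg_write_access() {
  local group_id="$1" output
  # A dry run exits non-zero either way, so capture the message instead.
  output="$(aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp --port 22 --cidr 10.0.0.0/8 \
    --dry-run 2>&1 || true)"
  classify_dry_run "$output"
}

# Example call (needs AWS credentials, so it is left commented out):
# check_sg_write_access sg-0123456789abcdef0
```

If the policy is attached as intended, an ordinary engineer's credentials should yield denied, while the Terraform apply role and the break-glass role should yield allowed.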
Troubleshooting Drift
| Situation | Root Cause | Fix |
|---|---|---|
| Unexpected changes in every plan | Ongoing manual changes by the team | Find who is making console changes, enforce IaC policy |
| Drift reappears after applying | Automated process keeps changing the resource | Identify the automation and either stop it or model it in Terraform |
| Cannot determine what drifted | Too many changes accumulated | Run terraform state show <resource> and compare to console |
| Drift causes apply failures | Manual change put resource in unrecoverable state | Use terraform apply -replace=<resource> to force recreation (terraform taint is deprecated), or manually fix then import |
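For the "cannot determine what drifted" row, a saved refresh-only plan can be rendered as JSON and reduced to a short list of addresses. This is a minimal sketch under stated assumptions: Terraform 1.x and jq are installed, drift.tfplan is a hypothetical file name, and the function names are illustrative. It relies on the resource_drift array in Terraform's JSON plan representation.

```shell
#!/usr/bin/env bash
# Sketch: list the resources that changed outside Terraform.
# Assumptions: terraform and jq are on PATH; drift.tfplan is a hypothetical file name.

# Read plan JSON on stdin and print "<actions><TAB><address>" per drifted resource.
list_drifted() {
  jq -r '.resource_drift[]? | [(.change.actions | join("+")), .address] | @tsv'
}

# Save a refresh-only plan, then summarise what it found.
report_drift() {
  local plan_file="${1:-drift.tfplan}"
  terraform plan -refresh-only -input=false -out="$plan_file" >/dev/null
  terraform show -json "$plan_file" | list_drifted
}

# Example call (reads real infrastructure, so it is left commented out):
# report_drift
```

Each output line pairs the detected action with a resource address, which gives a focused starting list for terraform state show instead of a full console comparison.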
**Common Mistake:** Treating drift as a one-time fix rather than a process failure. If drift keeps appearing, the root cause is a team culture or access control issue: engineers are making changes outside Terraform. Fix the process, not just the state.