What is Drift? | DevOps Dictionary

What Is Infrastructure Drift?

Drift is the gap between what Terraform thinks exists and what actually exists in the cloud.

Terraform's state file records what Terraform saw at the last apply (or refresh). The moment someone logs into the AWS console and changes a security group rule, adjusts an instance type, or deletes a bucket without going through Terraform, the real world and the state file diverge. That gap is drift.

Drift happens at the best companies, on the best teams. An incident fires at 2am. The on-call engineer adds a temporary security group rule from the console to restore service. The incident is resolved, and the rule is forgotten. Six months later, the rule still exists in AWS but not in the Terraform code, so the next apply removes it, and the same incident fires again.

◈ DIAGRAM

+------------------------------------------+
| Terraform state (what Terraform thinks)  |
| security_group.web:                      |
|   ingress: 443 from 0.0.0.0/0            |
|   ingress: 80  from 0.0.0.0/0            |
+------------------------------------------+
 
     <- someone adds port 22 in console ->
 
+------------------------------------------+
| Real AWS resource (what actually exists) |
| security_group.web:                      |
|   ingress: 443 from 0.0.0.0/0            |
|   ingress: 80  from 0.0.0.0/0            |
|   ingress: 22  from 0.0.0.0/0  <- DRIFT  |
+------------------------------------------+
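To make the comparison in the diagram concrete, here is a minimal Python sketch that models each ingress rule as a (port, cidr) tuple and computes drift as two set differences. It is a toy model for illustration only, not how Terraform compares resources internally; the function name find_drift is hypothetical.

```python
# Toy model: each ingress rule is a (port, cidr) tuple.
# Illustrates the idea of drift only; this is NOT Terraform's real diff logic.

def find_drift(recorded: set, actual: set) -> dict:
    """Compare the rules recorded in state with the rules that really exist."""
    return {
        "added_outside_terraform": actual - recorded,    # e.g. a console change
        "removed_outside_terraform": recorded - actual,  # e.g. a manual delete
    }

state_rules = {(443, "0.0.0.0/0"), (80, "0.0.0.0/0")}
real_rules = {(443, "0.0.0.0/0"), (80, "0.0.0.0/0"), (22, "0.0.0.0/0")}

drift = find_drift(state_rules, real_rules)
print(sorted(drift["added_outside_terraform"]))    # [(22, '0.0.0.0/0')]
print(sorted(drift["removed_outside_terraform"]))  # []
```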

Detecting Drift with terraform plan

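By default, every terraform plan starts with a refresh: it reads the real infrastructure, updates an in-memory copy of the state (without saving it), and then compares the result with your configuration. Drift shows up as changes you never made in code (recent Terraform versions also list them under "Objects have changed outside of Terraform"):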

Bash

terraform plan
 
# ~ aws_security_group.web will be updated in-place
#   ~ ingress = [
#       - {
#           from_port   = 22             <- Terraform will REMOVE this
#           to_port     = 22             <- it is not in your .tf files
#           protocol    = "tcp"
#           cidr_blocks = ["0.0.0.0/0"]
#         },
#     ]
#
# Plan: 0 to add, 1 to change, 0 to destroy.
# This "change" is Terraform reverting the manual port 22 rule
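For scripting, the human-readable plan is awkward to parse. One option is to save a refresh-only plan with `terraform plan -refresh-only -out=drift.tfplan` and convert it with `terraform show -json drift.tfplan`; in recent Terraform versions the JSON plan format includes a `resource_drift` list describing changes detected outside Terraform. The Python sketch below assumes that structure and uses a trimmed, hand-written sample; the helper name drifted_resources is hypothetical.

```python
import json

def drifted_resources(plan_json: str) -> list:
    """Return (address, actions) for each entry in the plan's resource_drift list."""
    plan = json.loads(plan_json)
    # Assumption: resource_drift is absent or empty when nothing changed outside Terraform
    return [
        (entry["address"], entry["change"]["actions"])
        for entry in plan.get("resource_drift", [])
    ]

# Hand-written sample shaped like `terraform show -json` output;
# real output contains many more fields per resource.
sample = json.dumps({
    "resource_drift": [
        {"address": "aws_security_group.web",
         "change": {"actions": ["update"]}}
    ],
})

for address, actions in drifted_resources(sample):
    print(f"{address}: {actions}")  # aws_security_group.web: ['update']
```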

Three Ways to Respond to Drift

Option 1: Revert the drift (apply normally). The manual change was wrong or temporary. Run terraform apply, review the plan, and let Terraform bring the resource back in line with your configuration.

Bash

terraform apply
# Reverts the manual change — removes port 22 from the security group

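Option 2: Accept the drift (update your code). The manual change was intentional and correct. Update your .tf files to include the change, then apply so that configuration, state, and reality all match. The example below also tightens the rule to the VPN range, so the apply will modify the real rule as well.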

HCL

# Add the port 22 rule to your security group in main.tf
ingress {
  description = "SSH from VPN — added after incident-247"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["10.0.0.0/8"]   # lock it down to VPN CIDR, not 0.0.0.0/0
}

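Option 3: Record the drift in state only (refresh-only). A refresh-only apply updates the state file to match reality without making any infrastructure changes. It does not make the change permanent: your .tf files are still the source of truth, so the next normal plan will propose reverting the change unless you also add it to your code (Option 2) or exclude that attribute with a lifecycle ignore_changes block.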

Bash

terraform apply -refresh-only
# Updates the state file to record the port 22 rule; real infrastructure is untouched
# The config still lacks the rule, so a normal plan/apply afterwards will still remove it
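The difference between Options 2 and 3 is easiest to see in a toy model. The Python sketch below is a hypothetical simplification that tracks only the ingress ports of one security group: a refresh-only apply copies reality into state, while a normal plan compares the configuration against that refreshed state. Only changing the configuration makes the pending removal of port 22 disappear.

```python
# Hypothetical toy model: Terraform's three views of one security group,
# reduced to sets of ingress ports. Real Terraform diffs full resource attributes.

def refresh_only(real: set) -> set:
    """terraform apply -refresh-only: state becomes a copy of reality."""
    return set(real)

def plan(config: set, state: set) -> dict:
    """Normal plan: what apply would add or remove to make reality match config."""
    return {"add": config - state, "remove": state - config}

config = {443, 80}       # what the .tf files declare
real = {443, 80, 22}     # port 22 was added in the console
state = {443, 80}        # recorded at the last apply

# Option 3 alone: state now records port 22...
state = refresh_only(real)
print(sorted(state))          # [22, 80, 443]
# ...but the config does not, so a normal plan still removes it
print(plan(config, state))    # {'add': set(), 'remove': {22}}

# Option 2: add the rule to the config, and the plan becomes empty
config = config | {22}
print(plan(config, state))    # {'add': set(), 'remove': set()}
```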

Automated Drift Detection in CI/CD

Schedule a terraform plan -refresh-only job to run every weekday morning. If the plan reports changes, something was modified outside Terraform, so alert the team:

YAML

# GitHub Actions — daily drift detection
name: Drift Detection
on:
  schedule:
    - cron: "0 9 * * 1-5"   # 9am weekdays
 
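permissions:
  id-token: write   # required to assume the AWS role via OIDC
  contents: read    # required by actions/checkout

jobs: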
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.3
 
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-drift-detector
          aws-region: ap-south-1
 
      - name: Terraform Init
        run: terraform init
 
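      - name: Detect Drift
        id: plan
        # -detailed-exitcode: 0 = no drift, 1 = error, 2 = drift detected
        # steps.plan.outputs.exitcode comes from setup-terraform's wrapper (on by default)
        # continue-on-error lets the following steps inspect the exit code
        run: terraform plan -refresh-only -detailed-exitcode -input=false
        continue-on-error: true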
 
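      - name: Alert on Drift
        if: steps.plan.outputs.exitcode == '2'
        run: |
          echo "DRIFT DETECTED in production infrastructure"
          echo "Review the plan output above and reconcile the changes"
          exit 1   # fail the job so GitHub's failure notification (or a Slack step) fires

      - name: Fail on Plan Error
        # Without this, a broken plan (exit code 1) would pass silently
        if: steps.plan.outcome == 'failure' && steps.plan.outputs.exitcode != '2'
        run: |
          echo "terraform plan failed; drift status is unknown"
          exit 1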
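If you run drift checks outside GitHub Actions, the same exit-code logic can live in a small script. The Python sketch below assumes terraform is on PATH and that `terraform init` has already run in the working directory; the function names classify_exit_code and check_drift are hypothetical.

```python
import subprocess

def classify_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` result to a drift status."""
    if code == 0:
        return "no-drift"
    if code == 2:
        return "drift"
    return "error"   # 1, or anything unexpected, means the plan itself failed

def check_drift(workdir: str = ".") -> str:
    """Run a refresh-only plan in workdir and classify the result.
    Assumes terraform is installed and `terraform init` has already run."""
    result = subprocess.run(
        ["terraform", "plan", "-refresh-only", "-detailed-exitcode",
         "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_exit_code(result.returncode)
```

Treating every code other than 0 and 2 as an error mirrors the workflow above: a failed plan should never be mistaken for a clean one. A caller would typically exit non-zero unless check_drift() returns "no-drift".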

Preventing Drift with IAM Policies

The most effective drift prevention is stopping manual changes before they happen. Use IAM policies to restrict who can modify Terraform-managed resources directly:

JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": [
            "arn:aws:iam::123456789012:role/terraform-apply-role",
            "arn:aws:iam::123456789012:role/break-glass-emergency-role"
          ]
        }
      }
    }
  ]
}

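This denies the three listed EC2 actions (instance attribute changes and security group ingress changes) to every principal except the Terraform apply role and a break-glass emergency role. Because Resource is "*", it covers every resource in the account, not only Terraform-managed ones, and it only takes effect where it is attached, for example as a Service Control Policy in AWS Organizations. With it in place, console changes to these resources become rare, deliberate exceptions instead of routine.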
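To sanity-check who the Deny statement catches, here is a deliberately simplified Python model of it. It is not the AWS policy evaluation engine: it ignores Allow statements, wildcards, policy layering, and every other condition key. The role name oncall-engineer is hypothetical.

```python
# Simplified model of the Deny statement above; NOT the AWS policy engine.

DENIED_ACTIONS = {
    "ec2:ModifyInstanceAttribute",
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:RevokeSecurityGroupIngress",
}
EXEMPT_PRINCIPALS = {
    "arn:aws:iam::123456789012:role/terraform-apply-role",
    "arn:aws:iam::123456789012:role/break-glass-emergency-role",
}

def is_explicitly_denied(action: str, principal_arn: str) -> bool:
    """True when the statement matches: a listed action AND a non-exempt principal.
    StringNotEquals with a list of values matches only if the ARN equals none of them."""
    return action in DENIED_ACTIONS and principal_arn not in EXEMPT_PRINCIPALS

oncall = "arn:aws:iam::123456789012:role/oncall-engineer"  # hypothetical role
print(is_explicitly_denied("ec2:AuthorizeSecurityGroupIngress", oncall))  # True
print(is_explicitly_denied("ec2:AuthorizeSecurityGroupIngress",
      "arn:aws:iam::123456789012:role/terraform-apply-role"))            # False
```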

Troubleshooting Drift

Situation	Root Cause	Fix
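Unexpected changes in every plan	Ongoing manual changes by the team	Use AWS CloudTrail to find who is making console changes, then enforce an IaC-only policy
Drift reappears after applying	An automated process keeps changing the resource	Identify the automation and either stop it, model it in Terraform, or exclude the attribute with lifecycle ignore_changes
Cannot determine what drifted	Too many changes have accumulated	Run `terraform plan -refresh-only` to list what changed outside Terraform, and `terraform state show <resource>` to compare the recorded attributes with the console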
Drift causes apply failures	A manual change left the resource in an unrecoverable state	Use `terraform apply -replace=<resource>` to force recreation (`terraform taint` is deprecated), or fix it manually and then import it
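For the "Cannot determine what drifted" case, it helps to diff attributes mechanically rather than by eye. The Python sketch below assumes you already have both sides as flat key/value dictionaries (for example, the recorded side from `terraform show -json` and the real side from the cloud API); the attribute values shown are hypothetical sample data.

```python
def attribute_diff(recorded: dict, actual: dict) -> dict:
    """Return {attribute: (recorded_value, actual_value)} for every mismatch.
    A value of None means the attribute is missing on that side."""
    keys = recorded.keys() | actual.keys()
    return {
        k: (recorded.get(k), actual.get(k))
        for k in sorted(keys)
        if recorded.get(k) != actual.get(k)
    }

# Hypothetical sample data for one EC2 instance
recorded = {"instance_type": "t3.medium", "monitoring": False, "ami": "ami-0abc"}
actual = {"instance_type": "t3.large", "monitoring": False, "ami": "ami-0abc"}
print(attribute_diff(recorded, actual))  # {'instance_type': ('t3.medium', 't3.large')}
```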

**Common Mistake:** Treating drift as a one-time fix rather than a process failure. If drift keeps appearing, the root cause is a team culture or access control issue — engineers are making changes outside Terraform. Fix the process, not just the state.