Overview and What You Will Learn
Local state, a terraform.tfstate file sitting on your laptop, works fine when you are the only engineer. The moment a second engineer joins, local state becomes dangerous: they have a different copy, or no copy at all. Two different state files mean two different views of reality, and concurrent applies against them can leave state and real infrastructure out of sync.
Remote state on S3 with DynamoDB locking addresses both problems. The state file lives in S3, so every engineer and every CI/CD pipeline reads the same file. DynamoDB locking ensures only one state-changing operation runs against that file at a time. S3 versioning keeps earlier copies, so an accidentally corrupted state can be rolled back.
This is the production standard for Terraform on AWS. By the end of this lab you will have:
- Created the S3 bucket and DynamoDB table for state storage and locking
- Configured the S3 backend in your Terraform configuration
- Migrated existing local state to the remote backend
- Understood partial backend configuration for multi-environment setups
- Set up correct IAM permissions for the Terraform apply role
Why This Matters in Production
At Swiggy, before adopting remote state, two engineers applying to the same environment caused a state file corruption that took half a day to recover from. One engineer had the "correct" state. The other had an older copy. When both applied, the second write overwrote the first — resources were orphaned and the state no longer matched reality.
After moving to S3 remote state with DynamoDB locking, that failure mode was closed off. The first apply acquires the DynamoDB lock. The second engineer's apply sees the lock and stops with a lock error; with -lock-timeout set, it keeps retrying until the lock is released instead. Once the first apply completes and releases the lock, the second can proceed safely, reading the updated state file.
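The locking behaviour described above can be sketched locally. The S3 backend takes its lock with a conditional write to DynamoDB: the lock item is created only if no item with that LockID already exists. The sketch below is an analogy, not Terraform's own code. It uses mkdir, which is atomic on a local filesystem, to stand in for the conditional write; the function names acquire_lock and release_lock and the LOCK_DIR path are hypothetical.

```shell
#!/usr/bin/env bash
# Local analogy for a DynamoDB-style lock: "create only if absent".
# LOCK_DIR is a hypothetical path standing in for the LockID item.
LOCK_DIR="${TMPDIR:-/tmp}/tf-lock-demo.$$"

acquire_lock() {
  # mkdir fails if the directory already exists, so only one caller wins.
  local who="$1"
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$who" > "$LOCK_DIR/info"   # record the holder, like the Info attribute
    echo "$who: lock acquired"
    return 0
  fi
  echo "$who: lock held by $(cat "$LOCK_DIR/info")"
  return 1
}

release_lock() {
  rm -rf "$LOCK_DIR"
}

acquire_lock "engineer-a"          # succeeds
acquire_lock "engineer-b" || true  # refused while engineer-a holds the lock
release_lock
acquire_lock "engineer-b"          # succeeds once the lock is released
release_lock
```

In real use the refusal surfaces as a lock error from terraform apply. Running terraform apply -lock-timeout=5m asks Terraform to keep retrying for up to five minutes before giving up.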
Core Principles
The Remote State Architecture

    +------------------------------------------+
    | Engineer A laptop                        |
    | terraform apply                          |
    | 1. Acquire DynamoDB lock                 |
    | 2. Read state from S3                    |
    | 3. Make infrastructure changes           |
    | 4. Write new state to S3                 |
    | 5. Release DynamoDB lock                 |
    +------------------------------------------+
            |                     |
            v                     v
    +---------------+  +----------------------------+
    | S3 Bucket     |  | DynamoDB Table             |
    | terraform.    |  | LockID: <bucket>/<key>     |
    | tfstate       |  | Info: who, when, operation |
    | (versioned,   |  |                            |
    | encrypted)    |  |                            |
    +---------------+  +----------------------------+
                                  ^
                                  |
    +------------------------------------------+
    | Engineer B laptop                        |
    | terraform apply                          |
    | 1. Acquire DynamoDB lock -- BLOCKED      |
    |    (Engineer A holds it)                 |
    | 2. Lock error, or retry (-lock-timeout)  |
    +------------------------------------------+

Detailed Step-by-Step Practical Lab
Step 1 — Create the Bootstrap Resources
You need an S3 bucket and a DynamoDB table before you can configure the remote backend. Create them once, either manually or with a separate bootstrap configuration whose own state is not stored in the bucket it creates.

    # Create the S3 bucket for state storage
    aws s3api create-bucket \
      --bucket razorpay-terraform-state-ap-south-1 \
      --region ap-south-1 \
      --create-bucket-configuration LocationConstraint=ap-south-1

    # Enable versioning: recover any previous state version if corruption occurs
    aws s3api put-bucket-versioning \
      --bucket razorpay-terraform-state-ap-south-1 \
      --versioning-configuration Status=Enabled

    # Enable server-side encryption: state contains secrets in plaintext
    aws s3api put-bucket-encryption \
      --bucket razorpay-terraform-state-ap-south-1 \
      --server-side-encryption-configuration '{
        "Rules": [{
          "ApplyServerSideEncryptionByDefault": {
            "SSEAlgorithm": "aws:kms"
          }
        }]
      }'

    # Block all public access: state must never be publicly readable
    aws s3api put-public-access-block \
      --bucket razorpay-terraform-state-ap-south-1 \
      --public-access-block-configuration \
      "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

    # Create the DynamoDB table for locking
    aws dynamodb create-table \
      --table-name terraform-state-lock \
      --attribute-definitions AttributeName=LockID,AttributeType=S \
      --key-schema AttributeName=LockID,KeyType=HASH \
      --billing-mode PAY_PER_REQUEST \
      --region ap-south-1

Alternatively, manage the bootstrap resources with Terraform, in a separate configuration that stores its own state locally (bootstrapping the bootstrapper):

    # bootstrap/main.tf: run this ONCE with local state
    # Do NOT configure an S3 backend for this configuration

    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }
      # No backend block: local state for the bootstrap config
    }

    provider "aws" {
      region = "ap-south-1"
    }

    resource "aws_s3_bucket" "terraform_state" {
      bucket = "razorpay-terraform-state-ap-south-1"

      lifecycle {
        prevent_destroy = true # never accidentally delete this bucket
      }

      tags = {
        Purpose   = "terraform-remote-state"
        ManagedBy = "terraform-bootstrap"
      }
    }

    resource "aws_s3_bucket_versioning" "terraform_state" {
      bucket = aws_s3_bucket.terraform_state.id

      versioning_configuration {
        status = "Enabled"
      }
    }

    resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
      bucket = aws_s3_bucket.terraform_state.id

      rule {
        apply_server_side_encryption_by_default {
          sse_algorithm = "aws:kms"
        }
        bucket_key_enabled = true # reduces KMS API calls and cost
      }
    }

    resource "aws_s3_bucket_public_access_block" "terraform_state" {
      bucket                  = aws_s3_bucket.terraform_state.id
      block_public_acls       = true
      block_public_policy     = true
      ignore_public_acls      = true
      restrict_public_buckets = true
    }

    resource "aws_s3_bucket_policy" "terraform_state" {
      bucket = aws_s3_bucket.terraform_state.id

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            Effect    = "Deny"
            Principal = "*"
            Action    = "s3:*"
            Resource = [
              aws_s3_bucket.terraform_state.arn,
              "${aws_s3_bucket.terraform_state.arn}/*"
            ]
            Condition = {
              Bool = {
                "aws:SecureTransport" = "false" # deny unencrypted HTTP access
              }
            }
          }
        ]
      })
    }

    resource "aws_dynamodb_table" "terraform_locks" {
      name         = "terraform-state-lock"
      billing_mode = "PAY_PER_REQUEST"
      hash_key     = "LockID" # exact string Terraform expects

      attribute {
        name = "LockID"
        type = "S"
      }

      lifecycle {
        prevent_destroy = true
      }

      tags = {
        Purpose   = "terraform-state-locking"
        ManagedBy = "terraform-bootstrap"
      }
    }

    output "state_bucket_name" {
      value = aws_s3_bucket.terraform_state.id
    }

    output "lock_table_name" {
      value = aws_dynamodb_table.terraform_locks.id
    }

Apply the bootstrap configuration:

    cd bootstrap/
    terraform init
    terraform apply
    # Note the outputs: you will use them in the backend configuration

Step 2 — Configure the S3 Backend
Add the backend block to your main configuration's versions.tf:

    # versions.tf
    terraform {
      required_version = ">= 1.6.0"

      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }

      backend "s3" {
        # The S3 bucket created in Step 1
        bucket = "razorpay-terraform-state-ap-south-1"

        # The path inside the bucket; use a hierarchy for multiple environments:
        #   prod/payments-api/terraform.tfstate
        #   staging/payments-api/terraform.tfstate
        key    = "prod/payments-api/terraform.tfstate"
        region = "ap-south-1"

        # The DynamoDB table created in Step 1
        dynamodb_table = "terraform-state-lock"

        # Request server-side encryption at rest for the state object
        # (redundant with the bucket's default encryption, but an extra
        # safety check at the Terraform level)
        encrypt = true
      }
    }

Step 3 — Run terraform init to Migrate State
After adding the backend block, run terraform init. Terraform detects the new backend and offers to migrate any existing local state:

    terraform init

    # (output abridged; exact wording varies by Terraform version)
    # Initializing the backend...
    #
    # Terraform detected that the backend type changed from "local" to "s3"
    #
    # Do you want to copy existing state to the new backend?
    #   Pre-existing state was found while migrating the previous "local" backend
    #   to the newly configured "s3" backend. No existing state was found in the
    #   newly configured "s3" backend. Do you want to copy this state to the new
    #   "s3" backend? Enter "yes" to copy and "no" to start with an empty state.
    #
    #   Enter a value: yes
    #
    # Successfully configured the backend "s3"! Terraform will automatically
    # use this backend unless the backend configuration changes.
    #
    # Initializing provider plugins...
    # - hashicorp/aws: Using previously-installed hashicorp/aws v5.31.0
    #
    # Terraform has been successfully initialized!

    # Verify the state is now in S3
    aws s3 ls s3://razorpay-terraform-state-ap-south-1/prod/payments-api/
    # 2024-01-15 09:15:32       4521 terraform.tfstate

Step 4 — Set Up IAM Permissions
The IAM role or user that runs Terraform needs specific permissions on the S3 bucket and DynamoDB table:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "TerraformStateS3Access",
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketVersioning",
            "s3:GetEncryptionConfiguration"
          ],
          "Resource": [
            "arn:aws:s3:::razorpay-terraform-state-ap-south-1",
            "arn:aws:s3:::razorpay-terraform-state-ap-south-1/*"
          ]
        },
        {
          "Sid": "TerraformStateDynamoDBLocking",
          "Effect": "Allow",
          "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:DeleteItem",
            "dynamodb:DescribeTable"
          ],
          "Resource": "arn:aws:dynamodb:ap-south-1:123456789012:table/terraform-state-lock"
        }
      ]
    }

Step 5 — Partial Backend Configuration for Multiple Environments
Backend blocks cannot use variables — all values must be hardcoded. For multiple environments sharing the same bucket but with different state keys, use partial backend configuration:

    # versions.tf: partial backend config (no key specified)
    terraform {
      backend "s3" {
        bucket         = "razorpay-terraform-state-ap-south-1"
        region         = "ap-south-1"
        dynamodb_table = "terraform-state-lock"
        encrypt        = true
        # key is NOT specified here; it is passed at init time
      }
    }

Then pass the key at init time:

    # Dev environment
    terraform init \
      -backend-config="key=dev/payments-api/terraform.tfstate"

    # Staging environment
    terraform init \
      -backend-config="key=staging/payments-api/terraform.tfstate"

    # Production environment
    terraform init \
      -backend-config="key=prod/payments-api/terraform.tfstate"

    # When switching environments in a working directory that is already
    # initialised, add -reconfigure so Terraform does not offer to migrate
    # state from the previous key

This pattern is especially useful in CI/CD pipelines where the environment name is a pipeline variable.
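For a pipeline, the key can be derived from the environment name rather than typed by hand. The following is a minimal sketch under stated assumptions: backend_key and init_env are hypothetical helper names, the allowed environments are the three used in this lab, and the service name is passed in by the caller.

```shell
#!/usr/bin/env bash
# Build the state key for an environment and service, following the
# <environment>/<service>/terraform.tfstate convention used in this lab.
backend_key() {
  local env="$1" service="$2"
  case "$env" in
    dev|staging|prod) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  printf '%s/%s/terraform.tfstate\n' "$env" "$service"
}

# Hypothetical wrapper for a pipeline step. -reconfigure tells Terraform to
# ignore backend settings cached in .terraform/ by an earlier init.
init_env() {
  local key
  key="$(backend_key "$1" "$2")" || return 1
  terraform init -reconfigure -backend-config="key=${key}"
}

backend_key staging payments-api   # prints staging/payments-api/terraform.tfstate
```

A pipeline step would then call something like init_env "$ENVIRONMENT" payments-api, where ENVIRONMENT is whatever variable your CI system exposes. Rejecting unknown environment names early keeps a typo from silently creating a new, empty state file under an unexpected key.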
Step 6 — Verify the Setup is Working

    # Run a plan: it should show no changes and read state from S3 cleanly
    terraform plan
    # Acquiring state lock. This may take a few moments...
    # (acquires the DynamoDB lock, then reads state from S3)
    # No changes. Your infrastructure matches the configuration.
    # Releasing state lock. This may take a few moments...

    # Check DynamoDB directly. Between operations no lock item should exist;
    # the only item left is the state checksum, whose LockID ends in -md5.
    aws dynamodb scan --table-name terraform-state-lock
    # Expect no item whose LockID is exactly <bucket>/<key> (no lock held)

    # Check S3 for the state file and its versions
    aws s3api list-object-versions \
      --bucket razorpay-terraform-state-ap-south-1 \
      --prefix prod/payments-api/terraform.tfstate
    # Lists one version per state write, confirming versioning is working

Production Best Practices and Common Pitfalls
Use a separate AWS account for state storage. The state bucket contains state for all your environments. Keeping it in a dedicated tools or shared-services account prevents a compromised dev account from exposing prod state. Terraform then reaches the bucket by assuming an IAM role in that account (cross-account access).
Key naming convention matters. Use a consistent key hierarchy: <environment>/<service>/terraform.tfstate. This makes state files easy to find, easy to restrict with IAM conditions, and easy to audit. A flat structure like terraform.tfstate for every service causes collisions and confusion.
Enable S3 access logging on the state bucket. Every read and write to the state file is an infrastructure event. S3 access logs let you audit who accessed state and when, which is valuable during incident response.
Add prevent_destroy to the state bucket resource. The state bucket is the most critical single resource in your infrastructure setup. Losing it would orphan all managed infrastructure. With lifecycle { prevent_destroy = true }, Terraform rejects any plan that would destroy the bucket, so an accidental terraform destroy fails instead of deleting it.
Avoid sharing one DynamoDB lock table across different S3 buckets. The lock key includes the full bucket and path, so one table can safely serve many state files, and sharing it across buckets does not cause collisions. But mixing state from different buckets in one lock table creates confusion when debugging stuck locks.
Test state recovery before you need it. Use S3 versioning to restore a previous state version: download an older version, review it, then push it back. Do this in a dev environment once so you know the procedure when a production emergency happens.
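The recovery drill can be sketched as a small script. This is an illustration under assumptions, not a tested runbook: restore_state_version and run are hypothetical helper names, EXAMPLE_VERSION_ID is a placeholder for a real VersionId, and the script defaults to a dry run that only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch of restoring an earlier state version. With DRY_RUN=1 (the default
# here) the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_state_version() {
  local bucket="$1" key="$2" version_id="$3" out="${4:-restored.tfstate}"
  # 1. Download the chosen version to a local file for review.
  run aws s3api get-object --bucket "$bucket" --key "$key" \
      --version-id "$version_id" "$out"
  # 2. Push it back through Terraform so the lock and checksum are handled
  #    by the backend rather than by a raw S3 copy.
  run terraform state push "$out"
}

# List versions first to pick a VersionId (printed only, in dry-run mode).
run aws s3api list-object-versions \
    --bucket razorpay-terraform-state-ap-south-1 \
    --prefix prod/payments-api/terraform.tfstate

restore_state_version razorpay-terraform-state-ap-south-1 \
    prod/payments-api/terraform.tfstate EXAMPLE_VERSION_ID
```

Terraform refuses a push when the remote state has a higher serial than the file being pushed, which is the normal case for an older version. After reviewing the downloaded file, either raise its serial field above the current remote value or, once you are sure, pass -force to terraform state push.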
Quick Reference and Troubleshooting Commands
| Command | What It Does |
|---|---|
| terraform init | Initialise backend; offers to migrate local state if present |
| terraform init -reconfigure | Force backend reconfiguration without migration prompt |
| terraform init -migrate-state | Migrate state from old backend to new backend explicitly |
| terraform init -backend-config="key=..." | Pass backend config at init time, for the partial config pattern |
| terraform state pull | Download current remote state to stdout |
| terraform state push <file> | Upload local state file to remote backend |
| terraform force-unlock <id> | Break a stuck DynamoDB lock |

| Error | Root Cause | Fix |
|---|---|---|
| NoSuchBucket | S3 bucket does not exist | Create the bucket before running terraform init |
| AccessDenied on S3 | Missing S3 IAM permissions | Add s3:GetObject, s3:PutObject, s3:ListBucket |
| ResourceNotFoundException | DynamoDB table does not exist | Create the table with LockID as partition key |
| Error: Backend configuration changed | Backend block was modified after init | Run terraform init -reconfigure |
| Variables not allowed in backend config | Used var.x inside backend block | Backend blocks only accept literals; use the -backend-config flag for dynamic values |
| Error: Failed to get existing workspaces | Wrong region or IAM permissions | Check region in backend config matches bucket region |
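When a lock looks stuck, it helps to inspect the lock item before breaking it. The sketch below only builds and prints the lookup command; it does not call AWS. It relies on the S3 backend's key layout, where the lock item's LockID is "<bucket>/<key>" and a separate checksum item lives under "<bucket>/<key>-md5". The helper names lock_id, digest_id, and show_lock_cmd are hypothetical.

```shell
#!/usr/bin/env bash
# The S3 backend keys its lock item as "<bucket>/<key>" and keeps a separate
# checksum item under "<bucket>/<key>-md5".
lock_id()   { printf '%s/%s\n' "$1" "$2"; }
digest_id() { printf '%s/%s-md5\n' "$1" "$2"; }

# Print (not run) the command that shows who holds the lock.
show_lock_cmd() {
  local table="$1" id
  id="$(lock_id "$2" "$3")"
  printf 'aws dynamodb get-item --table-name %s --key %s\n' \
    "$table" "'{\"LockID\":{\"S\":\"${id}\"}}'"
}

show_lock_cmd terraform-state-lock \
  razorpay-terraform-state-ap-south-1 prod/payments-api/terraform.tfstate
```

The Info attribute of the returned item records who took the lock, when, and for which operation, along with the lock ID that terraform force-unlock expects. Confirm with the holder that their run is really dead before unlocking, and leave the -md5 item alone since Terraform uses it to verify the state it reads from S3.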