Overview and What You Will Learn
When your Terraform codebase grows from 200 lines managing one environment to 20,000 lines across five environments and ten engineers, the question stops being "how do I write a resource block" and becomes "how do I stop dev, staging, and production from stepping on each other." This lab covers the three real options teams use to solve that — Terraform workspaces, directory-per-environment, and Terragrunt — and, more importantly, when each one actually fits and when it quietly creates new problems.
By the end of this lab you will be able to:
- Use Terraform workspaces correctly, and recognise the situations where they are the wrong tool
- Structure a directory-per-environment layout that scales to many environments and many engineers
- Manage per-environment variables cleanly with separate `.tfvars` files
- Decide between a mono-repo and a multi-repo structure for your infrastructure code
- Use Terragrunt to eliminate repeated backend configuration across environments
- Choose confidently between native workspaces and Terragrunt for your own team
Why This Matters in Production
At Hotstar, the infrastructure team initially used Terraform workspaces to separate dev, staging, and production. It worked fine for about eight months. Then an engineer, working late during the IPL season traffic surge, ran `terraform apply` while in the wrong workspace: the production AWS credentials were configured globally, but the active workspace was still set to staging from earlier that day. The plan looked correct because the state was genuinely consistent, just consistent with the wrong environment. Nothing in the command output flagged "you are about to touch production" loudly enough. After that incident, the team migrated to a directory-per-environment structure, where each environment lives in a physically separate folder with its own backend configuration. It is much harder to apply to the wrong environment by accident when the wrong environment is a different folder you would have had to deliberately `cd` into.
Core Principles
Directory-Per-Environment Structure
```
infrastructure/
├── modules/                  <-- shared, reusable modules (no environment-specific values)
│   ├── vpc/
│   ├── ecs-service/
│   └── rds-postgres/
│
├── environments/
│   ├── dev/
│   │   ├── main.tf           <-- calls modules, passes dev-specific values
│   │   ├── backend.tf        <-- dev's own remote state bucket/key
│   │   └── terraform.tfvars
│   │
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf        <-- DIFFERENT state file from dev
│   │   └── terraform.tfvars
│   │
│   └── prod/
│       ├── main.tf
│       ├── backend.tf        <-- DIFFERENT state file from staging
│       └── terraform.tfvars
```

Each environment is a fully separate Terraform working directory with its own state file. To touch production, you must `cd environments/prod`; there is no shared "current workspace" switch to get wrong.
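To make the layout concrete, here is a minimal sketch of what `environments/prod/` might contain. The bucket name, lock table, module inputs (`name`, `cidr_block`) and the `vpc_cidr` variable are hypothetical placeholders; what matters is that the state `key` is unique to this folder and the shared module is referenced by relative path.

```hcl
# environments/prod/backend.tf
# Hypothetical bucket and lock-table names; only the key must differ per environment.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# environments/prod/main.tf
provider "aws" {
  region = "ap-south-1"
}

# Value supplied by environments/prod/terraform.tfvars
variable "vpc_cidr" {
  type = string
}

module "vpc" {
  # Shared module: all prod-specific values are passed in from this folder.
  source = "../../modules/vpc"

  # Hypothetical inputs: use whatever variables your vpc module actually declares.
  name       = "prod"
  cidr_block = var.vpc_cidr
}
```

The `dev` and `staging` folders would hold the same files with their own `key` and values, so running `terraform init` in each folder binds it to its own state file.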
Detailed Step-by-Step Practical Lab
Step 1 — Understand What Terraform Workspaces Actually Are
A workspace is a named slot for a separate state file within the SAME backend configuration. It is not a separate AWS account, a separate set of credentials, or a separate set of variables by default — it is purely a different state file:
```bash
terraform workspace list
# * default

terraform workspace new staging
# Created and switched to workspace "staging"!

terraform workspace new prod
# Created and switched to workspace "prod"!

terraform workspace list
#   default
#   staging
# * prod      <-- the asterisk shows which workspace is currently active

terraform workspace select staging
# Switched to workspace "staging".
```

```hcl
# Referencing the active workspace name inside your config
variable "ami_id" {
  type = string # hypothetical input; aws_instance needs an AMI to be valid
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = terraform.workspace == "prod" ? "t3.large" : "t3.micro"
  #               ^^^^^^^^^^^^^^^^^^^ built-in value: the current workspace name
}
```

Step 2 — Recognise Workspace Limitations
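Part of the reason workspaces are not an isolation boundary is physical: every workspace reuses one backend block, so all their state files sit side by side in the same bucket. A minimal sketch, assuming an S3 backend and a hypothetical bucket name. With `workspace_key_prefix` left at its default of `env:`, the S3 backend stores the `default` workspace at `key` and every other workspace under `env:/<workspace>/<key>`.

```hcl
# One backend block shared by every workspace (hypothetical bucket name).
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "app/terraform.tfstate"
    region = "ap-south-1"
    # workspace_key_prefix defaults to "env:", giving these state objects:
    #   default workspace -> app/terraform.tfstate
    #   staging workspace -> env:/staging/app/terraform.tfstate
    #   prod workspace    -> env:/prod/app/terraform.tfstate
  }
}
```

Same bucket, same region, same credentials needed to reach it: only the object path changes between workspaces.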
Security note: workspaces share the same backend and, in most setups, the same AWS credentials and IAM permissions. There is no built-in guardrail stopping someone in the `dev` workspace from having full write access to whatever the `prod` workspace's state points at. The separation is purely a state-file label, not an access boundary.
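Terraform will not add an access boundary for you, but the AWS provider can at least refuse to run against the wrong account. A minimal sketch, assuming hypothetical account IDs: the provider's `allowed_account_ids` argument makes it error out if the active credentials belong to any other account, and keying the list by `terraform.workspace` ties each workspace to its expected account. Production credentials with the `staging` workspace selected would then fail before planning anything. This is a safety check, not a permission boundary; IAM still decides what the credentials are allowed to do.

```hcl
locals {
  # Hypothetical account IDs: replace with your own.
  expected_account = {
    dev     = "111111111111"
    staging = "222222222222"
    prod    = "333333333333"
  }
}

provider "aws" {
  region = "ap-south-1"

  # Errors out if the credentials resolve to an account other than the one
  # mapped to the active workspace. An unmapped workspace name (e.g. "default")
  # fails the map lookup, which also stops the run.
  allowed_account_ids = [local.expected_account[terraform.workspace]]
}
```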
| What workspaces DO separate | What workspaces DO NOT separate |
|---|---|
| The state file itself | AWS credentials/IAM permissions |
| Resources tracked in that state | Which AWS account you're targeting |
| | Variable files (you must build this yourself) |
| | Backend bucket/region (same for all workspaces) |

Step 3 — Set Up Per-Environment Variable Files
Whether using workspaces or separate directories, keep environment-specific values in their own .tfvars file:
```hcl
# environments/dev/terraform.tfvars
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"
```

```hcl
# environments/prod/terraform.tfvars
instance_type     = "t3.large"
instance_count    = 4
db_instance_class = "db.r6g.large"
```

```bash
# Passing the var-file explicitly makes the target environment unambiguous on every command.
# (Terraform also auto-loads terraform.tfvars; the flag matters most when one directory
# holds several files, such as dev.tfvars and prod.tfvars.)
cd environments/prod
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars
```

Step 4 — Mono-Repo vs Multi-Repo
```
MONO-REPO                         MULTI-REPO
infrastructure/                   infra-networking/  (own repo)
├── modules/                      infra-compute/     (own repo)
├── environments/dev/             infra-databases/   (own repo)
├── environments/staging/         infra-security/    (own repo)
└── environments/prod/

Good for: small-to-mid teams,     Good for: large orgs with separate
single platform team owning       teams owning different layers,
all infrastructure                strict access control per repo
```

Tip from a senior engineer: start with a mono-repo. Splitting into multiple repos adds real coordination overhead: cross-repo module versioning, separate CI pipelines, separate access requests. Only split when a genuinely different team needs to own a different layer with different access rules, not just because the repo "feels big."
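Cross-repo module versioning usually means pinning each module source to a Git tag. A minimal sketch, assuming a hypothetical `infra-modules` repository, tag and module inputs: the `//` separates the repository from the subdirectory holding the module, and `?ref=` pins the version, so upgrading a module becomes an explicit, reviewable one-line change per environment.

```hcl
# environments/prod/main.tf in a multi-repo setup.
variable "vpc_cidr" {
  type = string
}

module "vpc" {
  # Hypothetical repository URL and tag.
  source = "git::https://github.com/example-org/infra-modules.git//vpc?ref=v1.4.0"

  # Hypothetical inputs, as declared by the module at that tag.
  name       = "prod"
  cidr_block = var.vpc_cidr
}
```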
Step 5 — Eliminate Backend Duplication with Terragrunt
The biggest pain in a directory-per-environment layout is that the `backend.tf` block in dev, staging, and prod is almost identical; only the state `key` inside the bucket differs. Terragrunt removes this duplication:
```hcl
# terragrunt.hcl at the ROOT of the repo: defines the backend ONCE
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "razorpay-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate" # auto-generated per folder
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```

```hcl
# environments/prod/terragrunt.hcl: each environment just declares its inputs
include "root" {
  path = find_in_parent_folders() # inherits the remote_state block above automatically
}

terraform {
  source = "../../modules//ecs-service"
}

inputs = {
  service_name = "orders-api"
  cpu          = 1024
  environment  = "prod"
}
```

```bash
# Terragrunt wraps terraform commands and auto-generates backend.tf for you
cd environments/prod
terragrunt plan
terragrunt apply

# run-all runs apply in every folder containing a terragrunt.hcl under the current
# directory, in dependency order (from the repo root, that means every environment)
terragrunt run-all apply
```

Step 6 — Dependency Management Across Modules with Terragrunt
```hcl
# environments/prod/ecs-service/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc" # points to another Terragrunt module's directory
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id # reads vpc's output automatically
  subnet_ids = dependency.vpc.outputs.private_subnets
}
```

Terragrunt resolves this dependency graph and runs `vpc` before `ecs-service` automatically when you use `run-all apply`, with no manual ordering required.
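One practical wrinkle: `terragrunt run-all plan` on a brand-new environment cannot read outputs from a `vpc` module that has never been applied. Terragrunt's `dependency` block accepts placeholder outputs for exactly this case. A minimal sketch; the mock values are hypothetical, and restricting them to `validate` and `plan` keeps them out of any real apply.

```hcl
# environments/prod/ecs-service/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder outputs, used only while ../vpc has no outputs yet.
  mock_outputs = {
    vpc_id          = "vpc-00000000"
    private_subnets = ["subnet-00000000"]
  }
  # Allow the mocks only for commands that do not change infrastructure.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnets
}
```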
Production Best Practices and Common Pitfalls
Never rely on workspaces alone to protect production. If you use workspaces, pair them with separate AWS credentials per workspace (via separate AWS profiles or assumed roles) — never let the same credentials have write access to every workspace's resources.
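One way to follow this advice while keeping workspaces is to have the provider assume a different IAM role per workspace, so switching workspace also switches which account and permissions are in play. A minimal sketch, assuming hypothetical role ARNs in separate accounts; the base credentials running Terraform should be allowed to assume only the roles they genuinely need.

```hcl
locals {
  # Hypothetical role ARNs, one per environment account.
  deploy_role = {
    dev     = "arn:aws:iam::111111111111:role/terraform-deploy"
    staging = "arn:aws:iam::222222222222:role/terraform-deploy"
    prod    = "arn:aws:iam::333333333333:role/terraform-deploy"
  }
}

provider "aws" {
  region = "ap-south-1"

  assume_role {
    # The active workspace selects the role; an unmapped workspace fails the lookup.
    role_arn     = local.deploy_role[terraform.workspace]
    session_name = "terraform-${terraform.workspace}"
  }
}
```

The real boundary then lives in IAM: if, say, only a CI role may assume the prod role, a laptop sitting in the wrong workspace fails to obtain prod credentials at all.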
Make the active environment visible in your terminal prompt. Whether you use workspaces or directories, a shell prompt that doesn't show the current workspace or directory is how a Hotstar-style incident happens. Many teams add the workspace name to their shell prompt (`PS1`) specifically to prevent this.
Don't reach for Terragrunt just because it's popular. If you only have two or three environments and one small team, the native directory-per-environment pattern with plain Terraform is often easier for new engineers to pick up than plain Terraform plus Terragrunt's own blocks and functions layered on top of HCL.
Use Terragrunt when dependency chains get deep. If you have ten interdependent modules across five environments and you are manually tracking apply order in a runbook, that is exactly the pain Terragrunt's `dependency` blocks and `run-all` are built to remove.
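Not every ordering constraint involves passing outputs. For a module that simply has to apply after others, Terragrunt also has a `dependencies` block that only affects `run-all` ordering. A minimal sketch, with hypothetical sibling folder names.

```hcl
# environments/prod/ecs-service/terragrunt.hcl
# Order-only dependencies: run-all processes these folders first,
# but no outputs are read from them.
dependencies {
  paths = ["../vpc", "../rds-postgres"]
}
```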
Quick Reference and Troubleshooting Commands
| Approach | Best For | Watch Out For |
|---|---|---|
| Terraform workspaces | Quick environment splits, small teams | No real access boundary between workspaces |
| Directory-per-environment | Most teams — clearest mental model | Some duplication in backend config across folders |
| Terragrunt | Many environments, deep module dependency chains | Extra tool and DSL for new engineers to learn |
| Command | What It Does |
|---|---|
| `terraform workspace show` | Print the currently active workspace |
| `terraform workspace select <name>` | Switch the active workspace |
| `terragrunt run-all plan` | Plan every Terragrunt module under the current directory |
| `terragrunt graph-dependencies` | Print the dependency graph across modules (in DOT format) |
| Error | Root Cause | Fix |
|---|---|---|
| Applied to the wrong environment | Active workspace was not what the engineer assumed | Run `terraform workspace show` before every apply, or switch to directory-per-environment |
| `Error: Could not load plugins` in Terragrunt | Terragrunt and Terraform version mismatch | Check Terragrunt's compatibility matrix for your installed Terraform version |
| `dependency.x.outputs` is empty | Dependency module hasn't been applied yet | Run `terragrunt run-all apply` so dependencies apply in the correct order first |
| Duplicate `backend.tf` drift between environments | Manual edits to one environment's `backend.tf`, not the shared Terragrunt config | Move backend config into the root `terragrunt.hcl` so it generates consistently everywhere |