Terraform at Scale — Code Organisation, Workspaces, and Terragrunt | DevOps Network

Overview and What You Will Learn

When your Terraform codebase grows from 200 lines managing one environment to 20,000 lines across five environments and ten engineers, the question stops being "how do I write a resource block" and becomes "how do I stop dev, staging, and production from stepping on each other." This lab covers the three real options teams use to solve that — Terraform workspaces, directory-per-environment, and Terragrunt — and, more importantly, when each one actually fits and when it quietly creates new problems.

By the end of this lab you will be able to:

Use Terraform workspaces correctly, and recognise the situations where they are the wrong tool
Structure a directory-per-environment layout that scales to many environments and many engineers
Manage per-environment variables cleanly with separate .tfvars files
Decide between a mono-repo and a multi-repo structure for your infrastructure code
Use Terragrunt to eliminate repeated backend configuration across environments
Choose confidently between native workspaces and Terragrunt for your own team

Why This Matters in Production

At Hotstar, the infrastructure team initially used Terraform workspaces to separate dev, staging, and production. It worked fine for about eight months. Then an engineer, working late during the IPL season traffic surge, ran terraform apply in the wrong workspace: the production AWS credentials were configured globally, but the active workspace was still set to staging from earlier that day. The plan looked correct because the state was genuinely consistent, just consistent with the wrong environment. Nothing in the command output said "you are about to touch production" loudly enough. After that incident, the team migrated to a directory-per-environment structure, where each environment lives in a physically separate folder with its own backend configuration. You cannot accidentally apply to the wrong environment when the wrong environment is a different folder you would have had to deliberately cd into.

Core Principles

Directory-Per-Environment Structure

TEXT

infrastructure/
├── modules/                       <-- shared, reusable modules (no environment-specific values)
│   ├── vpc/
│   ├── ecs-service/
│   └── rds-postgres/
│
├── environments/
│   ├── dev/
│   │   ├── main.tf                <-- calls modules, passes dev-specific values
│   │   ├── backend.tf             <-- dev's own remote state bucket/key
│   │   └── terraform.tfvars
│   │
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf             <-- DIFFERENT state file from dev
│   │   └── terraform.tfvars
│   │
│   └── prod/
│       ├── main.tf
│       ├── backend.tf             <-- DIFFERENT state file from staging
│       └── terraform.tfvars

Each environment is a fully separate Terraform working directory with its own state file. To touch production, you must cd environments/prod — there is no shared "current workspace" switch to get wrong.
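
To make "its own state file" concrete, here is a minimal sketch of what each folder's backend.tf might contain, written as a shell helper that prints the file for a given environment. The helper name `render_backend` and the bucket name `example-terraform-state` are hypothetical; the region and lock table simply mirror the values used later in this lab. Only the state key changes from one environment to the next.

```shell
#!/usr/bin/env bash
# Sketch: print the backend.tf that one environment folder would hold.
# STATE_BUCKET is a hypothetical bucket name, not taken from the lab.
STATE_BUCKET="${STATE_BUCKET:-example-terraform-state}"

render_backend() {
  local env="$1"
  # The heredoc is unquoted on purpose so ${env} and ${STATE_BUCKET} expand.
  cat <<EOF
terraform {
  backend "s3" {
    bucket         = "${STATE_BUCKET}"
    key            = "${env}/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
EOF
}

# Usage (writes one file per environment folder):
#   for env in dev staging prod; do
#     render_backend "$env" > "environments/$env/backend.tf"
#   done
```

Because each folder carries its own key, running terraform init inside environments/dev can only ever attach to the dev state.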

Detailed Step-by-Step Practical Lab

Step 1 — Understand What Terraform Workspaces Actually Are

A workspace is a named slot for a separate state file within the SAME backend configuration. It is not a separate AWS account, a separate set of credentials, or a separate set of variables by default — it is purely a different state file:

Bash

terraform workspace list
# * default
 
terraform workspace new staging
# Created and switched to workspace "staging"!
 
terraform workspace new prod
# Created and switched to workspace "prod"!
 
terraform workspace list
#   default
# * prod          <-- the asterisk shows which workspace is currently active
#   staging
 
terraform workspace select staging
# Switched to workspace "staging"

HCL

# Referencing the active workspace name inside your config
resource "aws_instance" "app" {
  ami           = var.ami_id   # hypothetical variable, assumed to be declared elsewhere
  instance_type = terraform.workspace == "prod" ? "t3.large" : "t3.micro"
  #               ^^^^^^^^^^^^^^^^^^^ built-in named value: the current workspace name
}

Step 2 — Recognise Workspace Limitations

INFORMATION
Security note: workspaces share the same backend and, in most setups, the same AWS credentials and IAM permissions. There is no built-in guardrail stopping someone in the `dev` workspace from having full write access to whatever the `prod` workspace's state points at. The separation is purely a state-file label, not an access boundary.

TEXT

What workspaces DO separate:        What workspaces DO NOT separate:
- The state file itself             - AWS credentials/IAM permissions
- Resources tracked in that state   - Which AWS account you're targeting
                                     - Variable files (you must build this yourself)
                                     - Backend bucket/region (same for all workspaces)
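
Since workspaces provide no guardrail of their own, one option is a small shell check that refuses to proceed unless the active workspace is the one you name explicitly. The following is a minimal sketch; `require_workspace` is a hypothetical helper name, and it relies only on `terraform workspace show`. It does not create an access boundary, it only catches the "wrong workspace still selected" mistake.

```shell
#!/usr/bin/env bash
# Sketch: abort unless the active Terraform workspace matches the expected one.
require_workspace() {
  local expected="$1" actual
  # Ask Terraform which workspace is active; return 2 if that call itself fails.
  actual="$(terraform workspace show)" || return 2
  if [ "$actual" != "$expected" ]; then
    echo "Refusing to continue: active workspace is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage: name the target explicitly on every apply.
#   require_workspace prod && terraform apply
```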

Step 3 — Set Up Per-Environment Variable Files

Whether using workspaces or separate directories, keep environment-specific values in their own .tfvars file:

HCL

# environments/dev/terraform.tfvars
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"

HCL

# environments/prod/terraform.tfvars
instance_type     = "t3.large"
instance_count    = 4
db_instance_class = "db.r6g.large"

Bash

# terraform.tfvars in the working directory is loaded automatically; passing it
# explicitly is redundant, but it keeps the target environment visible on every command
cd environments/prod
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars
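
If you run from the repository root, or stay on workspaces, the var-file can be derived from the environment name rather than typed by hand. The sketch below assumes the environments/<env>/terraform.tfvars layout shown earlier; `tfvars_for` is a hypothetical helper name. It fails loudly when no file exists for the requested environment instead of silently falling back to defaults.

```shell
#!/usr/bin/env bash
# Sketch: resolve the var-file for an environment, or fail if none exists.
tfvars_for() {
  local env="$1" root="${2:-environments}"
  local file="$root/$env/terraform.tfvars"
  if [ ! -f "$file" ]; then
    echo "No var-file for environment '$env' (looked for $file)" >&2
    return 1
  fi
  printf '%s\n' "$file"
}

# Usage with workspaces: the workspace name picks the var-file.
#   terraform plan -var-file="$(tfvars_for "$(terraform workspace show)")"
```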

Step 4 — Mono-Repo vs Multi-Repo

◈ DIAGRAM

MONO-REPO                              MULTI-REPO
infrastructure/                        infra-networking/   (own repo)
├── modules/                           infra-compute/       (own repo)
├── environments/dev/                  infra-databases/     (own repo)
├── environments/staging/              infra-security/      (own repo)
└── environments/prod/
 
Good for: small-to-mid teams,          Good for: large orgs with separate
single platform team owning            teams owning different layers,
all infrastructure                     strict access control per repo

INFORMATION
Tip from a senior engineer: start with a mono-repo. Splitting into multiple repos adds real coordination overhead — cross-repo module versioning, separate CI pipelines, separate access requests. Only split when a genuinely different team needs to own a different layer with different access rules, not just because the repo "feels big."

Step 5 — Eliminate Backend Duplication with Terragrunt

The biggest pain in a directory-per-environment layout is that the backend.tf block in dev, staging, and prod is almost identical: typically only the state key differs. Terragrunt removes this duplication:

HCL

# terragrunt.hcl at the ROOT of the repo — defines the backend ONCE
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "razorpay-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"   # auto-generated per folder
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

HCL

# environments/prod/terragrunt.hcl — each environment just declares its inputs
include "root" {
  path = find_in_parent_folders()   # inherits the remote_state block above automatically
}
 
terraform {
  source = "../../modules//ecs-service"
}
 
inputs = {
  service_name = "orders-api"
  cpu          = 1024
  environment  = "prod"
}

Bash

# Terragrunt wraps terraform commands and auto-generates the backend.tf for you
cd environments/prod
terragrunt plan
terragrunt apply
 
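# run-all applies EVERY Terragrunt module found under the current directory, in dependency order.
# From environments/prod that means prod only; from environments/ it would touch every environment.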
terragrunt run-all apply

Step 6 — Dependency Management Across Modules with Terragrunt

HCL

# environments/prod/ecs-service/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"   # points to another Terragrunt module's directory
}
 
inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id        # reads vpc's output automatically
  subnet_ids = dependency.vpc.outputs.private_subnets
}

Terragrunt resolves this dependency graph and runs vpc before ecs-service automatically when you use run-all apply — no manual ordering required.
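
The ordering idea can be illustrated with the standard `tsort` utility, which performs a topological sort over "dependency dependent" pairs. This is a conceptual sketch only, not how Terragrunt is implemented; `apply_order` is a hypothetical helper, and the rds-postgres edges are invented for the example (only the vpc to ecs-service edge comes from the config above).

```shell
#!/usr/bin/env bash
# Sketch: derive an apply order from dependency edges read on stdin.
# Each input line is "<dependency> <dependent>"; tsort prints one module per
# line so that every dependency appears before the modules that need it.
apply_order() {
  tsort
}

# Usage:
#   printf '%s\n' \
#     'vpc ecs-service' \
#     'vpc rds-postgres' \
#     'rds-postgres ecs-service' | apply_order
```

With those three edges, the only valid order is vpc, then rds-postgres, then ecs-service, which is the kind of runbook ordering the dependency blocks are meant to replace.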

Production Best Practices and Common Pitfalls

Never rely on workspaces alone to protect production. If you use workspaces, pair them with separate AWS credentials per workspace (via separate AWS profiles or assumed roles) — never let the same credentials have write access to every workspace's resources.
Make the active environment visible in your terminal prompt. Whether you use workspaces or directories, a shell prompt that hides the current workspace or directory is how a Hotstar-style incident happens. Many teams add the workspace name to their shell prompt (PS1) specifically to prevent this.
Don't reach for Terragrunt just because it's popular. If you only have two or three environments and one small team, the native directory-per-environment pattern with plain Terraform is often simpler to onboard new engineers to than learning Terragrunt's own DSL on top of HCL.
Use Terragrunt when dependency chains get deep. If you have ten interdependent modules across five environments and you are manually tracking apply order in a runbook, that is exactly the pain Terragrunt's dependency blocks and run-all are built to remove.
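
The prompt advice above can be sketched as a small Bash function. It assumes Terraform records the selected workspace in .terraform/environment inside the working directory and that the TF_WORKSPACE environment variable takes precedence when set; verify both against your Terraform version. `tf_prompt_segment` is a hypothetical name. Reading the file avoids invoking terraform on every prompt.

```shell
#!/usr/bin/env bash
# Sketch: print "[tf:<workspace>] " when the directory is an initialised
# Terraform working directory, and nothing otherwise.
tf_prompt_segment() {
  local dir="${1:-$PWD}" ws="default"
  # Not initialised with terraform init: stay silent.
  [ -d "$dir/.terraform" ] || return 0
  if [ -n "${TF_WORKSPACE:-}" ]; then
    ws="$TF_WORKSPACE"
  elif [ -s "$dir/.terraform/environment" ]; then
    ws="$(cat "$dir/.terraform/environment")"
  fi
  printf '[tf:%s] ' "$ws"
}

# Usage in ~/.bashrc (single quotes so it is re-evaluated at every prompt):
#   PS1='$(tf_prompt_segment)\w \$ '
```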

Quick Reference and Troubleshooting Commands

Approach	Best For	Watch Out For
Terraform workspaces	Quick environment splits, small teams	No real access boundary between workspaces
Directory-per-environment	Most teams — clearest mental model	Some duplication in backend config across folders
Terragrunt	Many environments, deep module dependency chains	Extra tool and DSL for new engineers to learn

Command	What It Does
`terraform workspace show`	Print the currently active workspace
`terraform workspace select <name>`	Switch active workspace
`terragrunt run-all plan`	Plan every Terragrunt module under the current directory
`terragrunt graph-dependencies`	Print the resolved dependency order across modules

Error	Root Cause	Fix
Applied to the wrong environment	Active workspace was not what the engineer assumed	Run `terraform workspace show` before every apply, or switch to directory-per-environment
`Error: Could not load plugins` in Terragrunt	Terragrunt and Terraform version mismatch	Check Terragrunt's compatibility matrix for your installed Terraform version
`dependency.x.outputs` is empty	Dependency module hasn't been applied yet	Run `terragrunt run-all apply` so dependencies apply in the correct order first
Duplicate `backend.tf` drift between environments	Manual edits to one environment's backend file, not the shared Terragrunt config	Move backend config into the root `terragrunt.hcl` so it generates consistently everywhere
