Overview and What You Will Learn
Manually running `terraform apply` from your laptop is fine when you are the only engineer touching the infrastructure. Once the team grows to three people, it stops being fine: two engineers can apply at the same time without realising it, nobody has a record of who changed what and why, and the only way to review a change is to trust that whoever ran it copy-pasted the real plan output into Slack. A CI/CD pipeline for Terraform built on GitHub Actions addresses all three problems: every change goes through a pull request, every plan is visible to every reviewer, and apply only ever runs from one consistent, audited environment, never from anyone's laptop.
By the end of this lab you will be able to:
- Build a GitHub Actions workflow that runs `terraform plan` automatically on every pull request
- Authenticate to AWS using OIDC, with zero long-lived access keys stored anywhere
- Post the plan output as a readable PR comment for reviewers
- Run `terraform apply` automatically on merge to main, gated behind an environment protection rule
- Cache Terraform providers to keep pipeline runs fast
- Add `terraform fmt`, `tflint`, and a security scanner to the pipeline before any plan runs
Why This Matters in Production
At PhonePe, before the platform team automated Terraform, every infrastructure change followed the same informal ritual: an engineer would run terraform plan locally, paste the output into a Slack thread, wait for a thumbs-up emoji from a teammate, then run terraform apply from their own laptop. This worked until it didn't — an engineer once applied a change while disconnected from the office VPN, against credentials cached from a different AWS profile than the one shown in the Slack-pasted plan. The infrastructure that got created did not match what anyone had reviewed. After that incident, every apply in the company runs exclusively through GitHub Actions, using a service identity with OIDC — no engineer's laptop credentials can reach production Terraform state anymore, by design, not by policy memo.
Core Principles
The Terraform GitHub Actions Pipeline
    +------------------------------------------------------+
    | 1. Engineer opens PR changing .tf files              |
    +------------------------------------------------------+
                    |
                    |  triggers on: pull_request
                    v
    +------------------------------------------------------+
    | 2. GitHub Actions job:                               |
    |    - terraform fmt -check                            |
    |    - terraform validate                              |
    |    - tflint                                          |
    |    - terraform plan (using OIDC-assumed role)        |
    |    - posts plan output as PR comment                 |
    +------------------------------------------------------+
                    |
                    |  reviewer reads plan, approves PR
                    v
    +------------------------------------------------------+
    | 3. PR merged to main branch                          |
    +------------------------------------------------------+
                    |
                    |  triggers on: push to main
                    v
    +------------------------------------------------------+
    | 4. GitHub Actions job:                               |
    |    - environment protection gate (manual approval)   |
    |    - terraform apply (using OIDC-assumed role)       |
    +------------------------------------------------------+

Detailed Step-by-Step Practical Lab
Step 1 — Set Up OIDC Trust Between GitHub and AWS
OIDC lets GitHub Actions request short-lived AWS credentials directly — no access key or secret key is ever stored as a GitHub secret. First, create the trust relationship in AWS:
    # Terraform that creates the OIDC provider and the role your pipeline will assume.
    # Run this once, manually or via a bootstrap pipeline, before the main pipeline exists.

    resource "aws_iam_openid_connect_provider" "github_actions" {
      url             = "https://token.actions.githubusercontent.com"
      client_id_list  = ["sts.amazonaws.com"]
      thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
    }

    resource "aws_iam_role" "github_actions_terraform" {
      name = "github-actions-terraform-deploy"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect    = "Allow"
          Principal = { Federated = aws_iam_openid_connect_provider.github_actions.arn }
          Action    = "sts:AssumeRoleWithWebIdentity"
          Condition = {
            StringEquals = {
              "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
            }
            StringLike = {
              # Restricts WHICH repo can assume this role; never leave this wide open.
              # The trailing * still allows any branch, PR, or environment in that repo.
              "token.actions.githubusercontent.com:sub" = "repo:phonepay-platform/infrastructure:*"
            }
          }
        }]
      })
    }

Security note: the `StringLike` condition is the entire security boundary of OIDC. If you write `"repo:*:*"` instead of your exact repository name, any GitHub Actions workflow in any repository on GitHub can assume this role. Always scope it to your exact org/repo, and to specific branches if `apply` permissions need to be tighter than `plan` permissions.
Step 2 — Write the Plan Workflow (Runs on Every PR)
    # .github/workflows/terraform-plan.yml
    name: Terraform Plan

    on:
      pull_request:
        paths: ["environments/**", "modules/**"]   # only run when relevant files change

    permissions:
      id-token: write        # required for OIDC; GitHub issues the identity token
      contents: read
      pull-requests: write   # required to post the plan as a PR comment

    jobs:
      plan:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Configure AWS credentials via OIDC
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform-deploy
              aws-region: ap-south-1
              # NOTE: no access-key-id or secret-access-key anywhere; OIDC handles it

          - uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: 1.7.5

          - name: Terraform Format Check
            run: terraform fmt -check -recursive
            working-directory: environments/prod

          - name: Terraform Init
            run: terraform init
            working-directory: environments/prod

          - name: Terraform Validate
            run: terraform validate
            working-directory: environments/prod

          - name: Terraform Plan
            id: plan
            run: terraform plan -no-color -out=tfplan
            working-directory: environments/prod
            continue-on-error: true   # capture the failure, post it as a comment, THEN fail the job

          - name: Post Plan as PR Comment
            uses: actions/github-script@v7
            env:
              # Pass the plan through an environment variable so that backticks or
              # ${...} sequences in the output cannot break the JavaScript below.
              PLAN: ${{ steps.plan.outputs.stdout }}
            with:
              script: |
                const output = `#### Terraform Plan Result
                \`\`\`
                ${process.env.PLAN}
                \`\`\`
                `;
                await github.rest.issues.createComment({
                  issue_number: context.issue.number,
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  body: output
                });

          - name: Fail Job if Plan Failed
            if: steps.plan.outcome == 'failure'
            run: exit 1

Step 3 — Write the Apply Workflow (Runs on Merge to Main)
    # .github/workflows/terraform-apply.yml
    name: Terraform Apply

    on:
      push:
        branches: [main]
        paths: ["environments/**", "modules/**"]

    permissions:
      id-token: write
      contents: read

    jobs:
      apply:
        runs-on: ubuntu-latest
        environment: production   # gates the job behind manual approval once required reviewers are configured
        steps:
          - uses: actions/checkout@v4

          - name: Configure AWS credentials via OIDC
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform-deploy
              aws-region: ap-south-1

          - uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: 1.7.5

          - name: Terraform Init
            run: terraform init
            working-directory: environments/prod

          - name: Terraform Apply
            run: terraform apply -auto-approve
            working-directory: environments/prod

In GitHub's repository settings, the production environment is configured with required reviewers. Even after the PR merges, a designated approver must approve the pending deployment (the "Review deployments" prompt on the workflow run) before this job actually runs apply. This is a second, independent gate from the PR review itself.
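The overview points out that two engineers can end up applying at the same time. Two PRs merged in quick succession can queue two apply runs in much the same way. One way to serialize them is the workflow-level `concurrency` setting in GitHub Actions. This is a minimal sketch, assuming one group per environment is enough; the group name `terraform-apply-prod` is a made-up label, not something from the original workflow.

```yaml
# Sketch: top-level addition to .github/workflows/terraform-apply.yml
# The group name is hypothetical; any stable string works.
concurrency:
  group: terraform-apply-prod
  cancel-in-progress: false   # let a running apply finish instead of cancelling it mid-run
```

With `cancel-in-progress: false`, a newer run waits for the one in progress rather than interrupting it. Because each apply run works from the latest state of main, waiting is usually the safer choice for Terraform.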
Step 4 — Cache Terraform Providers for Faster Runs
    # Add these steps before "Terraform Init"
    - name: Cache Terraform providers
      uses: actions/cache@v4
      with:
        path: ~/.terraform.d/plugin-cache
        key: ${{ runner.os }}-terraform-providers-${{ hashFiles('**/.terraform.lock.hcl') }}

    - name: Configure plugin cache directory
      run: |
        # Terraform does not create the cache directory itself
        mkdir -p ~/.terraform.d/plugin-cache
        echo 'plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"' > ~/.terraformrc

Tip from a senior engineer: without this cache, every pipeline run re-downloads the AWS provider plugin, which runs to hundreds of megabytes. On a busy repo with twenty PRs a day, that is gigabytes of redundant network traffic and wasted CI time on every run.
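Terraform also reads the cache location from the `TF_PLUGIN_CACHE_DIR` environment variable, which avoids writing a CLI config file on the runner. The step below is a sketch of that alternative, assuming the same cache path as the `actions/cache` step above; the step name is illustrative.

```yaml
# Sketch: alternative to writing ~/.terraformrc
- name: Configure plugin cache directory via environment variable
  run: |
    # The directory must exist before Terraform will use it
    mkdir -p "$HOME/.terraform.d/plugin-cache"
    # Export the variable for all later steps in this job
    echo "TF_PLUGIN_CACHE_DIR=$HOME/.terraform.d/plugin-cache" >> "$GITHUB_ENV"
```

Either approach works with the cache key shown above. Use one or the other, not both.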
Step 5 — Handle Multiple Environments with a Matrix
    jobs:
      plan:
        strategy:
          matrix:
            environment: [dev, staging, prod]
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # credentials, setup-terraform, and init steps omitted for brevity
          - name: Terraform Plan for ${{ matrix.environment }}
            run: terraform plan
            working-directory: environments/${{ matrix.environment }}

This runs the plan job three times in parallel, once per environment, instead of writing three nearly identical workflow files.
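By default, a matrix in GitHub Actions cancels the remaining jobs as soon as one fails, so a broken dev plan can hide the staging and prod results. A sketch of the same strategy block with that behaviour switched off, assuming you want to see every environment's plan regardless:

```yaml
# Sketch: same matrix as above, with fail-fast disabled
jobs:
  plan:
    strategy:
      fail-fast: false   # default is true: one failing environment cancels the others
      matrix:
        environment: [dev, staging, prod]
    runs-on: ubuntu-latest
```

Each environment then reports its own pass or fail status on the pull request.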
Step 6 — Add Security Scanning Before Plan
    - name: Run tfsec
      uses: aquasecurity/tfsec-action@v1.0.3
      with:
        working_directory: environments/prod

    # tfsec fails the job on issues like:
    # - aws-s3-enable-bucket-encryption: Bucket does not have encryption enabled
    # - aws-ec2-no-public-ingress-sgr: Security group rule allows ingress from 0.0.0.0/0

Step 7 — Pin the Terraform Version Consistently
    # versions.tf
    # Without this, your laptop's Terraform version and the CI runner's
    # Terraform version can silently drift apart and produce slightly different plans.
    terraform {
      required_version = "1.7.5"   # exact pin, not a range, for production root configs
    }

In the workflow, always match the version pinned in versions.tf:

    - uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.7.5   # keep this in sync with required_version above

Production Best Practices and Common Pitfalls
- Never let the plan job have write access to AWS. The IAM role assumed during `plan` should be read-only where possible (or at minimum, scoped tightly), because `plan` runs on every PR from every contributor, including ones whose changes haven't been reviewed yet. Save write/apply permissions for the separate `apply` workflow's role.
- Use GitHub Environments for the approval gate, not just branch protection. Branch protection controls who can merge code. A GitHub Environment with required reviewers controls who can authorize the actual `apply`. These are two different gates, and you want both.
- Restrict the OIDC trust policy to your exact repo and branch. A loosely scoped `sub` claim is the single most common OIDC misconfiguration. Always test that another repository genuinely cannot assume your role before trusting it in production.
- Treat `continue-on-error: true` on the plan step carefully. It is there so the PR comment shows the actual error instead of GitHub Actions just showing a red X with no detail, but you must explicitly fail the job afterward, or a broken plan will silently appear to pass.
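The first practice above implies two IAM roles, whereas the workflows in Steps 2 and 3 both assume `github-actions-terraform-deploy`. A sketch of how the split might look in the two workflow files follows. The role name `github-actions-terraform-plan` is hypothetical and would need to be created with read-only permissions and its own trust policy.

```yaml
# terraform-plan.yml (sketch): read-only role; the role name is hypothetical
- name: Configure AWS credentials via OIDC
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform-plan
    aws-region: ap-south-1

# terraform-apply.yml (sketch): keeps the write-capable role from Step 3
- name: Configure AWS credentials via OIDC
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform-deploy
    aws-region: ap-south-1
```

Note that `terraform plan` takes a state lock by default, so even a read-only plan role typically still needs whatever access your state backend requires for locking.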
Quick Reference and Troubleshooting Commands
| Workflow Trigger | Job | What Runs |
|---|---|---|
| `pull_request` | Plan | fmt check, validate, tflint, plan, post PR comment |
| `push` to `main` | Apply | init, apply (gated by GitHub Environment approval) |

| Error | Root Cause | Fix |
|---|---|---|
| `Not authorized to perform sts:AssumeRoleWithWebIdentity` | OIDC trust policy `sub` condition doesn't match this repo/branch | Check the exact `repo:org/name:ref:refs/heads/branch` string in the trust policy; jobs that set `environment:` present `repo:org/name:environment:name` instead |
| Plan comment never appears on PR | Missing `pull-requests: write` permission | Add it to the workflow's `permissions` block |
| Apply job stuck "Waiting for approval" | GitHub Environment has required reviewers configured | A designated reviewer must approve the pending deployment from the workflow run page |
| Provider download is slow on every run | Plugin cache not configured or cache key always misses | Verify the cache key includes a hash of `.terraform.lock.hcl` |
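The quick-reference table lists tflint among the plan job's checks, but none of the workflow listings above include a step for it. A minimal sketch of such steps, assuming the community `terraform-linters/setup-tflint` action and the same `environments/prod` working directory used elsewhere in this lab:

```yaml
# Sketch: add to the plan job before "Terraform Plan"
- name: Setup TFLint
  uses: terraform-linters/setup-tflint@v4

- name: Init TFLint
  run: tflint --init           # downloads any plugins declared in .tflint.hcl
  working-directory: environments/prod

- name: Run TFLint
  run: tflint -f compact       # non-zero exit on findings fails the job
  working-directory: environments/prod
```

Placing these before the plan step means lint failures stop the job before any AWS API calls are made.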