Overview and What You Will Learn
Atlantis is a bot that lives in your infrastructure repository and runs Terraform for you whenever someone opens a pull request. It posts the terraform plan output directly as a comment on the PR, and only runs terraform apply after someone types the words atlantis apply in a follow-up comment. The entire approval loop — see the plan, approve the code, trigger the apply — happens inside GitHub, with nobody ever running Terraform from a personal laptop.
This lab goes further than just knowing what Atlantis is — it walks through standing up a real Atlantis server, wiring it to GitHub, configuring per-project workflows, and understanding exactly where it fits next to (or instead of) GitHub Actions.
By the end of this lab you will be able to:
- Install and run an Atlantis server on Kubernetes using the official Helm chart
- Configure GitHub webhooks so Atlantis receives pull request events
- Write an `atlantis.yaml` that defines multiple projects and per-project workflows
- Use `atlantis plan` and `atlantis apply` as PR comments to drive real infrastructure changes
- Set up required approvals and custom workflow steps, including a policy-check stage
- Decide confidently between Atlantis and GitHub Actions for your own team
Why This Matters in Production
At Razorpay, a 12-person platform team adopted Atlantis after outgrowing their GitHub Actions-only setup. The team's complaint with Actions wasn't that it didn't work — it was that engineers had to leave the PR, go check the Actions tab in a separate browser tab, scroll through raw log output, and then come back to comment their approval. With Atlantis, the entire interaction — plan output, approval, and the apply trigger — lives inside the PR itself as comments, which meant engineers actually started reading every plan line by line instead of skimming a green checkmark and approving on faith.
Core Principles
Atlantis Server Architecture
```text
+------------------------------------------------+
|               GitHub Repository                |
|        (your Terraform infrastructure)         |
+------------------------------------------------+
                        |
         Webhook fires on PR open/comment
                        |
                        v
+------------------------------------------------+
|                ATLANTIS SERVER                 |
|     (runs continuously, e.g. on a K8s pod)     |
|                                                |
| - Receives webhook                             |
| - Clones the PR's branch                       |
| - Runs terraform plan/apply inside its own     |
|   working directory per project                |
| - Posts results back to GitHub as PR comments  |
| - Holds a lock per project directory while a   |
|   plan/apply is in progress                    |
+------------------------------------------------+
                        |
      Plan output, lock status, apply result
                        |
                        v
+------------------------------------------------+
|          Pull Request comment thread           |
|      (where engineers actually interact)       |
+------------------------------------------------+
```
Detailed Step-by-Step Practical Lab
Step 1 — Install Atlantis on Kubernetes
```shell
helm repo add runatlantis https://runatlantis.github.io/helm-charts
helm repo update

# Create a values file with your GitHub app or token configuration
cat > atlantis-values.yaml << 'EOF'
github:
  user: razorpay-infra-bot
  token: <stored-as-a-kubernetes-secret-not-here>
  secret: <webhook-secret-also-stored-as-a-secret>

orgAllowlist: "github.com/razorpay-platform/*"  # restrict which orgs/repos can use this server

resources:
  requests:
    cpu: 250m
    memory: 256Mi
EOF

helm install atlantis runatlantis/atlantis -f atlantis-values.yaml --namespace atlantis --create-namespace
```
Security note: `orgAllowlist` is not optional in practice. If it is left too broad (for example a bare `*`), your Atlantis server will process webhook requests for repositories in ANY GitHub org that can deliver a valid webhook to your URL. Always scope it to your exact organisation.
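To make the effect of the `orgAllowlist` pattern concrete, here is a minimal sketch of wildcard matching against repository names. It uses Python's `fnmatch` as a stand-in and is not Atlantis's own matcher; the function name `repo_allowed` and the repository name `infra` are hypothetical.

```python
from fnmatch import fnmatchcase


def repo_allowed(repo: str, allowlist: str) -> bool:
    """Return True if `repo` matches any comma-separated pattern in `allowlist`.

    Simplified model: `*` matches any run of characters. This only
    approximates how an allowlist behaves; it is not Atlantis's code.
    """
    patterns = [p.strip() for p in allowlist.split(",") if p.strip()]
    return any(fnmatchcase(repo, pattern) for pattern in patterns)


if __name__ == "__main__":
    scoped = "github.com/razorpay-platform/*"
    # "infra" is a hypothetical repository name used only for illustration.
    print(repo_allowed("github.com/razorpay-platform/infra", scoped))
    print(repo_allowed("github.com/someone-else/infra", scoped))
    # A bare "*" lets every repository through, which is the risk described above.
    print(repo_allowed("github.com/someone-else/infra", "*"))
```

The last call shows why a scoped pattern matters: with `*`, the check no longer distinguishes your organisation from anyone else's.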
Step 2 — Configure the GitHub Webhook
In your repository's Settings → Webhooks, point GitHub at the Atlantis server's public endpoint:
```text
Payload URL:  https://atlantis.razorpay-internal.com/events
Content type: application/json
Secret:       <the same webhook secret configured in atlantis-values.yaml>
Events:       Pull requests, Pull request reviews, Issue comments, Pushes
```
Step 3 — Write atlantis.yaml at the Repo Root
```yaml
# atlantis.yaml: defines every "project" Atlantis should manage in this repo
version: 3
projects:
  - name: prod-vpc
    dir: environments/prod/vpc
    workflow: default
    apply_requirements: [approved, mergeable]  # both must be true before apply runs
  - name: prod-ecs-orders
    dir: environments/prod/ecs-service-orders
    workflow: default
    apply_requirements: [approved, mergeable]
  - name: staging-vpc
    dir: environments/staging/vpc
    workflow: staging  # staging gets a lighter-weight workflow, defined below
```
Setting `apply_requirements` or `workflow` per project in a repo-level `atlantis.yaml` only takes effect if the server-side repo config lists those keys under `allowed_overrides`; otherwise Atlantis rejects the file.
Step 4 — Drive Plan and Apply Through PR Comments
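Atlantis decides which projects to plan by comparing the PR's changed files against each project's `dir`. A rough model of that mapping is sketched here, assuming a file belongs to a project whenever it sits anywhere under the project's directory. Real Atlantis refines this with `autoplan.when_modified` patterns, so treat `affected_projects` as a hypothetical helper, not Atlantis's algorithm.

```python
from pathlib import PurePosixPath

# Project name -> dir, copied from the atlantis.yaml in Step 3.
PROJECTS = {
    "prod-vpc": "environments/prod/vpc",
    "prod-ecs-orders": "environments/prod/ecs-service-orders",
    "staging-vpc": "environments/staging/vpc",
}


def affected_projects(changed_files, projects=PROJECTS):
    """Return the names of projects whose directory contains a changed file."""
    hit = []
    for name, directory in projects.items():
        root = PurePosixPath(directory)
        # A file is "inside" a project if the project dir is one of its parents.
        if any(root in PurePosixPath(path).parents for path in changed_files):
            hit.append(name)
    return hit


if __name__ == "__main__":
    changed = [
        "environments/prod/vpc/main.tf",
        "environments/staging/vpc/variables.tf",
        "README.md",
    ]
    print(affected_projects(changed))
```

Comparing path components rather than raw string prefixes avoids a subtle mistake: a file under `environments/prod/vpc-extra/` would not be attributed to `prod-vpc`.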
```text
# Engineer opens a PR. Atlantis automatically detects which "projects"
# the changed files belong to and runs plan on each one, posting:

Ran Plan for dir: environments/prod/vpc
Plan: 2 to add, 1 to change, 0 to destroy
[full plan output in a collapsible comment section]

# A reviewer can re-trigger plan for a SPECIFIC project after pushing a fix:
atlantis plan -p prod-vpc

# Once approved, a reviewer triggers the actual infrastructure change:
atlantis apply -p prod-vpc

# Or apply EVERY planned project in the PR at once:
atlantis apply
```
Step 5 — Locking Prevents Concurrent Applies
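As the architecture diagram notes, Atlantis holds a lock per project directory. The behaviour can be modelled with a small in-memory table, sketched here under the assumption that a lock is keyed only by directory and held by one PR number at a time. The class `ProjectLocks` and its return strings are hypothetical; Atlantis persists its locks and words its messages differently.

```python
class ProjectLocks:
    """Toy model of per-directory plan locks, keyed by project dir."""

    def __init__(self):
        self._holder = {}  # project dir -> PR number holding the lock

    def try_plan(self, directory, pr):
        """Take the lock for `pr` if it is free or already held by `pr`."""
        holder = self._holder.setdefault(directory, pr)
        if holder != pr:
            return f"locked by pull request #{holder}"
        return "plan ran"

    def release(self, directory, pr):
        """Release the lock once the PR's plan is applied or discarded."""
        if self._holder.get(directory) == pr:
            del self._holder[directory]


if __name__ == "__main__":
    locks = ProjectLocks()
    print(locks.try_plan("environments/prod/vpc", 142))
    print(locks.try_plan("environments/prod/vpc", 145))
    locks.release("environments/prod/vpc", 142)
    print(locks.try_plan("environments/prod/vpc", 145))
```

The second PR is refused until the first one's plan is applied or discarded, which is the coordination engineers would otherwise have to do by hand.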
```text
# If PR #142 has already run `plan` on environments/prod/vpc, and
# someone opens PR #145 also touching environments/prod/vpc, Atlantis
# will respond on PR #145 with:

This project is currently locked by an unapplied plan from pull request #142.
The plan must be applied or discarded before future plans can execute.

# This is automatic: no engineer has to remember to coordinate manually.
```
Step 6 — Custom Workflows with a Policy Check Stage
```yaml
workflows:
  default:
    plan:
      steps:
        - run: tflint --recursive  # lint BEFORE plan: fail fast on bad syntax
        - init
        - plan
        - run: terraform show -json $PLANFILE > plan.json  # convert the binary plan to JSON
        - run: conftest test plan.json --policy /policies  # OPA policy check on the plan JSON
    apply:
      steps:
        - apply
  staging:
    plan:
      steps:
        - init
        - plan
        # staging skips the policy-check step: lower stakes, faster iteration
    apply:
      steps:
        - apply
```
`$PLANFILE` points at Terraform's binary plan file, which conftest cannot parse, so the workflow converts it to JSON with `terraform show -json` before running the policy check.
Tip from a senior engineer: putting `conftest` (an OPA policy tool) directly in the plan workflow means a policy violation, like "no security group may allow ingress from 0.0.0.0/0", gets caught and posted as a PR comment automatically, before any human even reviews the code. The reviewer's job becomes reviewing intent, not manually re-checking rules a machine already verified.
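To show what a rule like "no security group may allow ingress from 0.0.0.0/0" actually inspects, here is a sketch of the same check in plain Python rather than Rego. It assumes the plan has been exported with `terraform show -json` and only looks at inline `ingress` blocks on `aws_security_group` resources; the function name and the sample plan are hypothetical, and a real policy would also cover standalone rule resources.

```python
def open_ingress_violations(plan: dict) -> list:
    """Return addresses of security groups whose planned state allows 0.0.0.0/0 ingress."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # "after" is null in the plan JSON when the resource is being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                violations.append(rc["address"])
                break
    return violations


# Hypothetical, heavily trimmed plan JSON for illustration only.
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "aws_security_group.web",
            "type": "aws_security_group",
            "change": {"after": {"ingress": [{"cidr_blocks": ["0.0.0.0/0"]}]}},
        },
        {
            "address": "aws_security_group.internal",
            "type": "aws_security_group",
            "change": {"after": {"ingress": [{"cidr_blocks": ["10.0.0.0/8"]}]}},
        },
    ]
}

if __name__ == "__main__":
    print(open_ingress_violations(SAMPLE_PLAN))
```

The value of running this kind of check in the plan stage is that the violating resource address appears in the PR comment, so the author can fix it before asking for review.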
Step 7 — Atlantis vs GitHub Actions for Terraform
| | Atlantis | GitHub Actions |
|---|---|---|
| Where the interaction happens | Entirely inside PR comments | PR comment for plan, separate Actions tab for full logs |
| Hosting | You run and maintain the server | Fully managed by GitHub |
| Locking | Built-in, automatic, per project directory | You must build this yourself |
| Setup effort | Higher — server, webhook, Helm chart | Lower — just YAML workflow files |
| Best fit | Teams wanting a tight, comment-driven approval loop | Teams already standardised on Actions for everything else |
Step 8 — Atlantis vs Terraform Cloud
| | Atlantis | Terraform Cloud |
|---|---|---|
| Hosting | Self-hosted, open-source, free | Managed SaaS (or self-hosted Enterprise) |
| State storage | You still configure your own backend (e.g., S3) | Built-in, managed remote state |
| Cost | Free, but you pay in operational overhead | Free tier limited; paid tiers scale with usage |
| Policy as code | Via custom workflow steps (OPA/conftest) | Built-in Sentinel policy engine |
Production Best Practices and Common Pitfalls
Always set `apply_requirements: [approved, mergeable]`. Without it, anyone who can comment on the PR, not necessarily an approved reviewer, can trigger `atlantis apply`. This single line is the difference between "reviewed changes only" and "anyone with comment access can change production."
Run Atlantis itself behind your own IAM role boundaries. The Atlantis server's own AWS credentials determine what it can actually apply. If the server's role has admin access to your entire AWS account, every project in every repo it watches inherits that blast radius.
Don't skip or loosen the `orgAllowlist` setting. An Atlantis server with an open webhook and an over-broad allowlist (such as `*`) will process and execute Terraform from any repository that can deliver a valid webhook to it.
Keep staging and production on different workflows. Production projects should require policy checks and explicit approval; staging projects can move faster with a lighter workflow. Don't force the same friction on both.
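The first practice above can be pictured as a simple gate in front of `apply`. This is a minimal sketch, assuming the only two requirements are `approved` and `mergeable`; the names `PullRequestState`, `unmet_requirements`, and `can_apply` are hypothetical and do not mirror Atlantis's internals.

```python
from dataclasses import dataclass


@dataclass
class PullRequestState:
    approved: bool   # at least one approving review
    mergeable: bool  # no conflicts, required checks passing


def unmet_requirements(pr, requirements):
    """Return the configured requirements the PR does not yet satisfy."""
    status = {"approved": pr.approved, "mergeable": pr.mergeable}
    return [name for name in requirements if not status[name]]


def can_apply(pr, requirements):
    """Apply is allowed only when every configured requirement is met."""
    return not unmet_requirements(pr, requirements)


if __name__ == "__main__":
    unreviewed = PullRequestState(approved=False, mergeable=True)
    # With no requirements configured, the gate is always open.
    print(can_apply(unreviewed, []))
    # With both requirements, the missing approval blocks the apply.
    print(can_apply(unreviewed, ["approved", "mergeable"]))
    print(unmet_requirements(unreviewed, ["approved", "mergeable"]))
```

The empty-list case is the one to worry about: nothing fails loudly, the apply simply goes through for whoever commented.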
Quick Reference and Troubleshooting Commands
| Command (as PR comment) | What It Does |
|---|---|
| `atlantis plan` | Plans every changed project detected in the PR |
| `atlantis plan -p <project>` | Plans one specific named project |
| `atlantis apply` | Applies every planned, approved project |
| `atlantis apply -p <project>` | Applies one specific project |
| `atlantis unlock` | Discards the PR's plans and releases the project locks it holds |
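For readers who want to see how such comments break down into a command and a target, here is a small parsing sketch. It covers only the forms in the table above (a command plus an optional `-p <project>`); `parse_comment` is a hypothetical helper, and Atlantis itself accepts more flags than this handles.

```python
import shlex

COMMANDS = {"plan", "apply", "unlock"}


def parse_comment(comment):
    """Return (command, project) for an Atlantis comment, or None otherwise.

    `project` is None when no `-p <project>` flag is given.
    """
    try:
        tokens = shlex.split(comment.strip())
    except ValueError:  # e.g. unbalanced quotes in an ordinary review comment
        return None
    if len(tokens) < 2 or tokens[0] != "atlantis" or tokens[1] not in COMMANDS:
        return None
    rest = tokens[2:]
    project = None
    if "-p" in rest:
        index = rest.index("-p")
        if index + 1 >= len(rest):
            return None  # "-p" given without a project name
        project = rest[index + 1]
    return tokens[1], project


if __name__ == "__main__":
    print(parse_comment("atlantis plan -p prod-vpc"))
    print(parse_comment("atlantis apply"))
    print(parse_comment("looks good to me"))
```
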
| Error | Root Cause | Fix |
|---|---|---|
| Atlantis never responds to a new PR | Webhook not delivered, or signature mismatch | Check GitHub's webhook delivery log for the exact error response |
| `This project is currently locked` | Another open PR holds the plan lock | Wait for that PR to merge or close, or comment `atlantis unlock` on that PR if it is abandoned |
| `apply_requirements` not met | Missing approval or PR has merge conflicts | Get an approval and resolve conflicts before commenting `atlantis apply` |
| Policy check step fails unexpectedly | OPA policy rule too strict, or plan JSON format changed | Review the conftest output line by line — it names the exact failed rule |
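
When chasing the "signature mismatch" cause from the first row, it can help to recompute the signature yourself. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below reproduces that calculation; the secret and body are placeholder values, and the function names are hypothetical.

```python
import hashlib
import hmac


def signature_for(secret: bytes, body: bytes) -> str:
    """Compute the value GitHub would send in X-Hub-Signature-256."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def signature_matches(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(signature_for(secret, body), header_value)


if __name__ == "__main__":
    # Placeholder values; use the raw body and header from GitHub's delivery log.
    secret = b"example-webhook-secret"
    body = b'{"action": "opened"}'
    header = signature_for(secret, body)
    print(signature_matches(secret, body, header))
    print(signature_matches(b"a-different-secret", body, header))
```

If the recomputed value differs from the header GitHub recorded, the secret configured in the webhook and the one in `atlantis-values.yaml` are not the same string. Whitespace or a trailing newline in the stored secret is a frequent culprit.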