Overview and What You Will Learn
This guide covers GitOps as a deployment model and ArgoCD as the tool that
implements it for Kubernetes. You will learn the four GitOps principles,
ArgoCD's internal architecture, how to structure a Git repository so that
ArgoCD can manage many applications cleanly, and how to configure the
Application custom resource with automated sync and self-healing so that
the cluster state converges on what is committed to Git, and no human
ever needs to run kubectl apply against a production cluster again.
Why This Matters in Production
Push-based deployment -- a CI job running kubectl apply directly against
a production cluster -- means the CI system needs a network path and
credentials into every cluster it deploys to, and the only record of what
is actually running is whatever the last pipeline run happened to apply.
GitOps inverts this: an agent running inside the cluster (ArgoCD) pulls the
desired state from Git and reconciles toward it continuously. CRED- and
Meesho-style platform teams adopt this for two practical reasons --
auditability (every change to production is a Git commit with an author and
a PR review) and self-healing (if someone runs a manual kubectl edit in a
moment of panic, ArgoCD with selfHeal enabled reverts it to what Git says
within minutes).
**Common Mistake:** Treating ArgoCD as "Jenkins but for Kubernetes" and having CI push directly to the cluster anyway, with ArgoCD just watching passively. The entire value of GitOps depends on Git being the *only* path to changing cluster state -- if there is still a side door, you have not actually adopted GitOps, you have just added a dashboard.
Core Principles
The four GitOps principles
- Declarative -- the entire system state is described declaratively (Kubernetes YAML, Helm values, Kustomize overlays), never as a sequence of imperative commands.
- Versioned and immutable -- the desired state lives in Git, which gives you history, diffs, and the ability to revert with `git revert`.
- Pulled automatically -- software agents inside the cluster pull the desired state, rather than external systems pushing into the cluster.
- Continuously reconciled -- the agent constantly compares live state to desired state and corrects drift, not just at deploy time.

The flow from commit to healthy cluster:

    +------------------------------------------+
    | Engineer merges YAML change to repo      |  <- 1. commit
    +------------------------------------------+
                          |
                          v
    +------------------------------------------+
    | ArgoCD repo server detects new commit    |  <- 2. detect
    +------------------------------------------+
                          |
                          v
    +------------------------------------------+
    | App controller diffs desired vs live     |  <- 3. diff
    +------------------------------------------+
                          |
                          v
    +------------------------------------------+
    | Controller applies changes to cluster    |  <- 4. sync
    +------------------------------------------+
                          |
                          v
    +------------------------------------------+
    | Cluster state now matches Git exactly    |  <- 5. healthy
    +------------------------------------------+

ArgoCD architecture

    +-------------------------+   +-------------------------+
    | ARGOCD REPO SERVER      |   | APP CONTROLLER          |
    |                         |   |                         |
    | Repo server: clones     |   | App controller: diffs   |
    | and renders manifests   |   | live state vs desired   |
    +-------------------------+   +-------------------------+

- API server -- the gRPC/REST entrypoint used by the CLI, UI, and webhook callbacks.
- Repo server -- clones the Git config repo and renders the final Kubernetes manifests (running Helm template or Kustomize build).
- Application controller -- the core reconciliation loop, continuously diffing live cluster state against the rendered desired state.
- Redis -- caches rendered manifests and live state to keep the reconciliation loop fast across hundreds of Applications.
The Application CRD

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: payments-api-production
      namespace: argocd
    spec:
      project: payments
      source:
        repoURL: https://github.com/razorpay/payments-api-config.git
        targetRevision: main
        path: overlays/production
      destination:
        server: https://kubernetes.default.svc
        namespace: payments-prod
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

`prune: true` removes resources from the cluster that were deleted from
Git. `selfHeal: true` is what makes ArgoCD revert manual kubectl changes
back to the Git-declared state automatically.
**Warning:** Turning on `selfHeal` means a manual emergency `kubectl scale` during an incident will be reverted by ArgoCD within minutes unless you also pause the Application or scale via a Git commit. Know this before your first incident, not during it.
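One way to picture "pausing the Application" is the same manifest with the `automated` block taken out, which leaves syncs manual until it is restored. This is a minimal sketch reusing the payments-api example above, not a prescribed incident procedure.

```yaml
# Sketch: payments-api-production with automated sync switched off.
# Without spec.syncPolicy.automated, ArgoCD still reports drift as
# OutOfSync but does not act on it until someone triggers a sync.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-production
  namespace: argocd
spec:
  project: payments
  source:
    repoURL: https://github.com/razorpay/payments-api-config.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod
  syncPolicy:
    # automated:            <- removed for the duration of the incident
    #   prune: true
    #   selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

If this Application manifest is itself synced from Git by a root Application that has `selfHeal` on, a change made only on the cluster side is likely to be reverted as well, so in that setup the pause would probably need to go through a commit.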
App of Apps pattern
A single root Application points at a directory containing many child Application manifests, so one Git commit can add or remove entire services from the platform.
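A root Application of this kind might look like the sketch below. The repository URL is a hypothetical placeholder, and the name `root-app` and the `apps` path are assumptions chosen to match the layout used in this guide.

```yaml
# Sketch of a root Application: it deploys nothing but other
# Application manifests found under apps/.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/config-repo.git  # hypothetical
    targetRevision: main
    path: apps              # directory holding the child Application YAML
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd       # child Application objects are created here
  syncPolicy:
    automated:
      prune: true           # deleting a child manifest removes the child Application
      selfHeal: true
```

Whether pruning a child Application also deletes the workloads it deployed depends on the child carrying the `resources-finalizer.argocd.argoproj.io` finalizer, so it is worth deciding that deliberately before enabling `prune` on the root.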

    config-repo/
      apps/
        payments-api.yaml       <- child Application
        notification-svc.yaml   <- child Application
      root-app.yaml             <- points ArgoCD at apps/

Repository structure: app repo vs config repo
Keep application source code (the app repo) separate from the rendered or templated Kubernetes manifests (the config repo). CI builds and pushes an image from the app repo, then opens a PR against the config repo bumping the image tag -- ArgoCD only ever watches the config repo.
**Tip:** Have CI open the image-bump PR against the config repo automatically, but require a human review before merge for production overlays. This keeps the audit trail (who approved this exact deploy) while still removing the manual `kubectl apply` step entirely.
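With Kustomize, the image-bump PR can be a one-line change. The sketch below assumes the production overlay pins the image through the `images` field; the image name, registry, and tag are hypothetical placeholders.

```yaml
# overlays/production/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: payments-api                           # name used in the base manifests (assumed)
    newName: registry.example.com/payments-api   # hypothetical registry
    newTag: "3f9c2ab"                            # the only line the CI bump PR changes
```

Keeping the tag in one small file makes the PR diff trivial to review, which is what makes the human approval step cheap.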
Helm and Kustomize via ArgoCD
ArgoCD's repo server can render either format natively -- point source
at a Helm chart with a values.yaml per environment, or at a Kustomize
base with environment-specific overlays, and ArgoCD handles the rendering
before diffing against the live cluster.
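For the Helm case, the Application's `source` names the chart directory and the values file for that environment. The sketch below assumes a chart stored under a hypothetical `chart/` directory in the config repo with a hypothetical `values-staging.yaml` beside it; the Application name and namespace are illustrative too.

```yaml
# Sketch: an Application rendering a Helm chart with per-environment values.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-staging        # hypothetical
  namespace: argocd
spec:
  project: payments
  source:
    repoURL: https://github.com/razorpay/payments-api-config.git
    targetRevision: main
    path: chart                     # hypothetical directory containing Chart.yaml
    helm:
      valueFiles:
        - values-staging.yaml       # hypothetical, resolved relative to path
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-staging     # hypothetical
```

The Kustomize case needs no extra block: pointing `path` at a directory that contains a `kustomization.yaml`, as the earlier Application does with `overlays/production`, is enough for the repo server to pick Kustomize.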
Detailed Step-by-Step Practical Lab
This lab deploys Zerodha's order-gateway service to mumbai-prod-cluster
using an app-of-apps ArgoCD setup with self-healing enabled.
Milestone 1 — Install ArgoCD into the cluster

    kubectl create namespace argocd
    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
    kubectl -n argocd rollout status deploy/argocd-server

At this point ArgoCD's components are running but no Applications have been registered yet.
Milestone 2 — Structure the config repo

    order-gateway-config/
      base/
        deployment.yaml
        service.yaml
      overlays/
        staging/kustomization.yaml
        production/kustomization.yaml

At this point the repo holds declarative manifests but ArgoCD has not seen it yet -- this is the "versioned, declarative" half of GitOps.
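The contents of the kustomization files are not shown in this lab, so the following is only a plausible sketch. It assumes a `base/kustomization.yaml` exists (needed for the overlay to reference the base as a directory, although the tree above does not list it), a Deployment named `order-gateway`, and an illustrative replica count.

```yaml
# base/kustomization.yaml (assumed file, not listed in the tree above)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
---
# overlays/production/kustomization.yaml (separate file, shown here for brevity)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: order-gateway-prod
resources:
  - ../../base
replicas:
  - name: order-gateway   # Deployment name, assumed
    count: 3              # illustrative production replica count
```

The staging overlay would differ only in the small per-environment fields, which is the "one base, small diffs" structure the repo layout is meant to support.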
Milestone 3 — Create the Application CRD

    cat <<EOF | kubectl apply -f -
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: order-gateway-production
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/zerodha/order-gateway-config.git
        targetRevision: main
        path: overlays/production
      destination:
        server: https://kubernetes.default.svc
        namespace: order-gateway-prod
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true   # the lab never creates order-gateway-prod by hand
    EOF

At this point `argocd app get order-gateway-production` should show
Synced and Healthy -- the cluster now matches Git.
Milestone 4 — Confirm automated sync and self-heal are active

    argocd app get order-gateway-production -o json | jq '.spec.syncPolicy'

At this point you should see `"automated": {"prune": true, "selfHeal": true}` confirmed in the live Application object, not just the YAML you
applied.
Milestone 5 — Test self-healing

    kubectl -n order-gateway-prod scale deploy/order-gateway --replicas=10
    sleep 60
    kubectl -n order-gateway-prod get deploy/order-gateway

At this point the replica count should have reverted to whatever Git declares -- proof that manual drift gets corrected automatically rather than silently persisting.
Milestone 6 — Add a second service via App of Apps

    # Use > rather than >> so re-running the step does not append a duplicate document
    cat <<EOF > apps/notification-svc.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: notification-svc-production
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/zerodha/notification-svc-config.git
        targetRevision: main
        path: overlays/production
      destination:
        server: https://kubernetes.default.svc
        namespace: notification-svc-prod
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
    EOF
    git add apps/notification-svc.yaml && git commit -m "Add notification-svc to platform" && git push

At this point the root Application (assuming one is already registered and
pointed at the apps/ directory) picks up the new child Application on its
next sync, and notification-svc deploys without anyone running kubectl
apply by hand.
**Remember:** `targetRevision` can point at a branch, a tag, or a specific commit SHA. Pin production overlays to a tag or SHA in higher-compliance environments so a force-push to `main` cannot silently change what is deployed.
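As a sketch, pinning changes only one field of the Application from Milestone 3. The tag name below is a hypothetical placeholder.

```yaml
# Fragment of order-gateway-production: only targetRevision changes.
spec:
  source:
    repoURL: https://github.com/zerodha/order-gateway-config.git
    path: overlays/production
    targetRevision: v1.4.0   # hypothetical tag; a full commit SHA also works here
```

A tag can still be moved by someone with push rights, so a commit SHA is the stricter choice where immutability matters. The cost either way is that each deploy now also needs a commit that updates `targetRevision`.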
Production Best Practices & Common Pitfalls
- Never grant the ArgoCD service account broader cluster permissions than the namespaces it actually manages -- use ArgoCD Projects to scope this.
- Separate app repo from config repo so CI's image-build pipeline never has direct write access to manifests ArgoCD watches.
- Use Kustomize overlays or Helm values per environment instead of duplicating full manifest files -- one base, small per-environment diffs.
- Pause `selfHeal` (or use `argocd app set <name> --sync-policy none` temporarily) during a declared incident if you need to make a manual emergency change, then bring Git and the cluster back in line (commit the change or undo it) and re-enable automated sync once the incident is resolved.
- Watch for "OutOfSync" Applications lingering for long periods -- this usually means a sync window is blocking the sync or a sync hook is silently failing.
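The project-scoping advice in the first bullet can be sketched as an AppProject for the `payments` project referenced earlier. The description and the exact allow-lists are assumptions for illustration.

```yaml
# Sketch: an AppProject limiting what Applications in it may deploy, and where.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments
  namespace: argocd
spec:
  description: Payments team applications     # illustrative
  sourceRepos:
    - https://github.com/razorpay/payments-api-config.git
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments-prod
  clusterResourceWhitelist:
    # Cluster-scoped kinds are denied unless listed; Namespace is allowed here.
    - group: ""
      kind: Namespace
```

A project restricts what an Application is allowed to sync. The Kubernetes RBAC bound to ArgoCD's own service account is configured separately, so both layers may need tightening.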
Quick Reference & Troubleshooting Commands
| Symptom | Command | What to Look For |
|---|---|---|
| Application stuck OutOfSync | `argocd app diff order-gateway-production` | A field being changed outside Git by another controller (HPA, admission webhook) |
| Manual fix keeps getting reverted | `argocd app get <name> -o json \| jq .spec.syncPolicy` | `selfHeal: true` is active and working as designed |
| New child app in app-of-apps not appearing | `argocd app get root-app` | Root Application has not synced since the new file was pushed |
| Sync fails with rendering error | `argocd app manifests <name>` | Invalid Kustomize overlay or missing Helm values file in the target path |
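For the first row, where another controller such as an HPA owns a field, one common remedy is to tell ArgoCD to leave that field out of the diff. The fragment below is a sketch for the replica count of Deployments; whether it is the right fix depends on which field `argocd app diff` actually reports.

```yaml
# Fragment of an Application spec: ignore replica-count drift on Deployments.
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
  syncPolicy:
    syncOptions:
      # Assumed to be needed so a sync does not reset the ignored field;
      # check that your ArgoCD version supports this option.
      - RespectIgnoreDifferences=true
```

Another option is to omit `replicas` from the Git manifest altogether, so the HPA is the only writer of that field.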