Overview and What You Will Learn
In production, the tag on a Docker image is a promise. When you deploy `payment-api:v3.1.0` to staging and then deploy `payment-api:v3.1.0` to production, you are trusting that both environments are running identical code. If someone pushed a new build with the same tag in between (a mistake that happens constantly with `latest`), your staging and production environments silently diverge.
In this guide you will learn how to design an immutable image tagging strategy, how to authenticate with Docker Hub, AWS ECR, and private registries, and how to manage image lifecycle policies so storage costs do not spiral out of control.
Why This Matters in Production
At Zerodha, a trading system deployed with the wrong image tag caused a 23-minute outage during market hours. The engineer had tagged a test build as trading-engine:latest and the deployment pipeline pulled it to production. Their tagging policy now requires git SHA tags and prohibits latest in production deployments. One policy change, enforced in the pipeline, prevents an entire category of deployment errors.
Core Principles
Tag types, from most to least stable:

```text
+------------------------------------------+
| Digest (immutable, forever)              |
| nginx@sha256:a84f9c2b1d3e...             |
| Same bytes every time, guaranteed        |
+------------------------------------------+
| Semantic Version (stable until re-tagged)|
| payment-api:v3.1.0                       |
| Pinned by convention, should never change|
+------------------------------------------+
| Git SHA (immutable by practice)          |
| payment-api:a3f9c2d                      |
| Traceable to exact commit, no ambiguity  |
+------------------------------------------+
| Branch + SHA (CI builds)                 |
| payment-api:main-a3f9c2d                 |
| Useful for tracking which branch         |
+------------------------------------------+
| latest (NEVER use in production)         |
| payment-api:latest                       |
| Meaning changes with every push          |
| Untraceable, non-reproducible            |
+------------------------------------------+
```

Detailed Step-by-Step Practical Lab
Milestone 1: Why latest Is Dangerous
```shell
# Scenario: two engineers, same tag, different code

# Engineer A pushes a tested build at 9am
docker build -t registry.razorpay.in/payment-api:latest .
docker push registry.razorpay.in/payment-api:latest

# Deployment pipeline pulls latest and deploys to staging
# Staging tests pass

# Engineer B pushes a broken build at 10am (same tag)
docker build -t registry.razorpay.in/payment-api:latest .
docker push registry.razorpay.in/payment-api:latest
# This moves the tag: Engineer A's tested image is no longer reachable as :latest

# Deployment pipeline promotes staging tag to production
# But production now gets Engineer B's broken build
# Both environments used :latest but got DIFFERENT images

# This is why latest is banned in production at serious engineering teams

# Check when the image was built (the tag itself gives you no information)
docker inspect registry.razorpay.in/payment-api:latest \
  --format '{{.Created}}'
# 2024-01-15T10:00:00Z: who pushed this? what commit? no way to know
```

Milestone 2: The Right Tagging Strategy
```shell
# Strategy: tag with git SHA (always) + semantic version (on releases)

# In CI pipeline, get the current git SHA
GIT_SHA=$(git rev-parse --short HEAD)
# a3f9c2d

# Build with multiple tags simultaneously
docker build \
  -t registry.razorpay.in/payment-api:${GIT_SHA} \
  -t registry.razorpay.in/payment-api:v3.1.0 \
  -t registry.razorpay.in/payment-api:v3.1 \
  -t registry.razorpay.in/payment-api:v3 \
  .

docker push registry.razorpay.in/payment-api:${GIT_SHA}
docker push registry.razorpay.in/payment-api:v3.1.0
docker push registry.razorpay.in/payment-api:v3.1
docker push registry.razorpay.in/payment-api:v3

# Result:
# :a3f9c2d -> exact commit, traceable to GitHub
# :v3.1.0  -> exact release, pinned
# :v3.1    -> latest patch for v3.1 (useful for minor updates)
# :v3      -> latest minor for v3 (useful for major version tracking)

# Deployment always uses the most specific tag:
# Dev:     payment-api:main-a3f9c2d  (latest build on main)
# Staging: payment-api:v3.1.0-rc1    (release candidate)
# Prod:    payment-api:v3.1.0        (pinned release)

# In your deployment YAML (Kubernetes or Compose):
# image: registry.razorpay.in/payment-api:v3.1.0
# NOT: image: registry.razorpay.in/payment-api:latest
```

Milestone 3: Docker Hub Authentication and Limits
```shell
# Log in to Docker Hub
docker login
# Username: your-dockerhub-username
# Password: (use a personal access token, not your password)
# Token created at: hub.docker.com -> Account Settings -> Security -> Access Tokens

# Docker Hub pull rate limits (Docker revises these from time to time,
# so confirm the current figures in the Docker Hub documentation):
# Anonymous: 100 pulls per 6 hours per IP
# Free account: 200 pulls per 6 hours
# Pro/Team: unlimited

# On CI servers shared by many engineers, anonymous pulls get rate limited quickly
# Always authenticate in CI:
echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
# Use secrets for the token, never hardcode

# Check your remaining rate limit: request a pull token, then read the
# ratelimit-* response headers (HEAD requests do not count as pulls; needs curl and jq)
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
curl -s --head -H "Authorization: Bearer $TOKEN" \
  https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | grep -i ratelimit

# Use your own registry mirror for Docker Hub to avoid rate limits
# In /etc/docker/daemon.json:
```

```json
{
  "registry-mirrors": ["https://your-mirror.example.com"]
}
```

Milestone 4: AWS ECR, the Production Registry
AWS Elastic Container Registry is what most teams running on AWS use in production. It integrates natively with IAM and EKS.
```shell
# Step 1: Authenticate to ECR
# ECR tokens expire every 12 hours, so re-authenticate before pushing
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=ap-south-1

aws ecr get-login-password --region ${AWS_REGION} | \
  docker login --username AWS --password-stdin \
  ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

# Step 2: Create a repository (once)
aws ecr create-repository \
  --repository-name payment-api \
  --region ${AWS_REGION} \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

# Step 3: Tag and push (GIT_SHA as set in Milestone 2)
ECR_URI=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

docker tag payment-api:v3.1.0 ${ECR_URI}/payment-api:v3.1.0
docker tag payment-api:v3.1.0 ${ECR_URI}/payment-api:${GIT_SHA}

docker push ${ECR_URI}/payment-api:v3.1.0
docker push ${ECR_URI}/payment-api:${GIT_SHA}

# Step 4: Pull from ECR on your server
docker pull ${ECR_URI}/payment-api:v3.1.0
```

ECR lifecycle policies are critical for cost control:
```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep last 10 tagged releases",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Delete untagged images older than 1 day",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 1
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 3,
      "description": "Keep last 30 git SHA builds",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
```

```shell
# Apply lifecycle policy (the JSON above saved as lifecycle-policy.json)
aws ecr put-lifecycle-policy \
  --repository-name payment-api \
  --lifecycle-policy-text file://lifecycle-policy.json \
  --region ${AWS_REGION}
```

Milestone 5: Private Registry with Harbor
Harbor is a self-hosted registry with built-in vulnerability scanning, RBAC, and image replication. Teams that cannot use cloud registries (compliance reasons, air-gapped environments) use Harbor.
```shell
# Pull from a Harbor registry
docker login harbor.zerodha.in
# Username: your-harbor-username
# Password: your-harbor-password

docker pull harbor.zerodha.in/trading/trading-engine:v4.1.0

# Push to Harbor
docker tag trading-engine:v4.1.0 harbor.zerodha.in/trading/trading-engine:v4.1.0
docker push harbor.zerodha.in/trading/trading-engine:v4.1.0

# Harbor project/image structure:
# harbor.zerodha.in / [project] / [repository] : [tag]
# harbor.zerodha.in/trading/engine:v4.1.0
# harbor.zerodha.in/payments/api:v3.1.0
# harbor.zerodha.in/infra/nginx:1.25-hardened
```

Milestone 6: Image Retention and Cleanup
Without a cleanup policy, registries fill up with thousands of images. At ECR pricing (~$0.10/GB/month), 500GB of old images costs about $50 per month, or $600 per year, for every repository left unmanaged.
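The storage arithmetic can be checked with a small helper. This is a minimal sketch: `estimate_storage_cost` is a hypothetical function name, and the default price is the approximate $0.10 per GB-month figure quoted above, not a value read from AWS.

```shell
# Estimate registry storage cost from a size in GB and a price per GB-month.
# The default price is the approximate figure quoted in the text, not a live AWS price.
estimate_storage_cost() {
  local size_gb="$1"
  local price_per_gb="${2:-0.10}"
  awk -v gb="$size_gb" -v price="$price_per_gb" \
    'BEGIN { m = gb * price; printf "monthly=%.2f yearly=%.2f\n", m, m * 12 }'
}

estimate_storage_cost 500
# prints: monthly=50.00 yearly=600.00
```

Feeding in the real repository size and your region's current price gives a quick sense of whether a lifecycle policy is overdue.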
```shell
# List all images in an ECR repository
aws ecr list-images \
  --repository-name payment-api \
  --region ap-south-1 \
  --query 'imageIds[*]'

# Find untagged images (left behind when a tag moves to a newer image)
aws ecr list-images \
  --repository-name payment-api \
  --region ap-south-1 \
  --filter tagStatus=UNTAGGED \
  --query 'imageIds[*].imageDigest' \
  --output text

# Delete specific images
aws ecr batch-delete-image \
  --repository-name payment-api \
  --image-ids imageTag=v2.0.0 imageTag=v2.0.1 imageTag=v2.1.0 \
  --region ap-south-1

# Local cleanup: remove images not used by any container
docker image prune -a
# WARNING: this removes ALL images not referenced by any container
# Safe on CI agents; on a server it also deletes cached images you may want for a quick rollback

# Remove unused images created more than 24 hours ago (safe for build machines)
docker image prune -a --filter "until=24h" -f
```

Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Using `:latest` in deployment configs | Silent environment divergence | Always pin to a specific tag: `image: payment-api:v3.1.0` |
| Storing ECR password in plaintext | Credential leak | Use aws ecr get-login-password with IAM roles — no long-lived passwords |
| No lifecycle policy on ECR | Registry storage bill grows to hundreds of dollars | Add lifecycle policy on every repository at creation time |
| Tagging with only semantic version, no SHA | Cannot trace image back to exact commit | Always add git SHA tag alongside semantic version |
| Sharing a single Docker Hub account across all CI agents | Rate limited in minutes | Create a dedicated CI service account with a paid plan or use ECR |
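The first fix in the table, never deploying `:latest`, can be enforced in the pipeline instead of relying on review. The following is a minimal sketch under stated assumptions: `is_pinned_image` and `check_manifest` are hypothetical helper names, and the manifest scan is a simple line match on `image:` keys, not a YAML parser.

```shell
# Return 0 if the image reference is pinned (digest or explicit non-latest tag), 1 otherwise.
is_pinned_image() {
  local ref="$1"
  # A digest reference (name@sha256:...) is always pinned.
  case "$ref" in
    *@sha256:*) return 0 ;;
  esac
  # Inspect only the last path component so a registry port
  # (host:5000/app) is not mistaken for a tag.
  local last="${ref##*/}"
  case "$last" in
    *:latest) return 1 ;;  # explicit latest
    *:*)      return 0 ;;  # some other explicit tag
    *)        return 1 ;;  # no tag: Docker defaults to latest
  esac
}

# Scan a deployment file for "image:" lines and fail if any reference is unpinned.
check_manifest() {
  local file="$1" status=0 refs ref
  refs=$(sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$file" | tr -d "\"'")
  while read -r ref _; do
    [ -n "$ref" ] || continue
    if ! is_pinned_image "$ref"; then
      echo "unpinned image: $ref" >&2
      status=1
    fi
  done <<EOF
$refs
EOF
  return "$status"
}
```

Running `check_manifest deployment.yaml` as an early pipeline step (the file name is only an example) makes the job fail before anything reaches a cluster.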
Troubleshooting Reference
| Problem | Cause | Fix |
|---|---|---|
| `no basic auth credentials` | Not logged in to registry | `docker login registry-url` |
| `toomanyrequests: too many requests` | Docker Hub rate limit hit | Authenticate or switch to ECR |
| `denied: requested access to the resource is denied` | Wrong IAM permissions for ECR | Check the IAM policy includes `ecr:GetAuthorizationToken`, `ecr:BatchCheckLayerAvailability`, `ecr:GetDownloadUrlForLayer`, and `ecr:BatchGetImage` |
| ECR token expired | Token valid only 12 hours | Re-run `aws ecr get-login-password` and `docker login` |
| Push rejected: tag invalid | Tag contains invalid characters | Tags may contain only letters, digits, `-`, `_`, and `.`, must not start with `.` or `-`, and are limited to 128 characters. No slashes or colons. |
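The last row is easy to hit when tags are built from branch names such as `feature/login`. Below is a minimal sketch of a validity check and a sanitizer; `is_valid_tag` and `sanitize_tag` are hypothetical helper names, and the pattern follows the Docker tag grammar (a letter, digit, or underscore first, then up to 127 more letters, digits, `_`, `.`, or `-`).

```shell
# Return 0 if the string is a syntactically valid Docker tag.
is_valid_tag() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
}

# Turn an arbitrary string (for example a branch name) into a usable tag:
# replace disallowed characters with "-", drop leading "." or "-", cap at 128 chars.
sanitize_tag() {
  printf '%s\n' "$1" \
    | sed -e 's/[^A-Za-z0-9_.-]/-/g' -e 's/^[.-]*//' \
    | cut -c1-128
}

sanitize_tag "feature/login"
# prints: feature-login
```

The sanitizer can return an empty string for input made up only of dots and dashes, so it is worth validating the result with `is_valid_tag` before tagging.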
**Tip:** In GitHub Actions, use the `docker/metadata-action` to generate tags from your git context automatically. It handles semantic versioning, SHA tags, branch tags, and latest tag suppression in one step, and it is widely used for tagging in CI.
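For pipelines that do not run on GitHub Actions, the same tag scheme can be derived by hand. This is a rough sketch, not a reimplementation of `docker/metadata-action`: `derive_tags` is a hypothetical function, and the inputs (commit SHA, branch name, optional release version) are assumed to be supplied by your CI system.

```shell
# Print one tag per line for a commit SHA, a branch name, and an optional version.
derive_tags() {
  local sha="$1" branch="$2" version="${3:-}"
  local short
  short=$(printf '%s\n' "$sha" | cut -c1-7)

  # Immutable-by-practice tags: short SHA, and branch plus short SHA.
  printf '%s\n' "$short"
  printf '%s-%s\n' "$(printf '%s' "$branch" | tr '/' '-')" "$short"

  if printf '%s' "$version" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    # Full release: v3.1.0 also moves the floating v3.1 and v3 tags.
    printf '%s\n' "$version" "${version%.*}" "${version%%.*}"
  elif [ -n "$version" ]; then
    # Pre-releases such as v3.1.0-rc1 get only their exact tag.
    printf '%s\n' "$version"
  fi
}

derive_tags a3f9c2d1e5b7 main v3.1.0
# prints a3f9c2d, main-a3f9c2d, v3.1.0, v3.1, v3 (one per line)
```

Note that no branch produces a `latest` tag here, which matches the policy of keeping `latest` out of production deployments.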
**Remember:** A Docker tag is just a pointer to a manifest digest. Two different tags can point to the exact same image. `payment-api:v3.1.0` and `payment-api:a3f9c2d` will both point to the same image if you built and tagged from the same commit. This is correct and expected: it is the same image, just referenced two ways.
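One way to confirm that two tags reference the same image on a build machine is to compare their local image IDs. This is a minimal sketch: `same_image` is a hypothetical helper, it only sees images already present in the local Docker store, and it compares the local `.Id` field, not the registry manifest digest.

```shell
# Return 0 when two local image references resolve to the same image ID,
# 1 when they differ, and 2 when either reference cannot be inspected.
same_image() {
  local id_a id_b
  id_a=$(docker image inspect --format '{{.Id}}' "$1") || return 2
  id_b=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$id_a" = "$id_b" ]
}

# Example (requires both tags to exist locally):
#   same_image payment-api:v3.1.0 payment-api:a3f9c2d && echo "same image"
```

If the release tag and the SHA tag ever differ after a single build, something re-tagged one of them in between, which is exactly the situation an immutable tagging policy is meant to rule out.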
**Common Mistake:** Using the ECR repository URI as the registry login endpoint. The login endpoint is just the account and region: `AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com`. If you include the repository name (`/payment-api`) in the login URL, authentication fails silently.
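A small helper can guard against this by deriving the login endpoint from whatever image URI a script already holds. This is a minimal sketch; `ecr_registry_host` is a hypothetical function name and the account ID in the example is a placeholder.

```shell
# Strip an optional scheme plus the repository path and tag,
# leaving only the registry host that docker login expects.
ecr_registry_host() {
  local uri="${1#https://}"
  printf '%s\n' "${uri%%/*}"
}

ecr_registry_host "123456789012.dkr.ecr.ap-south-1.amazonaws.com/payment-api:v3.1.0"
# prints: 123456789012.dkr.ecr.ap-south-1.amazonaws.com

# Typical use, with the password coming from the AWS CLI:
#   aws ecr get-login-password --region ap-south-1 | \
#     docker login --username AWS --password-stdin "$(ecr_registry_host "$IMAGE_URI")"
```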
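**Security:** Never use long-lived IAM user credentials (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) for ECR authentication in CI. Use OIDC (a trust policy between GitHub Actions and an AWS IAM role) so your CI never holds static credentials. If a static credential is compromised, an attacker keeps ECR access until someone notices and rotates the key. With OIDC, CI receives temporary credentials that expire on their own (one hour by default for an assumed role), and the role can be limited to specific GitHub repositories through its trust policy and to specific ECR repositories through its permissions policy.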