Overview and What You Will Learn
This guide explains the four DORA metrics -- Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Restore -- and how to actually compute each one from data your pipeline already produces. You will learn the Elite/High/Medium/Low performance bands from Google's State of DevOps research, where to pull the raw timestamps from in GitHub Actions or GitLab CI, and which pipeline changes move each metric in practice.
Why This Matters in Production
DORA metrics exist because "is our engineering org good at shipping software" used to be answered with opinions instead of numbers. The research behind these four metrics, run annually as the State of DevOps report, found that the same four measurements consistently separate elite-performing engineering organisations from low performers, regardless of company size or industry. A team at Swiggy or Zerodha shipping multiple times a day with sub-5% change failure rate is not lucky -- it is the output of small batch sizes, fast automated tests, and fast rollback paths, all of which are pipeline design decisions, not headcount decisions.
**Common Mistake:** Tracking deployment frequency alone and declaring victory. Deploying ten times a day while change failure rate sits at 30% is not elite performance -- it is shipping incidents faster. The four metrics only mean something together: two speed metrics and two stability metrics, balanced against each other.
Core Principles
The four metrics
- Deployment Frequency -- how often code reaches production.
- Lead Time for Changes -- time from commit to that commit running in production.
- Change Failure Rate -- the percentage of deployments that cause an incident, rollback, or hotfix.
- Mean Time to Restore (MTTR) -- how long it takes to recover service after a failure caused by a deployment.

```text
+----------------------+   +----------------------+
|    SPEED METRICS     |   |  STABILITY METRICS   |
|                      |   |                      |
| Deploys/day: many    |   | Failure rate: <5%    |
| Lead time: <1hr      |   | MTTR: <1hr           |
+----------------------+   +----------------------+
```

Benchmark bands

| Tier | Deployment Frequency |
|---|---|
| Elite | Multiple deploys per day |
| High | Between once per week and once per month |
| Medium | Between once per month and once every six months |
| Low | Fewer than once every six months |

| Tier | Lead Time for Changes |
|---|---|
| Elite | Less than one hour |
| High | Between one day and one week |
| Medium | Between one month and six months |
| Low | More than six months |

| Tier | Change Failure Rate |
|---|---|
| Elite | Zero to fifteen percent |
| High | Sixteen to thirty percent |
| Medium / Low | Higher, with increasing manual remediation |

| Tier | MTTR |
|---|---|
| Elite | Less than one hour |
| High | Less than one day |
| Medium | Between one day and one week |
| Low | More than six months |

**Tip:** Treat these bands as a direction to move in, not a certification to chase. A team that goes from Low to Medium on Lead Time has made a bigger real improvement than a team already at Elite squeezing out another ten minutes.
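
To make the bands concrete, here is a minimal sketch that maps a measured lead time onto the Lead Time for Changes table. The published bands leave gaps (nothing covers "between one hour and one day", for example), so this sketch assumes each tier extends up to its stated upper bound. The names `LEAD_TIME_BANDS` and `classify_lead_time` are made up for illustration.

```python
from datetime import timedelta

# Upper bounds taken from the Lead Time for Changes table. Assumption: a value
# belongs to the first tier whose upper bound it falls under, which closes the
# gaps between the published bands.
LEAD_TIME_BANDS = [
    ("Elite", timedelta(hours=1)),
    ("High", timedelta(weeks=1)),
    ("Medium", timedelta(days=182)),  # roughly six months
]


def classify_lead_time(lead_time: timedelta) -> str:
    """Return the benchmark tier for a lead time, or "Low" beyond six months."""
    for tier, upper_bound in LEAD_TIME_BANDS:
        if lead_time < upper_bound:
            return tier
    return "Low"


print(classify_lead_time(timedelta(minutes=40)))  # Elite
print(classify_lead_time(timedelta(days=3)))      # High
print(classify_lead_time(timedelta(days=45)))     # Medium
```

The same pattern would work for the other three metrics with their own bound lists.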
How to measure each metric from pipeline data
- Deployment Frequency -- count successful production deploy jobs over a time window. The GitHub Deployments API or your `kubectl rollout history` timestamps both work as a data source.
- Lead Time -- subtract the first commit timestamp in a PR (or the commit timestamp itself) from the timestamp the production deploy job completed successfully.
- Change Failure Rate -- (deployments that triggered an incident or rollback) divided by (total deployments) over the same window. Requires tagging incidents with the deployment that caused them.
- MTTR -- incident close timestamp minus incident open timestamp, averaged across incidents in the window.
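
The two stability formulas above can be sketched in a few lines of Python. The records and field names below are hypothetical sample data, not output from a real tracker. Note that Change Failure Rate counts deployments with at least one incident, so two incidents from the same deploy count once.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample records; the field names are assumptions for this sketch.
deployments = [{"id": 101}, {"id": 102}, {"id": 103}, {"id": 104}]
incidents = [
    {"caused_by_deployment_id": 102,
     "opened_at": datetime(2024, 5, 1, 10, 0),
     "resolved_at": datetime(2024, 5, 1, 10, 30)},
    {"caused_by_deployment_id": 102,
     "opened_at": datetime(2024, 5, 1, 11, 0),
     "resolved_at": datetime(2024, 5, 1, 12, 30)},
]


def change_failure_rate(deployments, incidents):
    """Percentage of deployments linked to at least one incident."""
    if not deployments:
        return 0.0
    deployed_ids = {d["id"] for d in deployments}
    failed_ids = {i["caused_by_deployment_id"] for i in incidents} & deployed_ids
    return 100.0 * len(failed_ids) / len(deployed_ids)


def mttr_minutes(incidents):
    """Mean minutes from open to resolve, skipping incidents still open."""
    durations = [
        (i["resolved_at"] - i["opened_at"]).total_seconds() / 60
        for i in incidents
        if i.get("resolved_at") is not None
    ]
    return mean(durations) if durations else None


print(change_failure_rate(deployments, incidents))  # 25.0
print(mttr_minutes(incidents))                      # 60.0
```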
**Security:** Be careful which roles can write to your incident tracker's "deployment that caused this" field -- if change failure rate becomes a metric people are evaluated on, there is pressure to under-attribute incidents to deployments. Keep the linkage automated where possible (deploy event ID stamped onto the incident at creation time) rather than a manually-typed field.
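
One way the automated stamping could look, as a rough sketch: the incident tooling reads the deploy ID from the environment instead of asking a human for it. The variable name `DEPLOYMENT_ID` and the function `build_incident` are assumptions for illustration, not part of any particular incident tracker.

```python
import os
from datetime import datetime, timezone


def build_incident(title: str, environ=os.environ) -> dict:
    """Build an incident record with the deploy ID filled in automatically.

    DEPLOYMENT_ID is a hypothetical variable name; the pipeline would export
    it at deploy time so that nobody types the linkage by hand.
    """
    return {
        "title": title,
        "opened_at": datetime.now(timezone.utc).isoformat(),
        # None means "no deploy ID was available", not "not caused by a deploy".
        "caused_by_deployment_id": environ.get("DEPLOYMENT_ID"),
    }


incident = build_incident("Playback errors spiking", {"DEPLOYMENT_ID": "4821"})
print(incident["caused_by_deployment_id"])  # 4821
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.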
Common improvements
- Smaller pull requests -- a PR touching 50 lines is faster to review, faster to test, and lower-risk to deploy than one touching 2,000 lines.
- Faster test suites -- parallelise, cut flaky tests, and separate the fast unit suite (runs on every commit) from a slower nightly suite.
- Automated rollback -- if MTTR depends on a human noticing a dashboard and manually rolling back, you have a ceiling on how fast MTTR can ever be.
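
To illustrate the automated rollback point, here is a minimal sketch of the decision a pipeline could make on its own after a deploy. The function name, the thresholds, and the idea of comparing against a pre-deploy baseline are all assumptions. A real setup would read these samples from its monitoring system and then call whatever rollback mechanism it already has.

```python
def should_roll_back(error_rates, baseline, tolerance=2.0, floor=0.005, min_samples=3):
    """Decide whether post-deploy error rates warrant an automatic rollback.

    error_rates: error-rate samples (fraction of failed requests) taken after
                 the deploy, oldest first.
    baseline:    error rate observed before the deploy.
    Rolls back only when the last `min_samples` samples all exceed the
    threshold, so a single noisy sample does not trigger it.
    """
    if len(error_rates) < min_samples:
        return False  # not enough evidence yet
    # The floor keeps a near-zero baseline from making the check hypersensitive.
    threshold = max(baseline * tolerance, floor)
    return all(rate > threshold for rate in error_rates[-min_samples:])


print(should_roll_back([0.01, 0.06, 0.07, 0.09], baseline=0.01))  # True
print(should_roll_back([0.01, 0.06, 0.01, 0.09], baseline=0.01))  # False
```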
Detailed Step-by-Step Practical Lab
This lab builds a small script that computes all four DORA metrics for
Hotstar's streaming-api repository over the last 30 days, pulling from
the GitHub API and a deployment-events table.
Milestone 1 — Pull deployment events from the GitHub Deployments API

```shell
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/hotstar/streaming-api/deployments?environment=production&per_page=100" \
  > deployments.json
jq 'length' deployments.json
```

At this point you have a raw count of production deployments -- at most the 100 that a single page returns, and not yet filtered to the 30-day window. Once filtered in the next milestone, this count becomes the numerator for Deployment Frequency.
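
If the repository deploys more than 100 times in the window, a single request is not enough. The following is a sketch of paging through the same endpoint from Python, using the `page` query parameter alongside `per_page`. The helper names `fetch_page` and `fetch_all` and the `max_pages` safety cap are assumptions for illustration.

```python
import json
import os
import urllib.request

# Repository and environment come from the curl command above.
API_URL = ("https://api.github.com/repos/hotstar/streaming-api/deployments"
           "?environment=production&per_page=100&page={page}")


def fetch_page(page: int) -> list:
    """Fetch one page of deployments from the GitHub REST API."""
    request = urllib.request.Request(
        API_URL.format(page=page),
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all(fetch=fetch_page, max_pages: int = 50) -> list:
    """Collect pages until an empty one comes back (or max_pages is hit)."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch(page)
        if not batch:
            break
        results.extend(batch)
    return results
```

Calling `fetch_all()` with `GITHUB_TOKEN` set and writing the result out with `json.dump` would produce the same `deployments.json` shape the later milestones read. The `fetch` parameter exists so the paging loop can be exercised with a fake page source.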
Milestone 2 — Calculate Deployment Frequency

```python
import json
from datetime import datetime, timedelta, timezone

with open("deployments.json") as f:
    deploys = json.load(f)

# GitHub returns created_at in UTC, so build the window boundary in UTC as well.
window_start = datetime.now(timezone.utc) - timedelta(days=30)
recent = [
    d for d in deploys
    if datetime.strptime(d["created_at"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    > window_start
]

print(f"Deploys in last 30 days: {len(recent)}")
print(f"Per day average: {len(recent) / 30:.2f}")
```

At this point you can classify the team's current Deployment Frequency tier against the benchmark table above.
Milestone 3 — Calculate Lead Time for Changes

```python
import statistics

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
lead_times_hours = []
for d in recent:
    commit_sha = d["sha"]
    # commit_<sha>.json is the pre-fetched GitHub commit object for this deploy.
    with open(f"commit_{commit_sha}.json") as f:
        commit = json.load(f)
    commit_time = datetime.strptime(commit["commit"]["committer"]["date"], TS_FORMAT)
    # created_at is when the deployment was created, used here as a proxy for
    # the moment the deploy job finished.
    deploy_time = datetime.strptime(d["created_at"], TS_FORMAT)
    lead_times_hours.append((deploy_time - commit_time).total_seconds() / 3600)

print(f"Median lead time: {statistics.median(lead_times_hours):.2f} hours")
```

At this point you have a median lead time figure -- compare it against the one-hour Elite threshold.
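
A median can hide a slow tail, so it may be worth printing a high percentile next to it. This is a small sketch using `statistics.quantiles` from the standard library. The sample values are made up; in the lab they would come from `lead_times_hours`.

```python
import statistics

# Hypothetical lead times in hours; in the lab these come from lead_times_hours.
sample_lead_times = [0.5, 0.8, 1.2, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 48.0]

median_hours = statistics.median(sample_lead_times)
# quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
p90_hours = statistics.quantiles(sample_lead_times, n=10)[-1]

print(f"Median lead time: {median_hours:.2f} hours")
print(f"p90 lead time: {p90_hours:.2f} hours")
```

Here one 48-hour change barely moves the median but dominates the 90th percentile, which is the kind of outlier worth investigating separately.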
Milestone 4 — Tag incidents against deployments for Change Failure Rate
In your incident tracker, add a `caused_by_deployment_id` field that is populated automatically rather than typed by hand. One way to do this: the deploy pipeline writes the deployment ID into an environment variable available to the running service, and the incident creation tooling reads it from there.
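
```sql
-- Count deployments that caused at least one incident, not incidents:
-- two incidents from the same deploy must count as one failed deployment.
SELECT COUNT(DISTINCT i.caused_by_deployment_id) * 100.0
       / NULLIF(COUNT(DISTINCT d.id), 0) AS change_failure_rate_pct
FROM deployments d
LEFT JOIN incidents i ON i.caused_by_deployment_id = d.id
WHERE d.created_at > NOW() - INTERVAL '30 days';
```

At this point you have a single percentage figure for Change Failure Rate over the window.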
Milestone 5 — Calculate MTTR from incident timestamps

```sql
SELECT AVG(EXTRACT(EPOCH FROM (resolved_at - opened_at)) / 60) AS mttr_minutes
FROM incidents
WHERE opened_at > NOW() - INTERVAL '30 days'
  AND caused_by_deployment_id IS NOT NULL;
```

At this point you have all four numbers and can plot them against the benchmark bands to see where the team currently sits.
Milestone 6 — Wire up an alert when a metric regresses

```yaml
- name: Check DORA thresholds
  run: |
    python3 calculate_dora.py --window 7d --fail-if-cfr-above 15
```

At this point a sustained regression in any metric -- not just a single bad deploy -- triggers a visible signal instead of being noticed three months later in a retro.
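
The lab does not show `calculate_dora.py` itself, so the following is only a sketch of how its command-line handling might look. The two flags are the ones used in the step above; everything else, including the placeholder `compute_cfr_pct`, is an assumption. A non-zero return value is what would make the CI step fail.

```python
import argparse


def parse_window_days(value: str) -> int:
    """Turn a window such as "7d" into a number of days."""
    if not value.endswith("d") or not value[:-1].isdigit():
        raise argparse.ArgumentTypeError(f"expected a value like 7d, got {value!r}")
    return int(value[:-1])


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Check DORA thresholds")
    parser.add_argument("--window", type=parse_window_days, default=7)
    parser.add_argument("--fail-if-cfr-above", type=float, default=None)
    return parser


def exit_code(cfr_pct: float, threshold) -> int:
    """Return 1 (fail the CI step) when CFR exceeds the threshold, else 0."""
    if threshold is not None and cfr_pct > threshold:
        return 1
    return 0


def compute_cfr_pct(window_days: int) -> float:
    """Placeholder: a real script would query deployments and incidents here."""
    return 0.0


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    cfr_pct = compute_cfr_pct(args.window)
    print(f"Change failure rate over {args.window}d: {cfr_pct:.1f}%")
    return exit_code(cfr_pct, args.fail_if_cfr_above)

# A real script would end with: raise SystemExit(main())
```

Similar flags for the other three metrics would follow the same shape: compute the figure, compare it to a threshold, and map the result to an exit code.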
**Remember:** DORA metrics are trailing indicators measured over a rolling window (commonly 7, 30, or 90 days), not a single deploy's score. One bad deploy should move the trend line slightly, not be treated as a metric failure on its own.
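
A short sketch of what "rolling window" means in practice, using Change Failure Rate. The input shape (a date mapped to deploy and failed-deploy counts) and the sample numbers are assumptions made for this example.

```python
from datetime import date, timedelta


def rolling_cfr(daily, window_days=7):
    """Yield (day, CFR %) computed over a trailing window ending on each day.

    daily maps a date to (deploy_count, failed_deploy_count); the shape of
    this input is an assumption for the sketch.
    """
    for day in sorted(daily):
        start = day - timedelta(days=window_days - 1)
        in_window = [counts for k, counts in daily.items() if start <= k <= day]
        deploys = sum(d for d, _ in in_window)
        failures = sum(f for _, f in in_window)
        yield day, (100.0 * failures / deploys if deploys else 0.0)


# Hypothetical data: the third day has one deploy, and it failed.
daily = {
    date(2024, 5, 1): (4, 0),
    date(2024, 5, 2): (5, 1),
    date(2024, 5, 3): (1, 1),
}
for day, cfr in rolling_cfr(daily):
    print(day, f"{cfr:.1f}%")
```

Taken on its own, the third day would read as a 100% failure rate. Over the trailing window it is 2 failures in 10 deploys, or 20%, which is the trend-line view the note above describes.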
Production Best Practices & Common Pitfalls
- Automate the linkage between deployments and incidents -- manual tagging decays in accuracy within weeks.
- Report all four metrics together on one dashboard; reporting Deployment Frequency alone incentivises shipping fast and breaking things.
- Use a rolling window, not a single sprint, to smooth out noise from one unusually good or bad week.
- Do not use DORA metrics in individual performance reviews -- they are team and system-level indicators, and turning them into individual KPIs reliably produces metric gaming (smaller deploys that don't actually ship meaningful work, incidents quietly reclassified as "not caused by a deployment").
Quick Reference & Troubleshooting Commands
| Symptom | Command | What to Look For |
|---|---|---|
| Deployment Frequency looks artificially high | `jq 'length' deployments.json` per environment | Non-production environments (for example staging) being counted alongside production |
| Lead time calculation returns negative numbers | Compare commit and deploy timestamps manually | Time zone mismatch between commit timestamp and deploy timestamp |
| Change Failure Rate stuck at zero | Query the incidents table join | caused_by_deployment_id not being populated at incident creation |
| MTTR skewed very high by one outlier | `SELECT MAX(resolved_at - opened_at) FROM incidents` | One incident left open in the tracker long after it was actually fixed |