Companies need to react faster to changing customer needs while still delivering stable services to their customers. To meet these demands, DevOps teams and lean practitioners must constantly improve. As the famous management consultant Peter Drucker once said, “If you can’t measure it, you can’t improve it.”
That’s where DORA metrics come into play. Some of you probably also know that DORA and the individuals behind it have been providing much of the science and analysis behind the State of DevOps survey and report for several years now. But beyond the three rock-star founders, what is DORA really about? What is the business model that generates revenue?
I had the pleasure of researching just that, so let me tell you what I found out!
What Is DORA?
DORA (DevOps Research and Assessment) is a research team founded in 2015 by Nicole Forsgren, Jez Humble, and Gene Kim. The group aims to find ways to improve software development. Over seven years, they surveyed thousands of software professionals across hundreds of organizations in various industries.
The team’s findings were first published in Accelerate: The Science of Lean Software and DevOps (2018). The book introduced four benchmarks that correlate to high-performing organizations: the DORA metrics.
In the same year that the aforementioned book was published, Google acquired the group and established the DORA research program, which is responsible for publishing the yearly State of DevOps Report.
Sweet, But How Does It Work?
Running a DORA assessment follows four stages:

1. Decompose an organization: identify the lines of business and teams to assess.
2. Prepare for the assessment: socialize the survey and prepare email distribution lists.
3. Take the assessment: everyone takes a 15- to 20-minute survey.
4. Generate and deliver reports: team capabilities and outcomes are benchmarked against the organization and against the industry, and a prioritization analysis ranks capabilities by their impact on technology performance.
What Are DORA Metrics?
At the most fundamental level, DORA metrics help companies understand the actions required to quickly and reliably develop and deliver technological solutions.
DevOps teams use the metrics to measure their performance and find out whether they are “low performers” or “elite performers”. The groundbreaking insight from DORA’s research was that, given a long enough time horizon, there is no tradeoff between speed and quality. In other words, reducing quality does not yield a quicker development cycle in the long run. Both speed and stability are essential. Focusing on adding features at the expense of quality results in substandard code, unstable releases, and technical debt, eventually stifling progress.
The Four Key DORA Metrics
Let’s dig a little further into the four metrics that the DORA team has identified as being essential to an organization’s DevOps success. The four metrics used are deployment frequency (DF), lead time for changes (LT), mean time to recovery (MTTR), and change failure rate (CFR).
Deployment frequency refers to the cadence of an organization’s successful releases to production. Teams define success differently, so deployment frequency can measure a range of things, such as how often code is deployed to production or how often it is released to end users. Regardless of what this metric measures on a team-by-team basis, elite performers aim for continuous deployment, with multiple deployments per day.
Within the DevOps world, you want to strive for “continuous” development and this in itself implies a high deployment frequency. Being able to deploy on demand allows you to receive consistent feedback and deliver value to end users more quickly.
Tracking this metric gives your whole team a clear understanding of the relationship between pull request size and the frequency of deployments, allowing you to optimize your processes and workflows.
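As a minimal sketch of how this could be measured, here is a Python snippet that computes deployments per week from a list of deployment dates. The dates and the time window are hypothetical examples, not data from any real team:

```python
from datetime import date

def deployments_per_week(deploy_dates, start, end):
    """Average number of successful production deployments per week
    within the window [start, end]."""
    in_window = [d for d in deploy_dates if start <= d <= end]
    weeks = (end - start).days / 7
    return len(in_window) / weeks

# Hypothetical deployment log for a four-week window
deploys = [date(2023, 5, d) for d in (1, 2, 2, 5, 9, 12, 16, 23)]
rate = deployments_per_week(deploys, date(2023, 5, 1), date(2023, 5, 29))  # 2.0
```

A real pipeline would pull these timestamps from a CI/CD tool or deployment log rather than a hard-coded list.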
How can you improve deployment frequency?
If your organization falls into the low-performer category, you should invest in an automated deployment pipeline that handles testing of new code and provides feedback mechanisms. This reduces both time to recovery and time to delivery. Automation also significantly reduces soak time and approval time, shortening time to market.
Lead Time For Changes
Lead time for changes reflects the amount of time it takes for a commit to get into production.
It allows you to keep track of the four stages of your team’s PR cycle time: in progress, review, merge, and release. You can also set up team-wide working agreements to make sure that no PRs stay open longer than X days or that reviews happen within X hours.
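As an illustration, lead time for changes can be computed as the median time from commit to production deploy. The commit/deploy pairs below are made-up examples:

```python
from datetime import datetime
from statistics import median

def lead_time_hours(changes):
    """Median hours from commit to production deployment.
    `changes` is a list of (committed_at, deployed_at) datetime pairs."""
    deltas = [(deployed - committed).total_seconds() / 3600
              for committed, deployed in changes]
    return median(deltas)

# Hypothetical commit/deploy timestamp pairs
changes = [
    (datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 15)),   # 6 h
    (datetime(2023, 5, 2, 10), datetime(2023, 5, 3, 10)),  # 24 h
    (datetime(2023, 5, 4, 8), datetime(2023, 5, 4, 20)),   # 12 h
]
lt = lead_time_hours(changes)  # median of [6, 24, 12] -> 12.0
```

The median is used rather than the mean so that a single long-running change does not dominate the metric.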
“Shorter product delivery lead times are better since they enable faster feedback on what we are building and allow us to course-correct more rapidly. Short lead times are also important when there is a defect or outage, and we need to deliver a fix rapidly and with high confidence.”
Accelerate: The science of lean software and DevOps: Building and scaling high performing technology organizations
How can you improve lead time for changes?
To improve lead time for changes, DevOps teams should include automated testing in the development process. Your testing team can train your dev teams to write and automate tests. Lead time can also be reduced by introducing more regression unit tests, so that any regressions introduced by code changes are identified as early as possible. Beyond that, it is worth reviewing whether the tooling should be improved: the more fixes, e.g. to code style, that can be applied automatically, the less time PRs take.
Mean Time To Recovery
This metric is essential for making sure you can recover quickly from any incidents. Elite performers can recover services in less than one hour, while low performers require one week to one month. Elite teams achieve a good MTTR score by deploying in small batches to reduce risk and having good monitoring tools to preempt failure. I will tell you all about what is considered an Elite performer in a second.
The mean time to restore metric refers to the time taken by the business to recover from a failure in production. In other words, it is the time required to recover and address all the issues introduced by a release. To calculate the mean time to restore, you need to know the time when the incident occurred and when it was addressed.
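Given incident start and resolution times, the calculation itself is simple. Here is a hedged sketch with invented incident timestamps:

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean hours between incident start and resolution.
    `incidents` is a list of (started_at, resolved_at) datetime pairs."""
    durations = [(resolved - started).total_seconds() / 3600
                 for started, resolved in incidents]
    return sum(durations) / len(durations)

# Hypothetical incident log
incidents = [
    (datetime(2023, 6, 1, 14, 0), datetime(2023, 6, 1, 14, 30)),  # 0.5 h
    (datetime(2023, 6, 7, 9, 0),  datetime(2023, 6, 7, 12, 0)),   # 3 h
]
mttr = mttr_hours(incidents)  # (0.5 + 3) / 2 -> 1.75
```

In practice these timestamps would come from an incident-management or monitoring tool.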
How can you improve the mean time to restore?
To improve the time to restore, businesses have to implement robust monitoring processes and swift recovery practices. This enables teams to deploy a go-to action plan for an immediate response to a failure. Businesses can also invest in auto-recovery mechanisms and prediction techniques to identify failures that may happen, so they can proactively anticipate issues and resolve them before they escalate.
Change Failure Rate
Change failure rate measures the percentage of deployments that cause a failure in production. Rollbacks, failed deployments, and incidents with quick fixes—regardless of the root cause—all count toward the change failure rate. Like the mean time to recover, this metric helps measure stability. How much developer time is diverted into tasks that don’t contribute to business value? Understanding the change failure rate helps leaders decide where to invest in infrastructure to support development teams.
The change failure rate is typically calculated from two numbers: the number of attempted deployments and the number of failed deployments. Tracked over time, this metric shows how much time the team spends resolving issues versus delivering new code.
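The calculation from those two numbers can be sketched as follows; the counts are illustrative:

```python
def change_failure_rate(total_deployments, failed_deployments):
    """Percentage of deployments that caused a failure in production."""
    if total_deployments == 0:
        return 0.0  # no deployments means no failures to attribute
    return 100 * failed_deployments / total_deployments

# Hypothetical month: 40 deployments, 3 of which caused incidents
cfr = change_failure_rate(total_deployments=40, failed_deployments=3)  # 7.5
```

A rate of 7.5% would place this example team within the commonly cited elite range of 0 to 15%.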
How can you improve the change failure rate?
DevOps teams should focus on the change failure rate rather than the absolute number of failures. This dispels the misconception that releasing more often means failing more often. Teams should therefore push releases more often and in small batches, so defects can be fixed easily and quickly. It is also important to follow CI/CD processes rigorously, which includes resolving critical vulnerabilities and bugs in code, having proper regression techniques in place, and automating performance testing as well.
The DORA metrics are graded on four levels: Elite, High, Medium, and Low. The exact thresholds for each level may vary from company to company. As an example, here is a table roughly following the benchmarks from the 2019 State of DevOps report:

| Metric | Elite | High | Medium | Low |
| Deployment frequency | On demand (multiple per day) | Once per day to once per week | Once per week to once per month | Once per month to once every six months |
| Lead time for changes | Less than one day | One day to one week | One week to one month | One month to six months |
| Mean time to recovery | Less than one hour | Less than one day | Less than one day | One week to one month |
| Change failure rate | 0–15% | 0–15% | 0–15% | 46–60% |
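A team could map its measured metrics onto such levels in code. The thresholds below are illustrative, not official cut-offs:

```python
def performance_level(deploys_per_month, lead_time_days, mttr_hours, cfr_percent):
    """Rough, illustrative classification of a team's DORA level.
    Thresholds are example values, not an official standard."""
    if (deploys_per_month >= 30 and lead_time_days < 1
            and mttr_hours < 1 and cfr_percent <= 15):
        return "Elite"
    if (deploys_per_month >= 4 and lead_time_days <= 7
            and mttr_hours <= 24 and cfr_percent <= 15):
        return "High"
    if (deploys_per_month >= 1 and lead_time_days <= 30
            and mttr_hours <= 24 and cfr_percent <= 30):
        return "Medium"
    return "Low"

level = performance_level(deploys_per_month=45, lead_time_days=0.5,
                          mttr_hours=0.5, cfr_percent=10)  # "Elite"
```

A team deploying daily with sub-hour recovery and a low failure rate would land in the elite bracket, while one deploying a few times a year would fall to low regardless of its other numbers.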
DORA metrics are only valid for tracking a single team’s progress in its DevOps journey. They cannot be used to compare teams or individuals.
Goals and metrics should not be confused. Since all metrics can be gamed, equating metrics with objectives leads to perverse incentives. If, for instance, management decides to optimize deployment frequency no matter the cost, it will be in the team’s best interest to deploy any tiny change it can think of, regardless of whether it adds value for the user.
Organizational culture has an enormous impact on team performance. We’ll come back to this later in the post.
Why You Should Use DORA Metrics
So why should every DevOps team use DORA metrics? The answer is pretty simple: without data measuring performance, it is difficult, if not impossible, to make any improvements.
DORA metrics break down abstract processes in software development and delivery and make them more tangible and visible, so engineering leaders can take specific steps towards more streamlined processes and increase the value of software. You can also read more about it in one of our older articles here.
Every business, irrespective of its DevOps maturity, can benefit from DORA metrics as a way to enhance the efficiency and effectiveness of its DevOps processes. While deployment frequency and lead time for changes help teams measure velocity (software delivery throughput) and agility, the change failure rate and mean time to restore help measure stability (quality). These metrics enable teams to find out how successful they are at DevOps and to identify themselves as elite, high, medium, or low-performing teams.
In short: DORA metrics can help organizations:
Measure software delivery throughput and stability to understand how teams can improve.
Make data-based decisions rather than relying on gut instinct.
Create trust within an organization, which decreases friction and allows for quicker, higher-quality delivery.
The results of using DORA metrics have been pretty impressive, and they can be for you too, if used right. Utilizing the insights gleaned from your data can improve your business performance. And that’s why we are here, right? Is it all roses and fireworks with this approach? No, but more on that next time!