One classic tactic in root cause analysis is to ask “why” something occurred. This is a useful approach! But “why?” is not the only question you should be concerned with. Focusing too much on “why,” especially if it passes into motivation as opposed to causation, can turn the question from “how do we fix this?” into “who’s to blame?”
John Allspaw, who runs Etsy’s operations engineering group, talks about doing ‘blameless postmortems’ as a way to improve, scale, and learn. Etsy is in the top 50 most-visited websites in the US, top 150 globally, so you may want to listen to his advice.
Having a “blameless” Post-Mortem process means that engineers whose actions have contributed to an accident can give a detailed account of:
- what actions they took at what time,
- what effects they observed,
- expectations they had,
- assumptions they had made,
- and their understanding of timeline of events as they occurred.
…and that they can give this detailed account without fear of punishment or retribution.
I was struck, as John described the process, by the frequency of “how” compared to “why” (emphasis added):
- The goal is to understand how an accident could have happened, in order to better equip ourselves from it happening in the future
- We enable and encourage people who do make mistakes to be the experts on educating the rest of the organization how not to make [mistakes] in the future.
- We strive to make sure that the blunt end of the organization understands how work is actually getting done.
- In order to understand how failures happen, we first have to understand our reactions to failure.
- One option is to assume the single cause is incompetence and scream at engineers to make them “pay attention!” or “be more careful!” Another option is to take a hard look at how the accident actually happened, treat the engineers involved with respect, and learn from the event.
While “why” is an excellent question to help understand causation, chains of events, and motivation, asking “how” can help uncover systemic issues.
Better still, asking “how” can also help you understand what needs to be done to defang those issues. Knowing how a situation can turn out badly can tell you what needs to change so that similar situations have better outcomes in the future. I’ve found that once we understand how a mishap came to occur, it is often immediately clear to everyone what needs to be guarded against. Fairly often, people even have a good idea of how such a safeguard can be implemented.
So, next time you are involved in a root cause analysis, by all means, keep on asking those five whys.
But don’t forget to ask “how?” a few times, too!