Massively Useful

Author Archives: fullerbecker

Root Cause Analysis: Don’t just ask “Why?” — also ask “How?”

06 Thursday Mar 2014

Posted by fullerbecker in Uncategorized

Tags

Root Cause Analysis

One classic tactic in conducting root cause analysis is to ask “why” something occurred. This is a useful approach! But “why?” is not the only question you should be concerned with. Focusing too much on “why” (especially if it drifts from causation into motivation) can turn the analysis into a question of ‘who’s to blame’ rather than ‘how do we fix this.’

John Allspaw, who runs Etsy’s operations engineering group, talks about doing ‘blameless postmortems’ as a way to improve, scale, and learn. Etsy is in the top 50 most-visited websites in the US, top 150 globally, so you may want to listen to his advice.

Having a “blameless” Post-Mortem process means that engineers whose actions have contributed to an accident can give a detailed account of:

  • what actions they took at what time,
  • what effects they observed,
  • expectations they had,
  • assumptions they had made,
  • and their understanding of timeline of events as they occurred.

…and that they can give this detailed account without fear of punishment or retribution.

I was struck, as John described the process, by the frequency of “how” compared to “why” (emphasis added):

  • The goal is to understand how an accident could have happened, in order to better equip ourselves from it happening in the future
  • We enable and encourage people who do make mistakes to be the experts on educating the rest of the organization how not to make [mistakes] in the future.
  • We strive to make sure that the blunt end of the organization understands how work is actually getting done.
  • In order to understand how failures happen, we first have to understand our reactions to failure.
  • One option is to assume the single cause is incompetence and scream at engineers to make them “pay attention!” or “be more careful!”
    Another option is to take a hard look at how the accident actually happened, treat the engineers involved with respect, and learn from the event.

While “why” is an excellent question to help understand causation, chains of events, and motivation, asking “how” can help uncover systemic issues.

Better still, asking “how” can also help you understand what needs to be done to defang those issues. Knowing how a situation can turn out badly can tell you what needs to change so that similar situations have better outcomes in the future. I’ve found that once we understand how a mishap came to occur, it is often immediately clear to everyone what needs to be guarded against. Fairly often, someone even has a good idea of how such a safeguard can be implemented.

So, next time you are involved in a root cause analysis, by all means, keep on asking those five whys.

But don’t forget to ask “how?” a few times, too!

Root Cause Analysis: Asking Five Why’s

06 Thursday Mar 2014

Posted by fullerbecker in Uncategorized

Tags

Root Cause Analysis

There’s a technique used in Root Cause Analysis called “5 Why’s.”

The idea is, you define the problem, and ask why that problem occurred. Get the answer to that question. Then ask why THAT happened. And repeat, until you’ve gone at least 5 levels deep.

This trivial example comes from the link above:

You are on your way home from work and your car stops.

  • Why did your car stop? Because it ran out of gas.
  • Why did it run out of gas? Because you didn’t buy any gas on your way to work.
  • Why didn’t you buy any gas this morning? Because you didn’t have any money.
  • Why didn’t you have any money? Because you lost it all last night in a poker game.
  • Why did you lose all your money in a poker game? Because you’re no good at bluffing.
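
A linear chain like this one can be written down as a small data structure. The sketch below is in Python; the names `WhyStep` and `ask_whys` and the `min_depth` parameter are hypothetical, chosen only for illustration. It records each question with its answer and complains if the chain stops short of the target depth.

```python
from dataclasses import dataclass


@dataclass
class WhyStep:
    """One level of the chain: a 'why' question and its 'because' answer."""
    question: str
    answer: str


def ask_whys(problem: str, steps: list[WhyStep], min_depth: int = 5) -> str:
    """Return the deepest answer in a linear why-chain.

    Raises ValueError if fewer than min_depth questions were asked, as a
    nudge to keep digging instead of stopping at the first plausible cause.
    """
    if len(steps) < min_depth:
        raise ValueError(
            f"only {len(steps)} whys asked about {problem!r}; aim for {min_depth}"
        )
    return steps[-1].answer


car_chain = [
    WhyStep("Why did your car stop?", "It ran out of gas."),
    WhyStep("Why did it run out of gas?", "You didn't buy any on your way to work."),
    WhyStep("Why didn't you buy any gas?", "You didn't have any money."),
    WhyStep("Why didn't you have any money?", "You lost it all in a poker game."),
    WhyStep("Why did you lose all your money?", "You're no good at bluffing."),
]

root_cause = ask_whys("car stopped", car_chain)
print(root_cause)
```

Passing a smaller `min_depth` relaxes the check for cases where fewer levels are enough.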

OK, that’s a fairly trivial example. Sometimes things really are that simple, or even simpler: you might need only two or three why’s to get to the true root cause. (If you find yourself referring to universal constants or physical laws, such as “because f = ma,” you have probably gone far enough.)

Five isn’t a magic number; it’s simply there to encourage you not to stop at the first, second, or even third level of analysis. Forcing yourself to go five levels deep should push you beyond the obvious.

But sometimes things aren’t so linear. Sometimes there is more than one answer to a why. Sometimes there are multiple why’s to ask of a because. These sorts of complications often arise when doing root cause analysis of complex systems, such as multi-tier software applications.

Don’t be discouraged. And don’t get hung up on finding a tidy linear flow of the questions. Just keep asking why. For example, let’s imagine a root cause analysis for an application that stopped responding:

  • Why did the application stop responding? Because it couldn’t write to the disk.
  • Why couldn’t the application write to the disk? Because there was no free disk space.
  • Why was there no free disk space? Because a process was crashing repeatedly, creating a large dump file each time.

Now, there are several questions which can be asked here. And sometimes, a single question can have multiple answers!

Why was the process crashing? The following could all be valid answers in the same case.

  • Because the programmer made a mistake that wasn’t caught by the compiler.
  • Because the programmer made a mistake that wasn’t caught by code review.
  • Because the environment the software was running in did not conform to assumptions made in the solution design.
  • Because the instance had not upgraded to a newer version of the software which included a fix for this bug.

And each of these could in turn have multiple why’s. That’s OK. Don’t get wrapped around the axle. Generate a bunch of these!
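
One way to keep track of branching answers is to store them as a tree rather than a list. This is a minimal Python sketch under that assumption; the `Cause` class and the `leaves` and `depth` helpers are hypothetical names, and the wording of each cause is paraphrased from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One 'because', together with the deeper causes found behind it."""
    because: str
    whys: list["Cause"] = field(default_factory=list)


def leaves(node: Cause) -> list[str]:
    """Collect the deepest answer on every branch (candidate root causes)."""
    if not node.whys:
        return [node.because]
    found = []
    for child in node.whys:
        found.extend(leaves(child))
    return found


def depth(node: Cause) -> int:
    """Count levels from this node to its deepest leaf, including the node itself."""
    return 1 + max((depth(child) for child in node.whys), default=0)


crash = Cause(
    "a process was crashing repeatedly, creating a large dump file each time",
    [
        Cause("a programming mistake wasn't caught by the compiler"),
        Cause("a programming mistake wasn't caught by code review"),
        Cause("the runtime environment didn't match the design assumptions"),
        Cause("the instance hadn't upgraded to a version with the fix"),
    ],
)
no_space = Cause("there was no free disk space", [crash])
no_write = Cause("the application couldn't write to the disk", [no_space])
problem = Cause("the application stopped responding", [no_write])

for candidate in leaves(problem):
    print("-", candidate)
print("levels, counting the problem statement:", depth(problem))
```

Each leaf is a place where you can ask "why" again, by appending further `Cause` entries to its `whys` list.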

If you are having a hard time coming up with why’s, consider the system from a different perspective. If you were looking at it from a system administrator’s perspective, consider it from the point of view of:

  • an end user
  • an infosec engineer – or an intruder
  • a business stakeholder who doesn’t use the software but relies on people who do
  • a tech support engineer
  • a developer

Using these different perspectives can also be helpful if you are only generating why’s that lead to insights which can’t be practically acted upon – for example, because they are outside the scope of control of any of the involved parties.

If you do successfully generate a large number of why’s, it can be helpful to arrange them in a fishbone or tree diagram to show how they relate to each other.

Sample fishbone root cause diagram (aka Ishikawa diagram) from Fishbonediagram.org
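
When drawing a diagram by hand isn't convenient, a plain-text outline grouped by category can serve as a rough stand-in for a fishbone. The Python sketch below assumes each cause has already been tagged with a category; the category names ("Code", "Process", and so on) and the `fishbone` function are illustrative assumptions, not part of any standard.

```python
from collections import defaultdict

# (category, cause) pairs. The category labels are hypothetical examples.
tagged_causes = [
    ("Code", "mistake not caught by the compiler"),
    ("Process", "mistake not caught by code review"),
    ("Environment", "environment did not conform to design assumptions"),
    ("Operations", "instance not upgraded to the fixed version"),
    ("Operations", "crash dumps filled the disk"),
]


def fishbone(problem: str, pairs: list[tuple[str, str]]) -> str:
    """Render causes grouped by category as an indented text outline."""
    bones = defaultdict(list)
    for category, cause in pairs:
        bones[category].append(cause)
    lines = [f"Problem: {problem}"]
    # Categories appear in the order they were first seen.
    for category, items in bones.items():
        lines.append(f"  {category}")
        lines.extend(f"    - {item}" for item in items)
    return "\n".join(lines)


outline = fishbone("application stopped responding", tagged_causes)
print(outline)
```

Each category becomes one "bone", with its causes listed beneath it.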
