Explaining Quality Statistics So Your Boss Will Understand: Pareto Charts

I once had a boss who had difficulty understanding many, many things. When I need to discuss statistical concepts with people who don't have a statistical background, I like to think about how I could explain things so even my old boss would get it.

My boss and I shared a common interest in rock and roll, so that's the device I'll use to explain one of the workhorses of quality statistics, the Pareto chart. I'd tell my boss to imagine that instead of managing a surly gang of teenaged restaurant employees, he's managing a surly rock and roll band, the Zero Sigmas. The band did a 100-date tour last year, and before going on the road again, he wants to see what the most frequent mishaps were, in hopes that things might run a bit more smoothly. He's got a table of data, but it's hard to figure out what the raw numbers mean.

He needs to create a Pareto chart with that data. It's a very straightforward tool, but therein lies the danger: because it looks like a standard bar chart, the Pareto chart can be misinterpreted. A well-intentioned (but statistics-impaired) boss may take one look at it, assume it's a regular old run-of-the-mill bar chart, and imagine he's got it all figured out without actually thinking about what he's seeing. He'll want to make sure he really understands what the Pareto chart reveals.

What Does a Pareto Chart Do?

In my boss's defense, there's really not much difference between a Pareto chart and a regular old run-of-the-mill bar chart, except that the Pareto chart ranks your defects (or whatever it is you're measuring) from largest to smallest.
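Because ranking is the one step that sets a Pareto chart apart from an ordinary bar chart, here is a minimal Python sketch of just that step. The function name `pareto_order` and the counts are hypothetical, made up for illustration; they are not the Zero Sigmas' tour data or anything Minitab exposes.

```python
def pareto_order(counts):
    """Return (category, count) pairs ranked from largest to smallest count."""
    # sorted() stays stable even with reverse=True, so tied categories keep their input order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical tallies, for illustration only.
tallies = {"Broken string": 12, "Late start": 40, "Feedback squeal": 25}
for category, count in pareto_order(tallies):
    print(f"{category:<16}{count:>4}")
```

Drawing these pairs as bars, in this order, gives the bar portion of a Pareto chart; everything else about the bars is the same as in an ordinary bar chart.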

From a quality improvement perspective, this is important because it can help you identify which quality problems are the most critical in terms of volume, expense, or other factors. Once you've prioritized your challenges, you can focus improvement efforts where they'll have the largest benefits.

Organizations tend to use Pareto charts in one of two ways.

Use 1: Determine the most common type of defect.

Use 2: Identify projects with the greatest potential returns or benefits.

In quality-speak, we say the Pareto chart separates the "vital few" problems from the "trivial many." In other words, it gives you an easy way to visualize which problems have the biggest impact on your organization.

When you look at the data in a Pareto chart, you might find out, for instance, that even though there's a perception that customers complain more frequently about, say, shipping speed, your greatest volume of complaints is really about the voicemail system. Knowing that can help you tackle the problem that's most important to the most customers first.

In keeping with the rock-and-roll theme, the Pareto chart will help us see which incidents on last year's tour kept the Zero Sigmas from rocking audiences to the fullest.

Setting Up Data for the Pareto Chart

When you create a Pareto chart in our statistical software, your data must include the names of each defect. These names can be text or numeric. If your data are summarized in a table, you must include a column of frequencies or counts, with nonnegative numeric values for each defect.
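The requirements above (a name for each defect, text or numeric, plus a nonnegative count when the data are summarized) are easy to check before charting. The sketch below is a hypothetical Python helper, `validate_summary`, that mirrors those stated rules; it is not Minitab's own validation logic.

```python
from numbers import Real


def validate_summary(names, counts):
    """Check a summarized defect table and return it as (name, count) pairs.

    Mirrors the rules stated in the text: each defect has a name that is
    text or numeric, and a nonnegative numeric count.
    """
    if len(names) != len(counts):
        raise ValueError("each defect name needs exactly one count")
    rows = []
    for name, count in zip(names, counts):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(name, bool) or not isinstance(name, (str, Real)):
            raise TypeError(f"defect name {name!r} must be text or a number")
        # `not count >= 0` also rejects NaN, which fails every comparison.
        if isinstance(count, bool) or not isinstance(count, Real) or not count >= 0:
            raise ValueError(f"count for {name!r} must be a nonnegative number")
        rows.append((name, count))
    return rows


# Hypothetical rows: one text name and one numeric defect code.
print(validate_summary(["Late start", 404], [40, 0]))
```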

Let's say you've identified and tallied nine types of mishaps that occurred with some regularity during last year's tour. You can arrange the data in a Minitab worksheet like this:

Minitab worksheet of 9 types of mishaps

To create a chart that shows the frequencies of these incidents graphically, we just select Stat > Quality Tools > Pareto Chart and enter Incident as our Defects data and Count as our Frequencies data. Minitab produces the following graph:

Pareto Chart of Incident

The right Y-axis shows the percent of the total mishaps accounted for by each type of incident, while the left Y-axis shows the count of those incidents. The red line indicates the cumulative percentage, which can help you judge the added contribution of each category. The bars show the count (and the percentage of the total) for each category. Below the bars, the counts, percents, and cumulative percents are listed for each incident category.
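The numbers listed below the bars come from simple arithmetic: each category's percent is its count divided by the total, and the cumulative percent is the running total of counts divided by the grand total. A minimal Python sketch, using the hypothetical name `pareto_table` and made-up counts (not the ones in the chart above):

```python
def pareto_table(counts):
    """Return (category, count, percent, cumulative percent) rows, largest first."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one defect to chart")
    rows = []
    running = 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        # Cumulative percent comes from the running count, not from summing
        # rounded percents, so it can differ slightly from that sum.
        rows.append((name, count, round(100 * count / total, 1), round(100 * running / total, 1)))
    return rows


# Hypothetical counts, for illustration only.
for row in pareto_table({"Broken string": 20, "Feedback squeal": 30, "Missed cue": 50}):
    print(row)
```

Rounding to one decimal place here is an assumption about display, not a claim about how Minitab formats its table; the point is only that the cumulative column is a running total.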

You'll notice the last grouping is labeled "Other." Your raw data didn't include an "Other" category, but by default Minitab puts all categories with counts that represent less than 5% of the total defect count into this "Other" category.
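The grouping rule as described here (any category below 5% of the total goes into "Other") can be sketched in a few lines of Python. This follows the rule as stated in this post; the exact rule and its default are set in Minitab's Pareto chart options, so check those before relying on this sketch. The function name `combine_small` and the counts are hypothetical.

```python
def combine_small(counts, threshold_pct=5.0):
    """Fold every category below threshold_pct of the total into "Other"."""
    total = sum(counts.values())
    kept, other = {}, 0
    for name, count in counts.items():
        if 100 * count / total < threshold_pct:
            other += count
        else:
            kept[name] = count
    if other:
        # "Other" is added last, matching its position as the last grouping on the chart.
        kept["Other"] = other
    return kept


print(combine_small({"Late start": 60, "Feedback squeal": 37, "Broken string": 2, "Lost setlist": 1}))
```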

In this example, 27.9% of the incidents involved the Zero Sigmas starting their gig late, which they did every single night of the tour. Another 22.3% of incidents involved the band's singer, Hy P. Value, forgetting the lyrics to his own songs. The combined, or cumulative, percentage for starting late and forgetting lyrics is about 50%, and if you add in the guitars going out of tune, you've accounted for a whopping 67.9% of the incidents that plagued the band's 100-date tour.
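The cumulative figures follow directly from the individual ones. Writing n_i for the count of the i-th largest category, N for the total number of incidents, and C_k for the cumulative percent after k categories, the numbers quoted above work out as follows (the displayed percents are rounded, so these sums are approximate):

```latex
\begin{aligned}
C_k &= \frac{100}{N} \sum_{i=1}^{k} n_i \\
C_2 &\approx 27.9 + 22.3 = 50.2 \\
C_3 - C_2 &\approx 67.9 - 50.2 = 17.7
\end{aligned}
```

So the first two categories together come to roughly 50% of incidents, and the out-of-tune guitars contribute about 17.7 percentage points on their own, give or take rounding in the displayed values.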

In terms of overall numbers of incidents, it looks like these are the three areas you should focus on if you want the Zero Sigmas to kick out the jams more efficiently on the next tour. This illustrates how you would create a Pareto chart for the first use above: to determine the most commonly occurring types of defects.
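Picking out the "vital few" amounts to walking down the ranked list until the cumulative percent reaches a cutoff you choose. A minimal Python sketch; the function name `vital_few`, the cutoff values, and the counts are hypothetical. A cutoff around 80% echoes the familiar 80/20 rule of thumb, but nothing in the method requires it.

```python
def vital_few(counts, cutoff_pct):
    """Return the top-ranked categories needed to reach cutoff_pct of all defects."""
    total = sum(counts.values())
    selected, running = [], 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if 100 * running / total >= cutoff_pct:
            break  # the categories chosen so far already cover the cutoff
        selected.append(name)
        running += count
    return selected


# Hypothetical counts, for illustration only.
tallies = {"Late start": 50, "Feedback squeal": 30, "Broken string": 15, "Missed cue": 5}
print(vital_few(tallies, 80))
```

Applied to the tour percentages in this post, a cutoff of 65%, for example, would stop after the three categories singled out above, since the first two reach only about 50% and the first three reach 67.9%.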

Limitations of the Pareto Chart

Although the Pareto chart is easy to create, understand and use, it does have some limitations:

- Data collected over a short time period, especially from an unstable process, may lead to incorrect conclusions. If the data aren't reliable, you could get an incorrect picture of the distribution of defects. For example, while on tour Hy P. Value got laryngitis and was caught lip-syncing three times in the same week. Had you looked only at that week's data, you'd have a distorted picture of how often lip-syncing happened over the whole tour. Just remember that the "vital few" problems can change frequently, and that short periods may not accurately represent your process as a whole.
- Data gathered over long periods may include changes made to the process, so it's a good idea to check for stratification or changes in the distribution over time.
- If your initial Pareto analysis does not yield useful results, make sure you selected meaningful categories of defects. You should also make sure your "Other" category is not too large.
- A Pareto analysis is designed to help you get the biggest bang for your quality improvement buck, but it doesn't give you permission to ignore small, easily solved problems that can be fixed while you're working on the bigger issues.
- Focusing on the areas of greatest frequency should decrease the total number of incidents (or defects). Focusing on the areas of greatest impact should increase the overall benefits of improvement.
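One way to act on the first two cautions is to build a separate Pareto ranking for each time window (a week, a leg of the tour) and compare it with the ranking for the whole period. A minimal Python sketch, assuming one hypothetical (period, category) record per incident; the function name `pareto_by_period` and the records are made up for illustration.

```python
from collections import Counter, defaultdict


def pareto_by_period(incidents):
    """Rank categories by count within each period.

    incidents: iterable of (period, category) pairs, one pair per incident.
    Returns {period: [(category, count), ...]}, largest count first.
    """
    by_period = defaultdict(Counter)
    for period, category in incidents:
        by_period[period][category] += 1
    # most_common() with no argument lists every category, largest first.
    return {period: tally.most_common() for period, tally in by_period.items()}


# Hypothetical records: a laryngitis week skews the short-window ranking.
records = [
    ("week 1", "Lip-sync"), ("week 1", "Lip-sync"), ("week 1", "Lip-sync"),
    ("week 1", "Late start"),
    ("week 2", "Late start"), ("week 2", "Forgot lyrics"),
    ("week 3", "Late start"), ("week 3", "Forgot lyrics"),
    ("week 4", "Late start"),
]
for period, ranking in pareto_by_period(records).items():
    print(period, ranking)
print("all weeks", Counter(category for _, category in records).most_common())
```

In this made-up data, lip-syncing tops the week 1 ranking but not the ranking over all four weeks, which is exactly the kind of distortion a short window can produce.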

About that last bullet... this example looks at the overall counts of incidents that happened on the Zero Sigmas' recent tour. But how do we know those are the incidents that had the biggest impact on the tour's overall rock-and-roll awesomeness? That's exactly the kind of great question my old boss would never have thought to ask, and it's the question I'll answer in my next post.