Most Lean Six Sigma projects use data analysis to examine and reduce the prevalence of defects in a product or service. Experienced quality practitioners know that defects can sometimes feel abstract when you analyze them, until you’re the customer who experiences them firsthand.
On my first trip to Rome, my luggage never showed up in the terminal.
So instead of gazing in awe at the Sistine Chapel, or savoring tortellini ricotta and a glass of Chianti, I spent my first day in the Eternal City trying to get sales clerks in discount stores to understand my botched pronunciations of the Italian words for toothpaste (dentifricio), socks (calzini) and underwear (biancheria intima).
To help protect air travelers from having to speak broken Italian when they shop for undergarments, the U.S. Department of Transportation publishes a monthly report on flight delays, mishandled baggage, and other related air travel statistics.
Suppose you want to use statistical software to analyze that data and compare mishandled baggage among U.S. airlines for a given month. Your first inclination might be to simply compare the number of mishandled bags per airline, as in the bar chart below.
What’s wrong with this picture? Why don’t counts tell the whole story?
If you raised an eyebrow and thought, “Hey, that’s not a fair comparison,” you’re absolutely right. After all, airlines that carry more passengers are likely to have more mishandled bags simply because they handle more baggage. If you look at the chart, sure enough, most of the larger airlines are on the far right.
In fact, one major airline had about 24,000 reports of mishandled luggage in one month. That sounds astronomical. But relative to its number of passengers, its rate of mishandled luggage was only about 0.0028, or fewer than 3 reports for every 1,000 passengers. That is a much better performance than the raw count suggests.
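To make the arithmetic concrete, here is a minimal Python sketch of the count-to-rate conversion. The report count (24,000) and the rate (0.0028) come from the paragraph above; the passenger total is not given in the text, so the code backs it out from those two numbers as an assumption. The function name rate_per_1000 is made up for this illustration.

```python
def rate_per_1000(reports: int, passengers: int) -> float:
    """Return mishandled-baggage reports per 1,000 passengers."""
    if passengers <= 0:
        raise ValueError("passengers must be positive")
    return 1000 * reports / passengers


reports = 24_000   # from the text: about 24,000 reports in one month
rate = 0.0028      # from the text: reports per passenger

# Assumption: the passenger total is not stated in the text, so it is
# inferred here from the two figures that are.
implied_passengers = round(reports / rate)

per_1000 = rate_per_1000(reports, implied_passengers)
print(f"{implied_passengers:,} passengers, {per_1000:.2f} reports per 1,000")
# prints: 8,571,429 passengers, 2.80 reports per 1,000
```

In other words, a count of 24,000 only looks alarming until you see that it is spread across roughly 8.6 million passengers.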
Now look what happens when the mishandled baggage is expressed as a rate: the number of mishandled bags per 1,000 passengers.
Notice the larger airlines now fare much better. For example, US Airways was in the bottom 5 for mishandled baggage based solely on frequency counts (top chart). But taking into account passenger volume (bottom chart) it’s now in the top 5, with one of the lowest rates of mishandled baggage for the month.
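The reshuffling is easy to reproduce on a small scale. The sketch below ranks four airlines first by raw count and then by rate per 1,000 passengers. All names and figures in it are hypothetical, invented only to show the effect; they are not taken from the Department of Transportation report.

```python
# Hypothetical data for illustration only (not from the DOT report):
# airline -> (mishandled-bag reports, passengers carried)
airlines = {
    "Airline A": (24_000, 8_500_000),
    "Airline B": (9_000, 1_500_000),
    "Airline C": (3_000, 400_000),
    "Airline D": (12_000, 5_000_000),
}


def rate_per_1000(reports: int, passengers: int) -> float:
    """Return mishandled-baggage reports per 1,000 passengers."""
    return 1000 * reports / passengers


# Rank from best (fewest) to worst (most), first by count, then by rate.
by_count = sorted(airlines, key=lambda name: airlines[name][0])
by_rate = sorted(airlines, key=lambda name: rate_per_1000(*airlines[name]))

print("Best to worst by count:", by_count)
print("Best to worst by rate: ", by_rate)
for name in by_rate:
    reports, passengers = airlines[name]
    print(f"{name}: {reports:,} reports, "
          f"{rate_per_1000(reports, passengers):.2f} per 1,000 passengers")
```

With these made-up numbers, Airline A is the worst performer by count but the second best by rate, while the small Airline C drops from first place to last.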
You might think it’s an obvious point, yet misleading charts of frequency counts are very common in newspapers and magazines, especially in the ubiquitous “top 10” and “top 5” lists that appear in many blogs, such as:
How might you adjust data in these sets of rankings to better compare job availability or the potential for financial gain in cities?
So whenever you have frequency data, think carefully about whether it’s more appropriate to display and compare the data as a relative frequency or as a rate. If it’s continuous data, a proportion or a percentage might make more sense.
Of course, if you’re the customer with lost luggage, like I was, whether your mishandled bag is represented as a count or a rate doesn’t matter much when you don’t have any fresh socks. (Uffa!)