"There are three kinds of lies: lies, damned lies, and statistics."
I’m sure you’ve heard this most vile expression, which was popularized by Mark Twain among others. This dastardly phrase impugns the reputation of statistics. The implication is that statistics can bolster a weak argument, or that statistics can be used to prove anything.
I’ve had enough of this expression, and here’s the rebuttal! In fact, I’ll make the case that statistics is not the problem, but the solution!
First, let’s stipulate that an unscrupulous person can intentionally manipulate the results to favor unwarranted conclusions. Further, honest analysts can make honest mistakes because statistics can be tricky.
However, that does not mean that the field of statistics is to blame!
An analogy is in order here. If a surgeon does not follow best practices, intentionally or not, we don’t blame the entire field of medicine. In fact, when a mistake happens, we call on medical experts to understand what went wrong and to fix it. The same should be true with statistics. If an analyst presents unreliable conclusions, there is no one better qualified than a statistician to identify the problem and fix it!
So, what is the field of statistics, and why is it so important?
The field of statistics is the science of learning from data. Statisticians offer essential insight in determining which data and conclusions are trustworthy. Statisticians know how to solve scientific mysteries and how to avoid traps that can trip up investigators.
When statistical principles are correctly applied, statistical analyses tend to produce accurate results. What’s more, the analyses even account for real-world uncertainty in order to calculate the probability of being incorrect.
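To make "the probability of being incorrect" concrete, here is a minimal sketch in Python that uses only the standard library. It simulates many studies drawn from a population whose true mean is known, builds a 95% confidence interval for each, and counts how often the interval captures the truth. The population values, the sample size, and the function name `coverage_rate` are illustrative assumptions of mine, and the interval uses a normal approximation, which is reasonable here only because each sample is fairly large.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage_rate(true_mean=50.0, true_sd=10.0, n=100,
                  trials=2000, level=0.95, seed=1):
    """Fraction of simulated studies whose interval contains the true mean.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # One simulated study: a random sample from the known population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Intervals containing the true mean: {rate:.1%}")
print(f"Intervals that missed it: {1 - rate:.1%}")
```

If the procedure is applied correctly, the share of intervals that miss should land near the advertised 5%. That is the sense in which a sound analysis tells you up front how often it will be wrong.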
To produce conclusions that you can trust, statisticians must ensure that all stages of a study are correct.
Statisticians should be a study's guide through a minefield of potential pitfalls, any of which could produce misleading conclusions. The list below is but a small sample of these pitfalls.
Biased samples: A non-random sample can bias the results from the beginning. For example, if a study uses volunteers, the volunteers collectively may differ from non-volunteers in a way that affects the results.
Overgeneralization: The results from one population may not apply to another population. A study that involves one gender or age group may not apply to other groups. Statistical inferences are always limited, and you need to understand the limitations.
Causality: How do you know when X causes a change in Y? Statisticians apply strict criteria before they assume causality. However, people in general accept causality more easily. If A precedes B, and A is correlated with B, and you know several people who say that A affects B, most people would assume, incorrectly, that the data show a causal connection. Statisticians know better!
Incorrect analysis choices: Is the model too simple or too complex? Does it adequately capture any curvature that is present? Are the predictors confounded or overly correlated? Do you need to transform your data? Are you analyzing the mean when the median may be a better measure? There are many ways you can perform analyses, but not all of them are correct.
Violation of the assumptions for an analysis: Most statistical analyses have assumptions. These assumptions are often requirements about the type of sample, the type of data, and how the data (or residuals) are distributed. If you perform an analysis without checking the assumptions, you cannot trust the results even if you’ve taken all the measures necessary to collect the data properly.
Data mining: Even if everything passes muster, an analyst can find significant results simply by running test after test on the same dataset. If a large number of tests are performed, a few will be significant by chance. At a significance level of 0.05, for instance, about 1 in 20 tests will be significant even when no real effect exists. Fastidious statisticians keep track of all the tests that are performed in order to put the results in the proper context.
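The data mining pitfall is easy to see in a simulation. The sketch below, again in standard-library Python, runs 100 two-group comparisons in which both groups come from the same distribution, so every "significant" result is a false alarm. The helper name `null_p_value`, the group size, and the number of tests are my own illustrative assumptions, and the p-values come from a large-sample z approximation rather than a t-test. The Bonferroni correction shown at the end is one common way to put many tests in context, not the only one.

```python
import random
from statistics import NormalDist, mean, variance


def null_p_value(rng, n=50):
    """Two-sided p-value for comparing two groups that truly do not differ.

    Uses a large-sample z approximation; the name and n are hypothetical.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    standard_error = (variance(a) / n + variance(b) / n) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)
num_tests = 100
alpha = 0.05

# Every test compares two samples from the same population,
# so any p-value below alpha is significant by chance alone.
p_values = [null_p_value(rng) for _ in range(num_tests)]

naive_hits = sum(p < alpha for p in p_values)
# Bonferroni: divide the significance level by the number of tests performed.
bonferroni_hits = sum(p < alpha / num_tests for p in p_values)

print(f"Significant at {alpha}: {naive_hits} of {num_tests}")
print(f"Significant after Bonferroni correction: {bonferroni_hits} of {num_tests}")
```

The correction only works if you know how many tests were run, which is exactly why keeping track of all of them matters.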
In short, there are many ways to screw up and produce misleading conclusions. Once again, you have to get all of the stages correct or you can’t trust the conclusions.
If you want to use data to learn how the world works, you must have this statistical knowledge in order to trust your data and your results. There’s just no way around it. Even if you are not performing the study, understanding statistical principles can help you assess the quality of other studies and the validity of their conclusions. Statistical knowledge can even help reduce your vulnerability to manipulative conclusions from projects that have an agenda.
The world today produces more data than ever before. This includes all branches of science, quality improvement, manufacturing, service industry, government, public health, and public policy among many other settings. There will be many analyses of these data. Some analyses are straight up for science and others are more partisan in nature. Are you ready? Will you know which conclusions to trust and which studies to doubt?
In addition to resources like this blog, Minitab offers an e-learning course called Quality Trainer that can help you learn statistical principles, particularly as they relate to quality improvement. If you'd like to learn more about analyzing data, it's a great investment at just $30 per month, or possibly less if your organization uses Minitab Statistical Software.