It's all too easy to make mistakes involving statistics. Powerful statistical software can remove a lot of the difficulty surrounding statistical calculation, reducing the risk of mathematical errors -- but correctly interpreting the results of an analysis can be even more challenging.
A few years ago, Minitab trainers compiled a list of common statistical mistakes, the ones they encountered over and over again. Being somewhat math-phobic myself, I expected these mistakes would be primarily mathematical. I was wrong: every mistake on their list involved either the incorrect interpretation of the results of an analysis, or a design flaw that made meaningful analysis impossible.
Statistical Mistake 1. Not Distinguishing Between Statistical Significance and Practical Significance
Let's say you love Tastee-O's cereal. The factory that makes them weighs every cereal box at the end of the filling line using an automated measuring system. Say that 18,000 boxes are filled per shift, with a target fill weight of 360 grams and a standard deviation of 2.5 grams.
With a sample that large, a hypothesis test can detect a shift of just 0.06 grams in the mean fill weight about 90% of the time. But just because that 0.06 gram shift is statistically significant doesn't mean it's practically significant. A 0.06 gram difference probably amounts to two or three Tastee-O's -- not enough to make you, the customer, notice or care.
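The 90% figure can be checked with a short power calculation. The sketch below uses the numbers from the cereal example and assumes a two-sided z-test at a 0.05 significance level; the test type and significance level are assumptions, since the example does not state them.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the cereal example
n_boxes = 18_000   # boxes filled and weighed per shift
sigma = 2.5        # standard deviation of fill weight, grams
shift = 0.06       # shift in mean fill weight we want to detect, grams

# Assumption: two-sided z-test at a 0.05 significance level
alpha = 0.05

std_normal = NormalDist()
standard_error = sigma / sqrt(n_boxes)
z_crit = std_normal.inv_cdf(1 - alpha / 2)
z_shift = shift / standard_error

# Power = probability the test statistic falls in either rejection region
# when the true mean has moved by `shift`
power = (1 - std_normal.cdf(z_crit - z_shift)) + std_normal.cdf(-z_crit - z_shift)

print(f"standard error: {standard_error:.4f} g")
print(f"power to detect a {shift} g shift: {power:.1%}")
```

Under these assumptions the power comes out close to 90%. The standard error is under 0.02 grams, which is why such a tiny shift registers as statistically significant.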
In most hypothesis tests, we already know that the null hypothesis is not exactly true. In this case, we don't expect the mean fill weight to be precisely 360 grams -- what we really want to know is whether the difference is large enough to matter. Instead of a hypothesis test, the cereal maker could use a confidence interval to see how large the difference might be and decide if action is needed.
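Here is a minimal sketch of that confidence-interval approach. The target, standard deviation and sample size come from the example; the observed sample mean and the "practical tolerance" are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

target = 360.0     # target fill weight, grams (from the example)
sigma = 2.5        # standard deviation, grams (from the example)
n_boxes = 18_000   # boxes per shift (from the example)

# Hypothetical values, for illustration only
sample_mean = 360.06        # observed mean fill weight for one shift
practical_tolerance = 0.5   # smallest difference from target worth acting on

# 95% confidence interval for the difference (mean fill weight - target)
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / sqrt(n_boxes)
ci_low = (sample_mean - target) - margin
ci_high = (sample_mean - target) + margin

# Statistically significant if the interval excludes zero
statistically_significant = not (ci_low <= 0.0 <= ci_high)
# Worth acting on only if the interval reaches past the tolerance
could_exceed_tolerance = max(abs(ci_low), abs(ci_high)) > practical_tolerance

print(f"95% CI for the difference: ({ci_low:.3f}, {ci_high:.3f}) g")
print("statistically significant:", statistically_significant)
print("could exceed practical tolerance:", could_exceed_tolerance)
```

With these made-up numbers the interval excludes zero, so the difference is statistically significant, yet the whole interval sits far inside the tolerance. The interval shows how big the difference could plausibly be, which is the number the cereal maker actually needs.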
Statistical Mistake 2. Stating That You've Proved the Null Hypothesis
In a hypothesis test, failing to reject the null hypothesis is not the same as proving it. In other words, even if we do not have enough evidence in favor of the alternative hypothesis, the null hypothesis may or may not be true.
For example, we could flip a fair coin 3 times and test:
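As a hedged sketch of such a coin-flip test, suppose the hypotheses are H0: proportion of heads = p0 versus H1: proportion of heads ≠ p0. The code tries two assumed values of p0, 0.40 and 0.50, both chosen here for illustration. It computes the exact two-sided binomial p-value for every possible result of 3 flips, using the common "sum the outcomes no more likely than the observed one" definition.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_two_sided_p_value(heads, flips, p0):
    """Exact binomial test: add up the probabilities of every outcome
    that is no more likely under H0 than the observed one."""
    observed = binom_pmf(heads, flips, p0)
    return sum(
        prob
        for prob in (binom_pmf(k, flips, p0) for k in range(flips + 1))
        if prob <= observed * (1 + 1e-9)  # small tolerance for float ties
    )

flips = 3
alpha = 0.05

# Assumed null values: 0.40 is false for a fair coin, 0.50 is true
smallest_p = {}
for p0 in (0.40, 0.50):
    p_values = [exact_two_sided_p_value(h, flips, p0) for h in range(flips + 1)]
    smallest_p[p0] = min(p_values)
    print(f"H0: p = {p0:.2f} -> smallest possible p-value = {smallest_p[p0]:.3f}")
```

No outcome of 3 flips can push the p-value below 0.05 for either null value, so the test can never reject. That holds even for p0 = 0.40, which is false for a fair coin, so failing to reject clearly does not prove the null hypothesis.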
Statistical Mistake 3. Assuming Correlation = Causation
It's tempting to observe a linear relationship between two variables and conclude that a change in one is causing a change in the other, but that's not necessarily so -- statistical evidence of correlation is not evidence of causation.
Consider this example: data analysis has shown a strong correlation between ice cream sales and murder rates. When ice cream sales are low, the murder rate is low. When ice cream sales are high, the murder rate is high.
So could we conclude that ice cream sales lead to murder? Or vice versa? Of course not! This is a perfect example of correlation not equaling causation. Yes, the murder rate and ice cream sales are correlated. In the summer months, both are high. In the winter months, both are low. So when you think beyond the correlation, the data suggest not that the murder rate and ice cream sales affect each other, but rather that both are affected by another factor: the weather.
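A small simulation can make the lurking-variable idea concrete. Everything below is synthetic: the temperature range, the coefficients and the noise levels are invented for illustration and are not real sales or crime data. Temperature drives both series, and neither series influences the other.

```python
import random
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fitted on xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n_days = 2000

# Synthetic data: temperature affects both variables independently
temperature = [rng.uniform(0, 30) for _ in range(n_days)]
ice_cream_sales = [10 * t + rng.gauss(0, 30) for t in temperature]
murder_index = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream_sales, murder_index)

# Correlation after removing the part of each series explained by temperature
partial_corr = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(murder_index, temperature),
)

print(f"raw correlation: {raw_corr:.2f}")
print(f"correlation after controlling for temperature: {partial_corr:.2f}")
```

The raw correlation is strong, but once temperature is accounted for, almost nothing is left. That is the pattern you would expect when a third factor, rather than either variable, is doing the work.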
If you've ever misinterpreted the significance of a correlation between variables, at least you've got company: the media is rife with news stories that equate correlation with causation -- especially when it comes to the effects of diet, exercise, chemicals and other factors on our health!
Have you ever jumped to the wrong conclusion after looking at statistics?