Statisticians say the darndest things. At least, that's how it can seem if you're not well-versed in statistics.
When I began studying statistics, I approached it as a language. I quickly noticed that compared to other disciplines, statistics has some unique problems with terminology, problems that don't affect most scientific and academic specialties.
For example, dairy science has a highly specialized vocabulary, which I picked up when I was an editor for Penn State's College of Agricultural Sciences. I found the jargon fascinating, but not particularly confusing to learn. Why? Because words like "rumen" and "abomasum" and "omasum" simply don't turn up in common parlance. They have very specific meanings, and there's little chance of misinterpreting them.
Now open up a statistics text and flip to the glossary. There are plenty of statistics-specific terms, but you're going to see a lot of very common words as well. The problem is that in statistics, these common words don't necessarily mean what they do outside statistics.
And that means that if you're not well versed in statistics, or even if it's just been a while since you thought about it, understanding statistical results—whether it's a research report on the news or an analysis done by a co-worker—can be a real challenge. Sometimes it seems like the language of statistics was designed to be confusing.
That's one of the reasons we incorporated The Assistant into Minitab Statistical Software. This interactive menu guides you through your analysis and presents your results without ambiguity, making them easy to interpret if you aren't a statistician, and making them easy to share if you are one.
Here are 10 common words that are also routinely used in statistics. Those of us who are practicing data analysis and sharing the results with others need to keep in mind the differences between what these words mean to statisticians, and what they mean to everyone else.
When most people say something is "significant," they mean it's important and worth your attention. But for statisticians, significance means that a result like the one we observed would be unlikely if chance alone were at work. It says nothing about how large or important the effect is. Statisticians know that on a practical level, statistically significant results often have no importance at all. This distinction between practical and statistical significance is easy for people to overlook.
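To make the gap between statistical and practical significance concrete, here is a minimal sketch. It assumes a one-sample z-test with a known standard deviation, which is a simplification chosen for illustration, and all of the numbers are hypothetical. The same tiny difference is tested at two sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(observed_diff, sigma, n):
    """Two-sided p-value for a one-sample z-test with known sigma (an assumption)."""
    z = observed_diff / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: a 0.1-unit shift on a scale whose standard deviation is 10.
p_small_sample = two_sided_p_value(0.1, sigma=10, n=100)
p_large_sample = two_sided_p_value(0.1, sigma=10, n=250_000)

print(f"n = 100:     p = {p_small_sample:.3f}")
print(f"n = 250,000: p = {p_large_sample:.2e}")
```

The difference is identical in both cases. Only the sample size changes, yet the large sample makes it "significant" in the statistical sense without making it any more important.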
Normally, people who say something is "normal" mean that it's ordinary or commonplace. We call a temperature of 98.6 degrees Fahrenheit "normal." What's more, when something isn't "normal," it often carries negative connotations: "That knocking from my car's engine isn't normal." To statisticians, however, data is "normal" when it follows the familiar bell-shaped curve, and there's nothing wrong with data that isn't normal. Still, it's easy for the uninitiated to conflate "nonnormal data" with "bad data."
In everyday usage, regression means shrinkage or backwards movement. When the dog you're training has a bad day after a few positive ones, you might say his behavior regressed. Unless you're a statistician, you wouldn't immediately think "regression" refers to predicting an output variable based on input variables.
In statistics, the arithmetic average (or mean) is the sum of the observations divided by the number of observations. When most people hear and say the word "average," they're not thinking about a mathematical value but rather a qualitative judgment, meaning “so-so,” "normal" or "fair."
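The statistical definition of the average is short enough to write out directly. This sketch uses made-up observations and compares the by-hand calculation with the `mean` function from Python's standard `statistics` module.

```python
from statistics import mean

observations = [2, 4, 4, 5, 10]  # made-up values for illustration

# Sum of the observations divided by the number of observations.
by_hand = sum(observations) / len(observations)

print(by_hand, mean(observations))
```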
Error is a measure of an estimate’s precision—if you're a statistician. To everyone else, errors are just mistakes.
In statistics, bias refers to the accuracy of measurements taken by a particular tool or gauge compared to a reference value. In everyday usage, however, bias refers to preconceptions and prejudices that affect a person's view of the world.
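One common way to put a number on measurement bias is to subtract the reference value from the average of repeated measurements. The sketch below assumes that definition. The reference value and the readings are hypothetical.

```python
from statistics import mean

reference_value = 10.00  # hypothetical known value of a reference part
measurements = [10.02, 10.05, 9.99, 10.04, 10.05]  # hypothetical repeated readings

# Positive bias: the gauge reads high on average; negative: it reads low.
bias = mean(measurements) - reference_value

print(f"bias = {bias:+.3f}")
```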
For most people who aren't statisticians, residuals is a fancy word for leftovers, not the difference between observed and fitted values.
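Residuals are easier to see than to describe. The following sketch fits a straight line to four made-up points using ordinary least squares, written out by hand so nothing beyond the standard library is needed. It then subtracts the fitted values from the observed ones.

```python
from statistics import mean

# Made-up data: x is the input variable, y is the observed output.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

x_bar, y_bar = mean(x), mean(y)

# Ordinary least-squares slope and intercept for a straight line.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]

# Residual = observed value minus fitted value.
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"x={xi}  observed={yi}  fitted={fi:.2f}  residual={ri:+.2f}")
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding), so there is nothing "left over" on average.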
Usually we talk about power in terms of impact and control. Influence. So the fact that a statistical test can be powerful but not influential seems contradictory, unless you already know it refers to the probability of finding a...um...significant effect when one truly exists.
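Power can be computed directly in simple cases. This sketch assumes a two-sided one-sample z-test with a known standard deviation, a textbook simplification rather than a general method. The function name and all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(true_diff, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test rejects when the true shift is true_diff."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff * sqrt(n) / sigma  # true shift in standard-error units
    # Chance the test statistic lands in the upper or lower rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (10, 30, 50):
    print(f"n = {n:2d}: power = {z_test_power(true_diff=5, sigma=10, n=n):.2f}")
```

When the true difference is zero, the function returns the significance level itself, since the only way to "find" an effect is a false alarm.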
People use the word "interaction" to talk about their communications with others. For statisticians, it means the effect of one factor depends on the level of another.
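As a rough illustration of one factor's effect depending on another, consider a two-by-two experiment. The factor names and mean yields below are entirely hypothetical.

```python
# Hypothetical mean yields for two factors, each run at a low and a high level.
cell_means = {
    ("low temp", "low pressure"): 20.0,
    ("high temp", "low pressure"): 30.0,
    ("low temp", "high pressure"): 25.0,
    ("high temp", "high pressure"): 50.0,
}

# Effect of raising temperature, measured separately at each pressure level.
temp_effect_at_low_pressure = (
    cell_means[("high temp", "low pressure")] - cell_means[("low temp", "low pressure")]
)
temp_effect_at_high_pressure = (
    cell_means[("high temp", "high pressure")] - cell_means[("low temp", "high pressure")]
)

# If the two effects differ, the factors interact.
difference_in_effects = temp_effect_at_high_pressure - temp_effect_at_low_pressure

print(temp_effect_at_low_pressure, temp_effect_at_high_pressure, difference_in_effects)
```

If the temperature effect were the same at both pressure levels, the difference would be zero and there would be no interaction in these cell means.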
In statistics, a confidence interval is a range of values, derived from a sample, that is likely to contain the true value of a population parameter. The confidence level is the percentage of such intervals that would contain the population parameter if you sampled the population many times and calculated an interval from each sample.
Outside of its technical meaning in statistics, the word "confidence" carries an emotional charge that can instantly create unintended implications. All too often, people interpret statistical confidence as meaning the researchers really believe in their results.
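The repeated-sampling meaning of a confidence level can be checked by simulation. This sketch assumes a normally distributed population with a known standard deviation, so a z-interval applies. The population values, sample size, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(true_mean=50.0, sigma=10.0, n=30, level=0.95, repeats=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)  # known-sigma z-interval (an assumption)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should land close to the chosen confidence level. That is a statement about the procedure over many samples, not about how strongly anyone believes in a single result.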
These 10 terms are just a few of the most confusing double entendres found in the statistical world. Terms like sample, assumptions, stability, capability, success, failure, risk, representative, and uncertainty can all mean different things to the world outside our small statistical circle.
Helping the people we communicate with understand the technical meanings of these terms as we use them would be an easy way to begin promoting higher levels of statistical literacy.
What do you think the most confusing terms in statistics are?