When you work in data analysis, you quickly discover an irrefutable fact: a lot of people just can't stand statistics. Some people fear the math, some fear what the data might reveal, some people find it deadly dull, and others think it's bunk. Many don't even really know why they hate statistics—they just do. Always have, probably always will.
Problem is, that means we who analyze data need to communicate our results to people who aren't merely inexperienced in statistics, but may have an actively antagonistic relationship with it.
Given that situation, which side do you think needs to make concessions?
The Difference Between Brilliance and Bull
When I was a kid, I used to see this T-shirt in the Spencer Gifts store in the Monroeville Mall: "If you can't dazzle them with brilliance, baffle them with bull---." I thought it was an amusing little saying then, but as I've grown older I've realized an underlying truth about it:
In terms of substance, brilliance and bull often are identical.
Whether what you're saying is viewed as one or the other depends on how well you're getting your ideas across. Given the ubiquity of the "Lies, damned lies, and statistics" quip, it would seem that most statistically minded people aren't getting their ideas across very well. We so often forget that most of the people we need to reach just don't get statistics. And every time we forget that, we're putting another layer on the Tower of Babel.
In sharing the results of an analysis with people who aren’t statistics-savvy, we have two obligations. First, to make a concerted effort to convey clearly the results of our analysis, and its strengths and limitations. Most of us do this to some degree. But second, we should take every opportunity we can to demystify and humanize statistics, to help people appreciate not just the complexity but also the art that goes into analysis. To promote statistical literacy. I think most of us can do better in this regard.
Opening the Black Box
There is an impression among those not well versed in statistical methods that the discipline is something of a black box: statisticians know the magic buttons that will transform a spreadsheet full of data into something meaningful.
A good statistician knows the formulas and methods inside out, and very smart ones expand the discipline with new techniques and applications. But an effective statistician is sensitive to the relationship between the language of statistics and the language the audience speaks, and able to bridge that gap.
Statisticians who are trying to communicate about their work with the uninitiated are like ambassadors: they need to be completely cognizant of local knowledge, customs, and beliefs, and present their message in a way that will be understood by the recipients.
In other words, unless we're speaking to a room full of other statisticians, we should stop talking like statisticians.
What We Mean Is Not Necessarily What We Say
The language of statistics can seem particularly impenetrable and abstruse. That's hard to deny, given that the method we use to compare means is called “Analysis of Variance.” And when it comes to distributions, right-skewed data are clustered on the left side of a bar graph and left-skewed data are clustered on the right. Not exactly intuitive. That's why the Assistant in Minitab Statistical Software uses plain language in its output and dialog boxes, and avoids confusing statistical jargon.
Indeed, some statistical language can seem like outright obfuscation, like the notion that a statistical test "failed to reject the null hypothesis." From an editorial viewpoint, "failing to reject the null hypothesis" would seem a needlessly circular equivalent to the word accept.
Of course, from a statistical perspective, replacing "failure to reject" with "accept" would be very wrong. So we’re left with a phrase that’s precise, correct, and also confusing. It takes only seconds to compare "failing to reject the null" to a jury saying "not guilty." When evidence against the accused isn’t convincing, that doesn’t prove innocence. But how often is "failure to reject the null" presented to lay audiences with no explanation?
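For readers who like to see the numbers, here is a minimal sketch of why "fail to reject" is not "accept." The study figures are made up for illustration, and the two-sample z-test with a known common standard deviation is a simplifying assumption chosen so the example needs only Python's standard library. The function name two_sample_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(diff, sd, n):
    """Two-sided z-test and 95% CI for a difference in means.

    Assumes two groups of equal size n and a known common
    standard deviation sd (a simplification for illustration).
    """
    se = sd * sqrt(2 / n)                       # standard error of the difference
    z = diff / se                               # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    margin = NormalDist().inv_cdf(0.975) * se   # half-width of the 95% CI
    return p_value, (diff - margin, diff + margin)


# Hypothetical small study: observed difference of 2 units,
# standard deviation of 5, only 10 subjects per group.
p_value, (low, high) = two_sample_z(diff=2.0, sd=5.0, n=10)

print(f"p-value: {p_value:.2f}")
print(f"95% CI for the difference: ({low:.1f}, {high:.1f})")
```

With numbers like these, the p-value is well above 0.05, so the test fails to reject the null. But the confidence interval covers zero and also differences larger than the one observed. The data are compatible with "no effect" and with "a sizable effect" alike, which is the jury's "not guilty" rather than a declaration of innocence.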
Another difficulty with statistical language, ironically, is that it includes so many common words. Unfortunately, their meanings in statistics are not the same as their common connotations, so when we use them in a statistical context, we often connote unintended ideas. Consider just a few of the terms that mean one thing to statisticians, and quite another to everyone else.
- Significant. For most, this word equates to "important." Statisticians know that significant things may have no importance at all.
- Normal. People take this to mean something is ordinary or commonplace, not that it follows a Gaussian distribution.
- Regression. To "regress" is to shrink or move backwards. Most people won’t relate that idea to estimating an output variable based on its inputs.
- Average. People hear this not as a mathematical value but as a qualitative judgment, meaning common or fair.
- Error. Statisticians mean the measure of an estimate's precision, but people hear "mistake."
- Bias. For statisticians, it doesn't mean attitudinal prejudice, but rather the accuracy of a gauge compared to a reference value.
- Residual. People think residuals are leftovers, not the differences between observed and fitted values.
- Power. A statistical test can be powerful without being influential. Seems like a contradiction, unless you know it refers to the probability of finding a significant (there we go again…) effect when it truly exists.
- Interaction. An act of communication for most, rather than the effect of one factor depending on the level of another.
- Confidence. This word carries an emotional charge, and can leave a non-statistical audience thinking statistical confidence means the researchers really believe in their results.
And the list goes on. Statistical terms like sample, assumptions, stability, capability, success, failure, risk, representative, and uncertainty can all mean different things to the world outside the statistical circle.
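The gap between "significant" and "important" from the list above can be shown with a few lines of arithmetic. This is only a sketch: the numbers are invented, the z-test with a known standard deviation is an assumption made to keep the example in Python's standard library, and the helper name z_test_p_value is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(diff, sd, n):
    """Two-sided p-value for a difference in means.

    Assumes two groups of equal size n and a known common
    standard deviation sd (a simplification for illustration).
    """
    se = sd * sqrt(2 / n)  # standard error of the difference
    return 2 * (1 - NormalDist().cdf(abs(diff / se)))


# Hypothetical large study: a difference of 0.1 units against a
# standard deviation of 5, with one million subjects per group.
diff, sd, n = 0.1, 5.0, 1_000_000

p_value = z_test_p_value(diff, sd, n)
effect_size = diff / sd  # standardized difference (Cohen's d)

print("statistically significant at 0.05:", p_value < 0.05)
print(f"standardized effect size: {effect_size:.2f}")
```

Under these assumptions the result is overwhelmingly "significant," yet the difference amounts to two hundredths of a standard deviation. A lay audience hearing "significant" would likely picture something far bigger, which is why the word needs a sentence of explanation whenever it leaves the room full of statisticians.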
Statisticians frequently lament the common misunderstandings and lack of statistical awareness among the population at large, but we are responsible for making our points clear and complete when we reach out to non-statistical audiences—and almost any audience we reach will have a non-statistical contingent.
Making an effort to help the people we communicate with appreciate the technical meanings of these terms as we use them is an easy way to begin promoting higher levels of statistical literacy.