In response to a previous post, a reader asked about the best way to represent variation in data. The reader was specifically asking about error bars, which are sometimes displayed on graphs as standard deviation (SD), standard error (SE), or confidence intervals (CIs).
I'd like to broaden the implications of this excellent question. What does each estimate of variation tell you—and what doesn’t it tell you?
Kick back, put on your headphones, and pour yourself some tea. Let's explore this topic using one of the world's most popular and enduring musical formats—the theme and variations.
(If you're in a hurry, you can skip the music and scroll down to the Coda section to get a quick summary.)
____________________________________________________
If you’ve ever played or listened to variations on a theme, you know that the piece typically starts with a simple, basic statement of the theme. This unadorned motif is very easy to hear and understand.
The simplest way to represent the variation in your data is the range: the maximum data value minus the minimum data value.
Before statistical software like Minitab, the range was used much more than it is now, because it was so quick and easy to calculate. It still comes in handy at times. But it has serious limitations.
For example, based on only the range, you’d conclude that the four data sets below exhibit the same variation.
That leaves a lot unsaid, doesn’t it?
So there’s a need to elaborate on this simple theme.
____________________________________________________________
One big drawback of the range is that it doesn’t tell you how your data are scattered in relation to their central value—the mean.
The variance, however, is designed to do just that. To measure the spread of the data about the mean, it takes the difference between each data point and the mean, squares that difference, and then averages those squared differences (dividing by n - 1 for a sample).
The formula for the sample variance is shown below, if it helps. (If it just triggers indigestion or heartburn, feel free to ignore it.)
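In symbols, for n data values x1, x2, ..., xn with sample mean x̄:

$$
s^2 = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}
$$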
Now, let’s examine the variances of the four data sets that all had the same range.
See how the variance reflects the differences in scatter that the range couldn’t? The variance is smallest for group D (26.4), which makes sense because its values are more tightly clustered about the mean. The variance is largest for groups B and C, whose data points are scattered farther from the mean.
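If you'd like to see that arithmetic spelled out, here's a minimal sketch in Python. The sample values are invented purely for illustration; they are not the diet data from the plots.

```python
from statistics import mean

# Hypothetical weight losses, in pounds (illustrative only)
sample = [14, 18, 21, 22, 25, 30]
n = len(sample)
xbar = mean(sample)

# Range: maximum value minus minimum value
data_range = max(sample) - min(sample)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_variance = sum((x - xbar) ** 2 for x in sample) / (n - 1)

print(f"range: {data_range} pounds")
print(f"sample variance: {sample_variance:.1f} squared pounds")
```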
So what's the downside? Because the variance is calculated using squared values, it can get quite large. That makes it difficult to interpret intuitively, especially because it's expressed in squared units.
For example, if these data represent the number of pounds people lost on four different diets, you'd say the variance of the weight loss on diet B is 142.6 pounds².
What the heck does "142.6 squared pounds" mean? Ever gain or lose a square pound?
We need a more intuitive way to represent the variation about the mean...
____________________________________________________________
Take the square root of the variance, and you get the standard deviation (SD). Here's where the music gets sublime, because with that tiny tweak, some beautiful things happen.
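In symbols, the standard deviation is just the square root of the sample variance, which puts the measure of spread back into the original units of the data:

$$
s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n-1}}
$$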
First off, the value of the SD shows about how much the data deviate, on average (figuratively speaking; it's not a true average), from the mean, using the same units as the data.
For example, error bars were added to the plot below to show the calculated standard deviation for each group. Each bar represents 1 SD in each direction from the mean.
Now, if these data represent the number of pounds people lost on the four different diets, you can say the mean weight loss on diet D was about 22 pounds, with weight losses in the group deviating from that, on average, by roughly 5 pounds.
Much more intuitive to interpret, isn’t it?
What’s more, if your data are normally distributed, you get a “special bonus” when you use the SD: You can easily estimate the percentage of data values that fall within 1, 2, or 3 SDs of the mean.
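If you're curious where those percentages come from, here's a quick check using only the Python standard library. It assumes the data follow a normal distribution, per the "special bonus" condition above:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD of the mean: {share:.1%}")

# Prints roughly 68.3%, 95.4%, and 99.7% for 1, 2, and 3 SDs
```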
No wonder this variation of variation is so popular!
____________________________________________________________
Suppose your study focuses on estimating a parameter, such as the mean of a population, rather than simply describing the general variability of your data.
To quantify how precise or how certain your estimate of the mean is, you need a statistic that tells you how much sampling variability affects that estimate: the standard error of the mean (SE).
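In symbols, it's simply the sample standard deviation scaled by the square root of the sample size:

$$
SE = \frac{s}{\sqrt{n}}
$$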
Again, feel free to ignore the formula, other than to note that a larger sample size (n) will decrease the SE, while a larger standard deviation will increase the SE. This reflects the greater confidence you have in your estimate as you collect more data or as your data exhibit less variability.
For diet C, the tiny sample size and large variability produce a whopping SE. Now that’s a mean estimate you just can’t trust! You'd be much wiser to put your money on the mean estimate for diet D (21.9), with its very tight SE bars.
The SE is very useful, but it can feel a bit abstract to people who aren't statisticians. Suppose you report that the patients on diet D lost a mean of 21.9 pounds, with an SE of 0.923. The clinical implications of that SE probably won't resonate with many dieticians, and it certainly won't make any sense to patients!
____________________________________________________________
So how can you communicate the precision of your estimate to a wider audience?
Did you say, "Multiply the SE by a t-value associated with a level of confidence that corresponds with a given probability from a t distribution?"
Yes, indeed! That mathy maneuver gives you a very practical measure of precision: the confidence interval (CI).
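In symbols, a 95% confidence interval for the mean is the estimate plus or minus that t-value times the SE:

$$
\bar{x} \pm t_{0.025,\, n-1} \times \frac{s}{\sqrt{n}}
$$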
Now you can interpret the certainty of the mean estimates more intuitively. For example, based on the 95% CI for diet D, you can say that you're 95% confident that the mean weight loss for the entire population of diet D dieters falls between 20.1 and 23.9 pounds. The CIs for diets A and B are wider, so those mean estimates of weight loss are somewhat less certain.
For diet C, note that we can't assert with 95% confidence that the mean change in weight is actually a weight gain, a weight loss, or no change in weight at all (0)! That makes perfect sense, because two data values are hardly a sufficient sample to base an estimate on. (Note: the full CI isn't shown on the plot because I wanted to keep the Y scale consistent with the other plots.)
Another perk of the CI: you can use it to directly evaluate your results in relation to a benchmark value.
Suppose you know from previous studies that to provide clinical benefits for this population of patients, weight loss must be at least 15 pounds. Then you cannot be 95% confident that diets A, B, or C will provide clinical benefit, because the lower limits of those confidence intervals are all less than 15. You can, however, assert with 95% confidence that diet D provides clinical benefit, because its lower confidence limit is greater than 15.
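If you want to reproduce that kind of benchmark check outside of Minitab, here's a rough sketch in Python. The sample values are invented for illustration, and it assumes SciPy is available to supply the t-value; the 15-pound benchmark comes from the example above.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

# Hypothetical weight losses, in pounds (illustrative only)
sample = [24.1, 19.8, 22.5, 20.7, 23.2, 21.4]
benchmark = 15  # minimum clinically beneficial weight loss, per the example above

n = len(sample)
se = stdev(sample) / sqrt(n)          # standard error of the mean
t_crit = t.ppf(0.975, df=n - 1)       # t-value for a two-sided 95% CI
lower = mean(sample) - t_crit * se
upper = mean(sample) + t_crit * se

print(f"95% CI for the mean weight loss: ({lower:.1f}, {upper:.1f}) pounds")
print("Clinically beneficial with 95% confidence?", lower > benchmark)
```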
_______________________________________________________________
Are you still awake? Have you nodded off and started blowing bubbles in your tea yet?
Let's quickly wrap things up with the Coda, which means "tail" in Italian. This section is sometimes tacked onto the end of a musical piece as a short, zippy summary and finale.
Here's the short answer to the reader's original question:
Always label error bars clearly. The SE is always smaller than the SD, so you want to avoid giving the misimpression that it represents the general variability of your data. Similarly, for sufficiently large samples, a 95% CI extends roughly 2 SEs on either side of the mean, so always indicate which one you're showing. (Minitab does this by default on the interval plot.)
Of course, your choice of how to represent error also depends on the conventions in your field of research, as well as the specific guidelines and audience of the journal you're publishing in. When in Rome...
For additional guidelines on using error bars in scientific publications, check out this helpful article in the Journal of Cell Biology.
(Thanks to the reader for his great question!)