Variability can make things difficult whether you are performing data analysis for a quality improvement initiative or for an academic study. Recently, I detailed how variability reduces your statistical power. As promised, that discussion will help you solve a mystery here.

One of the many things I love about research is the unexpected mysteries. You get to be Sherlock Holmes! When you're exploring the unknown, you're bound to run into surprising results that you can't explain at first. The clues are often buried in the data that you've already collected. You just need to put the pieces together.

The following is a case where one mystery leads to another!

**Scenario**

For a study that I’ve written about before, we wanted to determine if a simple exercise intervention, implementable in a typical physical education class, could increase bone density in adolescent girls. We randomly assigned 100 middle-school-aged girls to treatment and control groups and tracked them over a couple of years. The treatment group jumped from 24-inch steps 30 times every other school day. The control group did non-weight-bearing stretching during these times. We measured their bone density using a Hologic DXA whole-body system.

**Results!**

I compared the rate of change between DXA measurements for both groups. Unfortunately, the study proceeded for several years with no difference between the treatment and control groups. The treatment was not affecting bone density. Finally, a significant difference of 2.2% appeared! Hurray! The treatment seemed to be affecting bone density growth!

**Mystery 1**

Now that we had a significant result for one measurement period, we were very eager to see the results for the subsequent measurement period. After analyzing the subsequent DXA measurement, the first thing that I noticed was that the mean difference between the two groups was a nearly identical percentage. That sounded good! That difference was significant last time, so it should be this time because we had the same subjects. I ran a paired t-test in Minitab because we were studying the change from the beginning to the end of this measurement period. This time, I was disappointed to find that the difference was not statistically significant!

This was sad news, but what happened? Conditions in both periods were largely the same: we observed the same difference between the groups, which had the same subjects. Yet in one period the difference was significant and in the other it was not.

**Mystery 1 Solved!**

Readers of my previous blog post can cash in their free bonus now! I looked to see how the variability changed from one period to the next and found that the variability increased. This gave the analysis for the latter period less statistical power even though the difference remained constant. If you haven’t read the previous post, it illustrates how this happens. However, this led to the second mystery...
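To see how the same mean difference can be significant in one period and not the next, here is a minimal sketch in Python. The numbers are made up for illustration (they are not the study's data): both sets of change scores share the 2.2 mean difference mentioned above, but the second set is far more spread out. A paired t-test reduces to a one-sample t-test on the within-subject differences, which is what this computes.

```python
import math

def t_statistic(diffs):
    """t statistic for a paired t-test on the within-subject differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical change scores (n = 50 each): identical 2.2 mean difference,
# very different variability.
low_var = [2.2 + s for s in (-3, -1, 0, 1, 3) * 10]
high_var = [2.2 + s for s in (-15, -5, 0, 5, 15) * 10]

t_low = t_statistic(low_var)    # ~7.7, well past the critical t of ~2.01
t_high = t_statistic(high_var)  # ~1.5, falls short of significance
print(round(t_low, 2), round(t_high, 2))
```

The mean difference never changes; only the spread does, yet that alone moves the result from clearly significant to not significant.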

**Mystery 2**

While tracking the variability over time to solve mystery #1, I noticed an interesting pattern. The variability oscillated up and down in a consistent pattern.

The zero line represents no difference between the groups. The intervals represent the standard deviation. You can see the alternating pattern. The even-numbered periods have about a third less variability than the odd-numbered periods. Period 4 is when we got our significant results. Period 5 had about the same difference but the larger standard deviation wiped out the significance (the confidence interval would include zero if it were graphed).

What caused this regular pattern? I didn’t think it could be an actual physical phenomenon, because that would imply that the subjects’ bones were synchronized in an alternating pattern of more and less consistent rates. It’s inconceivable that this level of bone coordination was going on! Much more likely, this was an artifact of something related to data collection or analysis. Uh-oh, that was my area!

**Mystery 2 Solved!**

While mystery 1 was solvable with the clues in the previous post, mystery 2 is not. I'll present the answer, but I wouldn't expect you to figure it out from the information presented. Instead, I present mystery 2 to illustrate why you should always understand the correlation and variability within your data, two themes that I always push!

There are two keys to solving this mystery:

• Data collection intervals

• Correlation structure within the data

We used the DXA machine to measure all of the subjects twice a year. However, for various practical reasons, measurements did not occur at 6-month intervals. The DXA measurements occurred in January and May. I assessed the growth rates between DXA measurements. So the change from January to May occurred over 4 months while the change from May to January occurred over 8 months. I converted all rates to annual rates so that I could directly compare them. Worried that my conversion was incorrect, I checked...but the calculations were correct. However, the different lengths of the intervals played a role because the higher variability *always* occurred during the shorter periods.
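The conversion itself is straightforward. Here is a sketch of the simple linear scaling I mean (the post doesn't specify the exact formula used, so treat this as an illustrative assumption, and the percent changes are invented):

```python
def annualized_rate(percent_change, months):
    """Scale a percent change over `months` to a simple annual rate."""
    return percent_change * 12 / months

# A hypothetical 1.0% gain over the 4-month Jan->May window and a 2.0% gain
# over the 8-month May->Jan window annualize to the same rate.
print(annualized_rate(1.0, 4))  # 3.0
print(annualized_rate(2.0, 8))  # 3.0
```

The annualized means are directly comparable, but notice that annualizing does nothing to equalize the *variability* of the two window lengths, which is where the mystery lives.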

There is an interesting correlation between the bone growth rates. At the time of this study, this age group hadn’t been measured extensively by DXA machines. We knew it was the peak bone growth time of their lives and we did find rapid density increases. However, I also found that there was an almost exactly zero correlation between one measurement period and the next. This zero correlation means that future growth is not predictable based on previous growth. So, we had random bursts of bone growth.

The answer to the mystery lies in combining these two keys: random explosive spurts falling within measurement periods of different lengths. During the shorter periods, not all of the subjects underwent explosive spurts. Some did, but others didn't, which accounts for the greater variability. But during the longer periods, *all* of the subjects experienced growth spurts, which produced less variable growth rates. The longer time frame provides a better chance to average out the rates and reduces the number of values that fall far from the average.

It's sort of like flipping coins. If you flip a coin 4 times, the proportion of heads can easily land far from 50%. If you flip a coin 100 times, the proportion of heads will almost always land close to 50%, because the extremes average out.
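The coin-flip analogy can be checked with a quick simulation. Across many repeated trials, the proportion of heads from 4 flips is roughly five times more variable than the proportion from 100 flips (the theoretical standard deviations are 0.25 and 0.05):

```python
import random
import statistics

random.seed(1)

def head_proportions(flips_per_trial, trials=10_000):
    """Proportion of heads in each of many repeated trials."""
    return [
        sum(random.random() < 0.5 for _ in range(flips_per_trial)) / flips_per_trial
        for _ in range(trials)
    ]

sd_4 = statistics.stdev(head_proportions(4))
sd_100 = statistics.stdev(head_proportions(100))
print(f"sd with 4 flips:   {sd_4:.3f}")   # ~0.25
print(f"sd with 100 flips: {sd_100:.3f}")  # ~0.05
```

The same averaging effect is what made the 8-month growth rates so much less variable than the 4-month rates.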

**Lessons Learned**

For this case, the results were all accurate and no errors were made. Sometimes the lessons that you learn are applicable mainly to future studies. In this case, I recommended keeping at least an 8-month gap between DXA measurements for this age group!

Generally speaking, understanding the data more thoroughly provides a deeper understanding of what is happening. So be sure to solve your own data mysteries—don’t let these things slide, because they often reveal something that you don’t know, but need to.