Data Analysis and the Mystery of the Confounded Calcium
In my previous blog post, I showed how omitting a confounding predictor from a linear regression model obscured the significance of another predictor variable. Confounding variables can be insidious because you don’t always know about them, and you may have to deduce their existence.
In that vein, this post is like a mystery story. I’ll set up the mystery and include the clues. You put on your Sherlock Holmes cap, use your knowledge of confounding variables, and see if you can come up with your own theories about how one or more confounding predictors are most likely involved.
For this longitudinal research project, we wanted to determine whether an exercise intervention boosted the treatment group’s bone density compared to a control group. Our subjects were junior-high-school-aged girls who were randomly assigned to either the treatment or control group. We collected a lot of other data along the way, including self-reported 3-day diet records several times a year. We wanted to see if their average daily calcium intake influenced their bone densities. It turned out that calcium intake was not associated with anything we recorded, much less with bone density.
Here’s how the difficulties started. The exercise intervention involved jumping from a height of 24 inches, 30 times every other school day. Eventually, some of the subjects started to report pain in their knees. The phys ed teacher who administered the intervention during gym class indicated that her students would typically complain and that she’d just push them onwards.
As responsible researchers, we felt compelled to stop the intervention for a while for those with pain. After all, the intervention was designed to produce impacts of 6 times the subjects’ body weight, more intense than anything they would normally experience. We debated among ourselves whether the self-reported accounts of pain indicated real problems or were more related to the typical reluctance to perform physical activity that some students exhibited. Clearly, we needed to study it.
As the data/stats guy, I wanted to see if there were patterns among those who experienced pain. We had already recorded a lot of data about these subjects. Did those who experienced pain tend to share characteristics such as height, weight, activity level, or having reached menarche? To look for these patterns, I used Minitab Statistical Software to run all sorts of analyses, including correlation analysis, hypothesis testing, and binary logistic regression.
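If you’d like to try a binary logistic regression without Minitab, here is a minimal sketch in Python using only the standard library. It models a yes/no outcome (such as reporting pain) from one continuous predictor. Everything in it is an assumption made for illustration: the data are simulated, the names `fit_logistic`, `TRUE_B0`, and `TRUE_B1` are hypothetical, and the Newton-Raphson fit is a textbook approach, not necessarily what Minitab does internally. None of the numbers come from our study.

```python
import math
import random


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns the intercept, the slope, and the slope's approximate
    standard error (taken from the information matrix at the last step).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Simulated data only: a standardized predictor and a binary outcome
# generated with a known negative slope, so we can check the fit.
random.seed(0)
TRUE_B0, TRUE_B1 = -0.5, -1.0
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * xi))) else 0
    for xi in x
]

b0, b1, se_b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)   # multiplicative change in odds per 1-unit increase in x
wald_z = b1 / se_b1         # rough test statistic for "slope = 0"

print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, odds ratio = {odds_ratio:.3f}, z = {wald_z:.2f}")
```

A negative slope (an odds ratio below 1) means higher values of the predictor go with lower odds of the outcome. Keep in mind that a result like this only describes an association in the data you have. It says nothing about whether some other variable, one you left out of the model, is really behind it.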
After all that, only one variable was significantly associated with the self-reported pain: calcium intake. Those with a higher calcium intake appeared to have a lower chance of developing knee pain. Huh? The variable that wasn’t associated with anything before was now associated with this out-of-the-blue thing? Was calcium really preventing knee pain?
I’ve put in enough clues that you should be able to develop theories about what is happening. My next blog post will focus on the theories that we generated.