Collecting Good Data: It's a Messy World, Confound it!

When we collect data for quality improvement projects, we try to identify and measure all the factors that could influence our outcome. In scientific research studies, we try to neatly organize, classify, and measure all relevant aspects of the subject matter so that we can quantify the relationships among all of the significant variables.

The problem is that reality can be messy and hard to measure, and you may not know all of the significant variables. You need a good plan based on subject area research to get it right and measure the right variables. Read on to see why and what can happen if you don’t include all of the relevant variables!

In a biomechanics study that I was involved in, we explored the effects of physical activity on bone density. We measured a number of variables, including the subjects’ overall physical activity and their bone densities. Theory suggests that subjects with higher activity levels should have higher bone density because bones adapt to the higher forces placed on them.

When I took a quick, preliminary look using a regression analysis where activity was the predictor and bone density was the response, I found that activity was not significant. I was expecting to see a positive correlation but instead found that activity didn’t seem related to bone density at all.

What was going on?

It turns out that we didn’t include an important variable: the subject’s weight. This was just a preliminary look at the data; we fully expected other variables to be significant and had measured them. However, it illustrates the problem of looking at a multivariate research question with too few variables.
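
To see how a missing variable can hide a real effect, here is a minimal simulation sketch in Python. It does not use the study's data. Every number in it is invented, and the mechanism it assumes (more active subjects tend to weigh less, while heavier subjects tend to have denser bones) is a hypothetical chosen only for illustration. The helper `fit_ols` and all variable names are likewise made up for this example.

```python
import numpy as np

def fit_ols(predictors, response):
    """Return least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating process (assumed, NOT taken from the study):
# activity in arbitrary units; weight falls as activity rises.
activity = rng.normal(5.0, 1.5, n)
weight = 90.0 - 4.0 * activity + rng.normal(0.0, 6.0, n)

# Both activity (+0.02) and weight (+0.005) truly raise bone density here.
bone_density = (0.8 + 0.02 * activity + 0.005 * weight
                + rng.normal(0.0, 0.05, n))

# Model 1: activity only. Weight is omitted, so its effect leaks into
# the activity slope and, with these numbers, roughly cancels it.
coef_activity_only = fit_ols([activity], bone_density)

# Model 2: activity and weight together.
coef_with_weight = fit_ols([activity, weight], bone_density)

print("activity slope, weight omitted: ", round(coef_activity_only[1], 4))
print("activity slope, weight included:", round(coef_with_weight[1], 4))
print("weight slope:                   ", round(coef_with_weight[2], 4))
```

In this made-up setup, the activity-only slope should land near zero, while the model that includes weight should recover an activity slope near the assumed 0.02. The cancellation is exact only because the numbers were picked that way; in real data the bias can shrink, inflate, or even reverse an estimate.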

The results for the variables you include can be biased by the significant variables that you don’t include.

In my next post, I'll explain in detail just how leaving weight out of my preliminary regression analysis obscured the effect of physical activity on bone density!


Name: Omar Mora • Tuesday, November 22, 2011

Dear Jim,
I'm reading your posts: simply awesome!
Your statement: "The results for the variables you include can be biased by the significant variables that you don’t include" is powerful.
I'm working on a DOE (fractional factorial, 2-to-5, resolution III) for e-marketing, in which we try to identify the most significant factors in increasing the response rate to an e-mail/e-marketing campaign, and we have faced what you stated.
Knowing the process and "mapping" the relationships is hard. We are now using a Quality Companion template to help us with this.
Again, great post. Omar from blackberrycross.com
