My past several posts have detailed confounding variables, a problem you might encounter in research or quality improvement projects.
To recap, a confounding variable is correlated with both the response and an included predictor. Leaving a confounding variable out of a statistical model can make an included predictor look falsely insignificant or falsely significant. In other words, confounding variables can turn your statistical results on their head!
To find lurking confounding variables, you must take the time to understand your data and the important variables that may influence a process. Background research and solid subject-area knowledge can help you navigate data difficulties. You should also measure and include everything that you think is important.
Of course, understanding and measuring everything of importance may not be possible due to time and cost constraints. Indeed, not all of the relevant variables may be known or even measurable. What to do?
There is a simple solution to this complex problem. You can wave the white flag and admit that you don’t know everything, or at least that you can’t measure everything that affects your response. You randomize!
Randomness plays several important roles in the design of experiments. In this case, we’re talking about random assignment, which is different from random selection.
- Random selection is how you draw the sample for your study. This allows you to make unbiased inferences about the population based on your sample.
- Random assignment is how you assign the sample to the control and treatment groups in your experiment. This allows you to make causal conclusions about the effect of one variable on another variable.
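The two roles of randomness can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical population of subject IDs: `random.sample` performs the random selection, and a shuffle followed by a split performs the random assignment. The function names `select_sample` and `assign_groups` are illustrative only.

```python
import random

def select_sample(population, n, rng):
    """Random selection: draw n subjects from the population without replacement."""
    return rng.sample(population, n)

def assign_groups(sample, rng):
    """Random assignment: shuffle the sample, then split it into two equal halves."""
    shuffled = list(sample)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

rng = random.Random(2012)  # fixed seed so the example is reproducible
population = [f"subject_{i}" for i in range(1000)]  # hypothetical population
sample = select_sample(population, 20, rng)
groups = assign_groups(sample, rng)
print(groups["treatment"])
print(groups["control"])
```

Shuffling and splitting is the code equivalent of drawing names out of a hat, and it keeps the two groups the same size.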
Random assignment might involve flipping a coin, drawing names out of a hat, or using random numbers. All subjects should have the same probability of being assigned to any group. This process helps ensure that the groups are similar to each other when treatment begins. Therefore, any differences between the groups at the end of the study shouldn’t be due to differences that existed beforehand.
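To see why random assignment tends to produce similar groups, the sketch below simulates a baseline variable, such as physical activity, that the assignment never looks at. The normal distribution, its parameters, and the sample size are arbitrary assumptions chosen for illustration. Each subject is assigned by a simulated fair coin flip, so every subject has the same 50% chance of landing in either group.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical baseline activity scores; the assignment below never uses them.
activity = [rng.gauss(50, 10) for _ in range(10_000)]

treatment, control = [], []
for score in activity:
    # Simulated fair coin flip: each subject has a 50% chance of each group.
    if rng.random() < 0.5:
        treatment.append(score)
    else:
        control.append(score)

print(f"treatment: n={len(treatment)}, mean activity={statistics.mean(treatment):.2f}")
print(f"control:   n={len(control)}, mean activity={statistics.mean(control):.2f}")
```

With this many subjects, the two group means come out close even though activity played no part in the assignment. Note that independent coin flips do not guarantee equal group sizes, only equal chances for each subject.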
Let’s work through an example and see how random assignment combats confounding variables. Take the biomechanics study in which we wanted to see whether the jumping exercise (treatment group) produced greater bone density than no jumping (control group). Further, let’s assume that greater physical activity is correlated with increased bone density, but we didn’t measure it. We’ll compare two scenarios.
Scenario 1: We don’t use random assignment and, unbeknownst to us, the more physically active subjects end up in the treatment group. The treatment group starts out more active than the control group. Because activity increases bone density, the higher activity in the treatment group may account for the greater bone density compared to the less active control group. Because it is not in the model, activity is a confounding variable that makes the jumping exercise appear to be significant when it might not be.
Scenario 2: We use random assignment, so the treatment and control groups start out with roughly equal levels of physical activity. Activity still affects bone density, but its effect is spread roughly equally across the groups. Indeed, the groups are roughly equal in all ways except for the jumping exercise in the treatment group. If the treatment group has a significantly higher bone density, it’s almost certainly due to the jumping exercise.
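A small simulation makes the contrast between the two scenarios concrete. This is a sketch under made-up assumptions: activity levels are normally distributed, bone density rises linearly with activity, and the jumping exercise has no true effect at all. The function name `simulate` and every number in it are hypothetical, not values from the actual study.

```python
import random
import statistics

def simulate(random_assignment, n=1000, seed=1):
    """Return treatment-minus-control mean bone density when jumping has NO true effect."""
    rng = random.Random(seed)
    # Hypothetical unmeasured confounder: each subject's physical activity level.
    activity = [rng.gauss(50, 10) for _ in range(n)]
    # Hypothetical model: activity raises bone density; jumping adds nothing.
    density = [0.8 + 0.004 * a + rng.gauss(0, 0.02) for a in activity]
    subjects = list(range(n))
    if random_assignment:
        # Scenario 2: chance alone decides who ends up in each group.
        rng.shuffle(subjects)
    else:
        # Scenario 1: the more active subjects end up in the treatment group.
        subjects.sort(key=lambda i: activity[i], reverse=True)
    half = n // 2
    treatment, control = subjects[:half], subjects[half:]
    return (statistics.mean(density[i] for i in treatment)
            - statistics.mean(density[i] for i in control))

print(f"Scenario 1 (non-random): {simulate(False):+.4f}")
print(f"Scenario 2 (random):     {simulate(True):+.4f}")
```

Under these assumptions, Scenario 1 shows a clear bone density advantage for the treatment group even though jumping does nothing, because the groups differ in activity. Scenario 2 produces a difference close to zero. Only the assignment method changed between the two runs.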
For both scenarios, the data and statistical results could be identical. However, the conclusions from the second scenario are more valid because random assignment rules out pre-existing differences between the groups as an explanation.
Random assignment helps protect you from the perils of confounding variables and competing explanations. However, you can’t always implement random assignment. For the bone density study, we did randomly assign the subjects to the treatment or control group. However, when I used the data from that study to look for patterns amongst the subjects who developed knee pain, I couldn’t randomly assign them to higher and lower calcium intake groups! This highlights one of the pitfalls of ad hoc data analysis.
We’ve detailed the negative aspects of confounding variables here and in my last several posts. However, confounding variables have a potential upside. They don’t sound quite so threatening when you think of them as proxy variables, which we’ll cover in my next post.
Time: Thursday, March 1, 2012
Jim,
Thank you very much for this post and the series about randomness, confounding variables, and related topics.
The way you explain the topics is clear and easy to understand, and I have found it very, very valuable.
I hope more students, professors, and enthusiasts take these posts as a reference.
Understanding the importance of randomness is key; it is fundamental.
I like to think of, and explain, random selection and assignment as fundamentally mechanisms we use to try to "catch" as much variation as we can, allowing samples to "experience" both the sources of variation we think affect the phenomenon and the sources we don't know about.
I will be sharing and referencing part of your post with our fellow Central American Minitab-enthusiasts, via http://improve.blackberrycross.com
Greetings from Blackberry&Cross in Costa Rica.
Time: Tuesday, March 20, 2012
Hi Omar, thanks for reading and the nice comments!
DOE in general is an interesting combination of carefully planning the details and carefully planned randomness!
Sincerely,
Jim Frost