Which Is Better, Stepwise Regression or Best Subsets Regression?
Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output.
An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post.
First, a quick refresher about the two procedures and their different results:
- Stepwise regression presents you with a single model constructed using the p-values of the predictor variables
- Best subsets regression assesses all possible models and displays a subset of them along with their adjusted R-squared and Mallows' Cp values
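To make the best subsets output more concrete, here is a minimal Python sketch, assuming NumPy is available. It fits every non-empty subset of the candidate predictors by ordinary least squares and reports each model's adjusted R-squared and Mallows' Cp. The function name `best_subsets` and the simulated data are hypothetical, and this is not how any particular statistics package implements the procedure. Here p counts the intercept, so the full model's Cp always equals its number of parameters.

```python
import itertools

import numpy as np


def best_subsets(X, y):
    """Fit every non-empty subset of the columns of X by OLS (with an intercept)
    and return (subset, adjusted R^2, Mallows' Cp) tuples sorted by Cp."""
    n, k = X.shape

    def sse(cols):
        # Design matrix: an intercept column plus the chosen predictors.
        design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid), design.shape[1]  # SSE and parameter count p

    sst = float(((y - y.mean()) ** 2).sum())
    sse_full, p_full = sse(range(k))
    mse_full = sse_full / (n - p_full)  # error variance estimate from the full model

    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse_p, p = sse(cols)
            adj_r2 = 1 - (sse_p / (n - p)) / (sst / (n - 1))
            cp = sse_p / mse_full - n + 2 * p
            results.append((cols, adj_r2, cp))
    return sorted(results, key=lambda r: r[2])


# Hypothetical data: columns 0-2 are authentic predictors, column 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2] + rng.normal(size=100)
for cols, adj_r2, cp in best_subsets(X, y)[:3]:
    print(cols, round(adj_r2, 3), round(cp, 2))
```

As in the real procedure, the sketch ranks the models but leaves the final choice to you.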
The key benefit of the stepwise procedure is the simplicity of the single model. Best subsets does not pick a final model for you, but it does present you with multiple models and the information you need to choose the final one. For more details, read this post where I compare stepwise regression to best subsets regression and present examples using both analyses.
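The stepwise procedure from the first bullet above builds one model by adding and removing predictors based on their p-values. The Python sketch below is a simplified bidirectional version under stated assumptions: the entry and removal thresholds of 0.15 are illustrative choices, the helper names are hypothetical, and the p-values come from a normal approximation to the t distribution (close for samples of roughly 100 or more) so that only NumPy and the standard library are needed. Real implementations differ in details such as F-tests, forced terms, and stopping rules.

```python
import math

import numpy as np


def ols_pvalues(X, y, cols):
    """Two-sided p-values for the predictors in `cols` (intercept included in the fit),
    using a normal approximation to the t distribution (an assumption)."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = float(resid @ resid) / (n - design.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))
    z = np.abs(coef / se)[1:]  # skip the intercept
    return {j: math.erfc(zj / math.sqrt(2)) for j, zj in zip(cols, z)}


def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15, max_steps=100):
    """Hypothetical bidirectional stepwise selection driven by p-values."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the smallest p-value if it is below alpha_enter.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            pv = {j: ols_pvalues(X, y, selected + [j])[j] for j in candidates}
            best = min(pv, key=pv.get)
            if pv[best] < alpha_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the selected predictor with the largest p-value if above alpha_remove.
        if selected:
            pv = ols_pvalues(X, y, selected)
            worst = max(pv, key=pv.get)
            if pv[worst] > alpha_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)


# Hypothetical data: columns 0-2 are authentic predictors, column 3 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2] + rng.normal(size=100)
print(stepwise(X, y))
```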
Determining the Better Model Selection Method
A study by Olejnik, Mills, and Keselman* compares how often stepwise regression, best subsets regression using the lowest Mallows' Cp, and best subsets using the highest adjusted R-squared select the true model.
The authors assessed 32 conditions that differed by the number of candidate variables, number of authentic variables, sample size, and level of multicollinearity. For each condition, the authors created 1,000 computer-generated data sets and analyzed them with both stepwise and best subsets to determine how often each procedure selected the correct model.
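The study's design can be mimicked with a small Monte Carlo loop: generate many data sets from a known model, run a selection procedure on each one, and count how often it returns exactly the authentic predictors. The sketch below is not the authors' code. Its generator is a hypothetical stand-in that assumes standard-normal predictors with a common pairwise correlation `rho`, authentic coefficients of 0.5, and unit error variance, and it plugs in a best subsets selector that keeps the subset with the lowest Mallows' Cp.

```python
import itertools

import numpy as np


def simulate_data(rng, k, m, n, rho, beta=0.5):
    """Hypothetical generator: k standard-normal candidates with common pairwise
    correlation rho; the first m are authentic (coefficient beta), the rest are noise."""
    cov = np.full((k, k), rho) + (1 - rho) * np.eye(k)
    X = rng.multivariate_normal(np.zeros(k), cov, size=n)
    y = X[:, :m] @ np.full(m, beta) + rng.normal(size=n)  # unit error variance (assumed)
    return X, y


def lowest_cp_subset(X, y):
    """Best subsets selector: return the non-empty subset with the lowest Mallows' Cp."""
    n, k = X.shape

    def sse(cols):
        design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid), design.shape[1]  # SSE and parameter count

    sse_full, p_full = sse(range(k))
    mse_full = sse_full / (n - p_full)

    def cp(cols):
        sse_p, p = sse(cols)
        return sse_p / mse_full - n + 2 * p

    subsets = [c for size in range(1, k + 1) for c in itertools.combinations(range(k), size)]
    return min(subsets, key=cp)


def percent_correct(select, k, m, n, rho, reps=1000, seed=0):
    """Percent of simulated data sets where `select` returns exactly predictors 0..m-1."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X, y = simulate_data(rng, k, m, n, rho)
        hits += set(select(X, y)) == set(range(m))
    return 100 * hits / reps


# Fewer replications than the study's 1,000 to keep the demo fast.
print(percent_correct(lowest_cp_subset, k=4, m=3, n=500, rho=0.0, reps=200))
```

Any other selector with the same `select(X, y)` signature, such as a stepwise routine, could be swapped in to compare procedures under identical conditions.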
And, the winner is...stepwise regression!! Congratulations! Well, sort of, as we’ll see.
Best subsets regression using the lowest Mallows' Cp is a very close second. The overall difference between Mallows' Cp and stepwise selection is less than 3%. Best subsets using the highest adjusted R-squared performed much worse than either stepwise selection or Mallows' Cp.
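The formulas behind the two best subsets criteria suggest one plausible reason adjusted R-squared lags behind; this is my reading of the math, not a result reported in the study. Let n be the number of observations, p the number of parameters including the intercept, SSE_p the model's error sum of squares, SST the total sum of squares, and \hat\sigma^2 the full model's mean squared error. Adding one predictor reduces the error sum of squares by \Delta\mathrm{SSE} = \mathrm{SSE}_p - \mathrm{SSE}_{p+1}, and each criterion responds as follows:

```latex
\begin{aligned}
R^2_{\text{adj}} &= 1 - \frac{\mathrm{SSE}_p/(n-p)}{\mathrm{SST}/(n-1)},
\qquad
C_p = \frac{\mathrm{SSE}_p}{\hat\sigma^2} - (n - 2p),\\[4pt]
R^2_{\text{adj}} \text{ increases} &\iff
\frac{\mathrm{SSE}_{p+1}}{n-p-1} < \frac{\mathrm{SSE}_p}{n-p}
\iff F = \frac{\Delta\mathrm{SSE}}{\mathrm{SSE}_{p+1}/(n-p-1)} > 1,\\[4pt]
C_p \text{ decreases} &\iff
C_{p+1} - C_p = 2 - \frac{\Delta\mathrm{SSE}}{\hat\sigma^2} < 0
\iff \frac{\Delta\mathrm{SSE}}{\hat\sigma^2} > 2.
\end{aligned}
```

So adjusted R-squared rewards any added predictor whose partial F exceeds 1 (|t| > 1), while Mallows' Cp requires roughly F > 2 (|t| > √2 ≈ 1.41). For a pure noise predictor in a large sample, the first event happens about 32% of the time and the second about 16%, so maximizing adjusted R-squared tends to let more noise predictors into the chosen model.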
However, before we pop open the champagne to celebrate stepwise regression’s victory, there’s a huge caveat to reveal.
Stepwise selection usually did not identify the correct model. Gasp!
Digging into the Results
Let’s look at the results more closely to see how well stepwise selection performs and what affects its performance. I’ll only cover stepwise selection, but the results for Mallows' Cp are essentially tied and follow the same patterns. I’ll give my thoughts on the matter at the end.
In the results below, stepwise regression identifies the correct model if it selects all of the authentic predictors and excludes all of the noise predictors.
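This all-or-nothing criterion is easy to state in code. In the minimal Python sketch below, the predictor names `x1` to `x4` and the function name are hypothetical. A selection counts as correct only when the selected set equals the authentic set, so including a noise predictor and omitting an authentic one both count as misses.

```python
def is_correct_model(selected, authentic):
    """True only when every authentic predictor is selected and no noise predictor is."""
    return set(selected) == set(authentic)


authentic = {"x1", "x2", "x3"}
print(is_correct_model({"x1", "x2", "x3"}, authentic))        # True: exact match
print(is_correct_model({"x1", "x2", "x3", "x4"}, authentic))  # False: includes a noise predictor
print(is_correct_model({"x1", "x2"}, authentic))              # False: omits an authentic predictor
```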
Best case scenario
In the study, stepwise regression performs the best when there are four candidate variables, three of which are authentic; there is zero correlation between the predictors; and there is an extra-large sample size of 500 observations. For this case, the stepwise procedure selects the correct model 84% of the time. Unfortunately, this is not a realistic scenario and the accuracy diminishes from here.
Number of candidate predictors and number of authentic predictors
The study looks at scenarios with either 4 or 8 candidate predictors. Choosing the correct model is harder when there are more candidates simply because there are more possible models to choose from. Likewise, accuracy declines as the number of authentic predictors increases.
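A quick count shows how fast the search space grows. Each candidate is either in or out of the model, so k candidates give 2^k - 1 non-empty subsets (excluding the intercept-only model). The Python sketch below computes this; the function names are illustrative.

```python
from math import comb


def count_models(k):
    """Number of non-empty subsets of k candidate predictors (intercept-only model excluded)."""
    return 2 ** k - 1


def count_models_of_size(k, m):
    """Number of subsets containing exactly m of the k candidates."""
    return comb(k, m)


for k in (4, 8):
    print(f"{k} candidates: {count_models(k)} possible models")
# 4 candidates: 15 possible models
# 8 candidates: 255 possible models

# Best-case condition from the study: 4 candidates, 3 authentic.
print(count_models_of_size(4, 3))  # 4 models have exactly three predictors; only one is correct
```

Doubling the candidates from 4 to 8 multiplies the number of possible models by 17, which is one reason accuracy falls so quickly as candidates are added.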
The table below shows the results for models with no multicollinearity and a good sample size (100-120 observations). Notice the decrease in the percent correct as both the number of candidates and number of authentic predictors increase.
| Candidate predictors | Authentic predictors | % Correct model |
Multicollinearity
The study varies multicollinearity to determine how correlated predictors affect the ability of stepwise regression to choose the correct model. When predictors are correlated, it’s harder to determine the individual effect each one has on the response variable. The study set the correlation between predictors to 0, 0.2, and 0.6.
The table below shows the results for models with a good sample size (100-120 observations). As correlation increases, the percent correct decreases.
| Candidate predictors | Authentic predictors | Correlation | % Correct model |
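One way to quantify why correlated predictors are harder to separate is the variance inflation factor (VIF): each coefficient's standard error is inflated by a factor of √VIF relative to uncorrelated predictors. The Python sketch below assumes every pair of predictors shares the same correlation, which may not match the study's exact design, and computes the VIFs for the study's correlation levels both numerically (the diagonal of the inverse correlation matrix) and from the closed form for this equicorrelated case. The function names are hypothetical.

```python
import numpy as np


def equicorrelation(k, rho):
    """k x k correlation matrix with every off-diagonal entry equal to rho (assumed design)."""
    return np.full((k, k), rho) + (1 - rho) * np.eye(k)


def vifs(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(corr))


def vif_closed_form(k, rho):
    """VIF shared by every predictor when all pairwise correlations equal rho."""
    return (1 + (k - 2) * rho) / ((1 - rho) * (1 + (k - 1) * rho))


for k in (4, 8):
    for rho in (0.0, 0.2, 0.6):
        vif = vifs(equicorrelation(k, rho))[0]
        # sqrt(VIF) is the factor by which each coefficient's standard error grows.
        print(f"k={k} rho={rho}: VIF={vif:.3f}, SE inflation={np.sqrt(vif):.3f}")
```

For example, with four candidates and a correlation of 0.6, the closed form gives a VIF of 2.2/1.12 ≈ 1.96, so standard errors are roughly 40% larger than with uncorrelated predictors, which makes the p-values of authentic predictors less decisive.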
Sample size
The study uses two sample sizes to see how sample size influences the ability to select the correct model. The smaller samples are sized to achieve 0.80 power, which works out to 100-120 observations. These sizes are consistent with good practice, so I refer to them as a good sample size.
The very large sample size is 500 observations, roughly four to five times the size needed to achieve the benchmark power of 0.80.
The table below shows that a very large sample size improves the ability of stepwise regression to choose the correct model. When choosing your sample size, you may want to consider a larger sample than what the power and sample size calculations suggest in order to improve the variable selection process.
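A small simulation illustrates why a larger sample helps without guaranteeing the correct model. It fits one authentic predictor and one independent noise predictor at n = 100 and n = 500 and records how often each has p < 0.15. The coefficient of 0.2, the 0.15 threshold, the normal approximation to the t distribution, and the function name are all assumptions for illustration, and a single test per predictor is a simplification of the full stepwise procedure.

```python
import math

import numpy as np


def selection_rates(n, beta=0.2, alpha=0.15, reps=2000, seed=1):
    """Fraction of simulated data sets in which an authentic predictor (coefficient beta)
    and an independent noise predictor each have p < alpha in a two-predictor OLS fit."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(2)
    for _ in range(reps):
        X = rng.normal(size=(n, 2))  # column 0 is authentic, column 1 is noise
        y = beta * X[:, 0] + rng.normal(size=n)
        design = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        s2 = float(resid @ resid) / (n - 3)
        se = np.sqrt(s2 * np.diag(np.linalg.inv(design.T @ design)))
        # Two-sided p-values from a normal approximation to the t distribution.
        p = [math.erfc(abs(c / s) / math.sqrt(2)) for c, s in zip(coef[1:], se[1:])]
        hits += np.array(p) < alpha
    return hits / reps  # [authentic detection rate, noise inclusion rate]


for n in (100, 500):
    authentic_rate, noise_rate = selection_rates(n)
    print(f"n={n}: authentic kept {authentic_rate:.2f}, noise kept {noise_rate:.2f}")
```

With these settings the authentic predictor should clear the threshold more often at n = 500, while the noise predictor's inclusion rate should stay near 0.15 at both sizes: a pure-noise predictor passes a test at level α about α of the time regardless of sample size. A larger sample therefore improves selection but cannot by itself drive the error rate to zero.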
| Candidate predictors | Authentic predictors | Correlation | % Correct - good sample size | % Correct - very large sample |
Closing Thoughts
Stepwise regression generally can’t pick the true model. This is true even with the small number of candidate predictors that this study looks at. In the real world, researchers often have many more candidates, which lowers the chances even further.
Reality is complex, and we should not expect an automated algorithm to figure it out for us. After all, the stepwise algorithm follows simple rules and knows nothing about the underlying process or subject area. However, stepwise regression can get you into the right ballpark. At a glance, you’ll have a rough idea of what is going on in your data.
It’s up to you to get from the rough idea to the correct model. To do this, you’ll need to use your expertise, theory, and common sense rather than relying solely on simplistic model selection rules.
For tips about how to do this, read my post Four Tips on How to Perform a Regression Analysis that Avoids Common Problems.
If you're learning about regression, read my regression tutorial!
*Stephen Olejnik, Jamie Mills, and Harvey Keselman, “Using Wherry’s Adjusted R2 and Mallows' Cp for Model Selection from All Possible Regressions”, The Journal of Experimental Education, 2000, 68(4), 365-380.