Under pressure to conserve energy, industrial companies are turning to data analytics to identify the main sources of energy consumption. One way to reduce energy use and save money is to analyze consumption data at each factory station.
Adjusting variables like machine speed, throughput, temperature, or equipment can make a big difference for a manufacturer. The key is to be as efficient as possible while maintaining overall quality, and one way to find the variables that matter most is stepwise regression.
When Is Stepwise Regression Appropriate?
Stepwise regression is an appropriate analysis when you have many variables and you’re interested in identifying a useful subset of the predictors. In Minitab, the standard stepwise regression procedure both adds and removes predictors one at a time. Minitab stops when all variables not included in the model have p-values that are greater than a specified Alpha-to-Enter value and when all variables that are in the model have p-values that are less than or equal to a specified Alpha-to-Remove value.
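The add-and-remove loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Minitab's implementation: the function names `ols_pvalues` and `stepwise`, the 0.15 thresholds, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Two-sided t-test p-values for each column of X (intercept added here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
    resid = y - A @ beta
    dof = n - A.shape[1]                            # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return p[1:]                                    # drop the intercept's p-value


def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15, max_steps=100):
    """Add the most significant candidate, then drop the least significant term."""
    selected = []
    for _ in range(max_steps):
        changed = False
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:
            # p-value each candidate would have if added to the current model
            p_new = {j: ols_pvalues(X[:, selected + [j]], y)[-1] for j in remaining}
            best = min(p_new, key=p_new.get)
            if p_new[best] <= alpha_enter:
                selected.append(best)
                changed = True
        if selected:
            p_in = ols_pvalues(X[:, selected], y)
            worst = int(np.argmax(p_in))
            if p_in[worst] > alpha_remove:
                selected.pop(worst)
                changed = True
        if not changed:                             # nothing entered or left: stop
            break
    return selected


# Synthetic data (not the plant's data): only columns 0 and 2 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=200)
selected = stepwise(X, y)
print(sorted(selected))
```

Because the thresholds are fairly generous, a noise column can occasionally enter by chance, which is one reason the result should be treated as a candidate model rather than a final answer.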
In addition to the standard stepwise method, Minitab offers two other types of stepwise procedures:
- Forward selection: Minitab starts with no predictors in the model and adds the most significant variable for each step. Minitab stops when all variables not in the model have p-values that are greater than the specified Alpha-to-Enter value.
- Backward elimination: Minitab starts with all predictors in the model and removes the least significant variable for each step. Minitab stops when all variables in the model have p-values that are less than or equal to the specified Alpha-to-Remove value.
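The two one-directional procedures differ from the standard method only in where they start and which move they allow. The sketch below illustrates that difference under stated assumptions: the helper names, the 0.15 thresholds, and the synthetic data are hypothetical, and the code is not Minitab's algorithm.

```python
import numpy as np
from scipy import stats


def pvalues(X, y):
    """Two-sided t-test p-values for each column of X (intercept added here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]
    se = np.sqrt(resid @ resid / dof * np.diag(np.linalg.inv(A.T @ A)))
    return (2 * stats.t.sf(np.abs(beta / se), dof))[1:]


def forward_selection(X, y, alpha_enter=0.15):
    """Start empty; keep adding the most significant remaining predictor."""
    selected = []
    while True:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if not remaining:
            break
        p_new = {j: pvalues(X[:, selected + [j]], y)[-1] for j in remaining}
        best = min(p_new, key=p_new.get)
        if p_new[best] > alpha_enter:      # no candidate qualifies: stop
            break
        selected.append(best)
    return selected


def backward_elimination(X, y, alpha_remove=0.15):
    """Start with every predictor; keep dropping the least significant one."""
    selected = list(range(X.shape[1]))
    while selected:
        p_in = pvalues(X[:, selected], y)
        worst = int(np.argmax(p_in))
        if p_in[worst] <= alpha_remove:    # every remaining term qualifies: stop
            break
        selected.pop(worst)
    return selected


# Synthetic data (not the plant's data): only columns 0 and 2 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=200)
fwd = forward_selection(X, y)
bwd = backward_elimination(X, y)
print(sorted(fwd), sorted(bwd))
```

Note that backward elimination fits the full model first, so it needs more observations than predictors, whereas forward selection can start even when the candidate list is very long.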
Stepwise Regression Example
In this example of using stepwise regression to identify the major sources of energy usage, analysts from the manufacturing plant considered the following predictor variables: total units produced, total equipment run time, staff size, mean outside temperature, minimum outside temperature, maximum outside temperature, percentage of sun, and mean equipment age. This example uses only eight predictors, but stepwise regression becomes especially helpful when you have 100 or more predictor variables.
Their goal was to narrow these variables into a list of the top predictors of energy usage. To get a final model, the analysts chose Stat > Regression > Regression > Fit Regression Model in Minitab Statistical Software and completed the dialog box by entering the response ‘Energy’ and the above list of predictors in the Continuous predictors field, as shown in the screenshot below.
They then clicked Stepwise in the dialog box and completed the sub-dialog box as shown below.
Minitab returned the following model, which included total equipment run time, maximum outside temperature, and mean equipment age as predictors. Minitab left the other variables out because their p-values were greater than the ‘Alpha-to-Enter’ value.
To access the residual charts, press Ctrl+E to recall the last dialog box you filled in, click Graphs, and in the sub-dialog box, tick Pareto and, under Residual Charts, select Four in One as shown below.
The regression equation below indicates that energy usage increases as total equipment run time, maximum temperature, and average equipment age increase:
Judged by the absolute size of the t-statistics, total equipment run time has the largest impact. Maximum temperature is second, followed by average equipment age.
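Ranking predictors by t-statistic means dividing each coefficient by its standard error and sorting by absolute value. The sketch below shows that calculation on synthetic data. The column names are hypothetical stand-ins and the numbers are not the plant's results.

```python
import numpy as np


def t_statistics(X, y):
    """t-statistic (coefficient / standard error) for each column of X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])            # add the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])       # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return (beta / se)[1:]                          # skip the intercept


# Synthetic, standardized predictors with made-up effects of decreasing size.
rng = np.random.default_rng(1)
names = ["run_time", "max_temp", "equip_age"]       # hypothetical column names
X = rng.normal(size=(120, 3))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=120)

t = t_statistics(X, y)
ranking = [names[i] for i in np.argsort(-np.abs(t))]
print(ranking)
```

Unlike raw coefficients, t-statistics do not depend on the units of each predictor, which is why they are a more reasonable basis for this kind of ranking.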
With this analysis, the analysts concluded that energy usage rises with maximum temperature, most likely because of heavy air conditioner use, and that newer equipment appears to use less energy. The plant might want to limit running equipment during peak times when air conditioning runs constantly and consider purchasing new equipment before the summer season.
Pitfalls of Stepwise Regression
While a lot can be learned with stepwise regression, there are some potential pitfalls to be aware of:
- If two independent variables are highly correlated, only one may end up in the model even though both may be important.
- Because the procedure fits many models, it may select a model that fits the data well due to chance alone.
- Stepwise regression may not always end with the model that has the highest R² value possible for a given number of predictors.
- Automatic procedures cannot take into account special knowledge the analyst may have about the data. Therefore, the model selected may not be the most practical one.
- Graphing individual predictors against the response is often misleading because graphs do not account for other predictors in the model.
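The first pitfall, highly correlated predictors, can be checked before running a stepwise procedure. One common diagnostic is the variance inflation factor (VIF), where values above roughly 5 to 10 are often read as a warning sign. The sketch below is a minimal illustration. The `vif` helper, the column names, and the synthetic data are assumptions for the example, not part of the original analysis.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column: 1 / (1 - R^2 against the others)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # regress column j on the rest
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        ss_res = resid @ resid
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic data: maximum temperature tracks mean temperature closely,
# while staff size is unrelated to both.
rng = np.random.default_rng(2)
mean_temp = rng.normal(20, 5, 200)
max_temp = mean_temp + 6 + rng.normal(0, 0.5, 200)
staff = rng.normal(50, 5, 200)
X = np.column_stack([mean_temp, max_temp, staff])

vifs = vif(X)
for name, value in zip(["mean_temp", "max_temp", "staff"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

When two predictors both show a large VIF, a stepwise procedure may keep only one of them, so the choice between them is better made with subject knowledge than left to the algorithm.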
If you'd like to work with this data set yourself, download the data on Scribd.
Are you hoping to achieve better energy efficiency throughout your organization? Watch our on-demand webinar to explore how optimized processes can increase efficiency, enhance equipment and material usage, and lower costs.