I’ve written a number of blog posts about regression analysis and I've collected them here to create a regression tutorial. I’ll supplement my own posts with some from my colleagues.

This tutorial covers many aspects of regression analysis including: choosing the type of regression analysis to use, specifying the model, interpreting the results, determining how well the model fits, making predictions, and checking the assumptions. At the end, I include examples of different types of regression analyses.

If you’re learning regression analysis right now, you might want to bookmark this tutorial!

## Why Choose Regression and the Hallmarks of a Good Regression Analysis

Before we begin the regression analysis tutorial, there are several important questions to answer.

Why should we choose regression at all? What are the common mistakes that even experts make when it comes to regression analysis? And, how do you distinguish a good regression analysis from a less rigorous regression analysis? Read these posts to find out:

## Tutorial: How to Choose the Correct Type of Regression Analysis

Minitab statistical software provides a number of different types of regression analysis. Choosing the correct type depends on the characteristics of your data, as the following posts explain.

## Tutorial: How to Specify Your Regression Model

Choosing the correct type of regression analysis is just the first step in this regression tutorial. Next, you need to specify the model. Model specification consists of determining which predictor variables to include in the model and whether you need to model curvature and interactions between predictor variables.

Specifying a regression model is an iterative process. The interpretation and assumption verification sections of this regression tutorial show you how to confirm that you’ve specified the model correctly and how to adjust your model based on the results.

• How to Choose the Best Regression Model: I review some common statistical methods, complications you may face, and provide some practical advice.
• Stepwise and Best Subsets Regression: Minitab provides two automatic tools that help identify useful predictors during the exploratory stages of model building.
• Curve Fitting with Linear and Nonlinear Regression: Sometimes your data just don’t follow a straight line and you need to fit a curved relationship.
• Interaction effects: Michelle Paret explains interactions using Ketchup and Soy Sauce.
• Proxy variables: Important variables can be difficult or impossible to measure but omitting them from the regression model can produce invalid results. A proxy variable is an easily measurable variable that is used in place of a difficult variable.
• Overfitting the model: Overly complex models can produce misleading results. Learn about overfit models and how to detect and avoid them.
• Hierarchical models: I review reasons to fit, or not fit, a hierarchical model. A hierarchical model contains all lower-order terms that comprise the higher-order terms that also appear in the model.
• Standardizing the variables: In certain cases, standardizing the variables in your regression model can reveal statistically significant findings that you might otherwise miss.
• Five reasons why your R-squared can be too high: If you specify the wrong regression model, or use the wrong model fitting process, the R-squared can be too high.

## Tutorial: How to Interpret your Regression Results

So, you’ve chosen the correct type of regression and specified the model. Now, you want to interpret the results. The following topics in the regression tutorial show you how to interpret the results and effectively present them:

## Tutorial: How to Use Regression to Make Predictions

In addition to determining how the response variable changes when you change the values of the predictor variables, the other key benefit of regression is the ability to make predictions. In this part of the regression tutorial, I cover how to do just this.

• How to Predict with Minitab: A prediction guide that uses BMI to predict body fat percentage.
• Predicted R-squared: This statistic indicates how well a regression model predicts responses for new observations rather than just the original data set.
• Prediction intervals: See how presenting prediction intervals is better than presenting only the regression equation and predicted values.
• Prediction intervals versus other intervals: I compare prediction intervals to confidence and tolerance intervals so you’ll know when to use each type of interval.

## Tutorial: How to Check the Regression Assumptions and Fix Problems

Like any statistical test, regression analysis has assumptions that you should satisfy, or the results can be invalid. In regression analysis, the main way to check the assumptions is to assess the residual plots. The following posts in the tutorial show you how to do this and offer suggestions for how to fix problems.

• Residual plots: What they should look like and reasons why they might not!
• How important are normal residuals: If you have a large enough sample, nonnormal residuals may not be a problem.
• Multicollinearity: Highly correlated predictors can be a problem, but not always!
• Heteroscedasticity: You want the residuals to have a constant variance (homoscedasticity), but what if they don’t?
• Box-Cox transformation: If you can’t resolve the underlying problem, Cody Steele shows how easy it can be to transform the problem away!

## Examples of Different Types of Regression Analyses

The final part of the regression tutorial contains examples of the different types of regression analysis that Minitab can perform. Many of these regression examples include the data sets so you can try it yourself!