# Giving Thanks for the Regression Menu

Juicy, butter roasted turkey.

Steaming mashed potatoes.

Tangy cranberry relish.

Delicious candied sweet potatoes.

Creamy green bean casserole.

Sweet and airy corn bread.

Silken pumpkin pie.

The traditional Thanksgiving menu has so many mouth-watering dishes on the table, you don’t know where to start.

If you savor statistics as much as food, you might feel similarly as you gaze at all of the delicious analyses on Minitab’s Regression menu.

How can you decide which regression analysis to choose? In this post, I’ll give you some bite-sized samples of each regression dish to help you decide which one to heap on your plate.

## Regression with a Categorical Response

Ever notice the horizontal divider lines on the Minitab menus? Those are not just to prevent our programmers from writing crooked. Think of them as the separators on a paper plate that keep your cranberry sauce from running into your mashed potatoes, so they don’t mix together and turn pink.

For example, the line at the bottom of the menu neatly separates all the regression analyses that have a categorical response variable.

To use one of these analyses, each response in your data must fall into exactly one category. Choose between them based on what kind of categorical response you have. Here’s a concrete scenario for each:

**Binary Logistic Regression**: The response falls into one of two categories.

**Example:** You track whether each person took an antacid after Thanksgiving dinner or not (Yes or No).

The odds of a person taking an antacid increase, on average, by a factor of 2.35 with each additional helping of candied sweet potatoes.
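
If you want to see the arithmetic behind that odds ratio, here is a minimal Python sketch. The 2.35 comes from the example above; the baseline odds of 0.25 and the function names are assumptions made up for illustration, not Minitab output.

```python
import math

ODDS_RATIO_PER_HELPING = 2.35  # from the antacid example above


def odds_multiplier(helpings, odds_ratio=ODDS_RATIO_PER_HELPING):
    """Factor by which the odds change after a given number of extra helpings."""
    return odds_ratio ** helpings


def probability_from_odds(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1 + odds)


# The logistic regression coefficient is the natural log of the odds ratio.
coefficient = math.log(ODDS_RATIO_PER_HELPING)

baseline_odds = 0.25  # hypothetical odds of taking an antacid with zero helpings
for helpings in range(4):
    odds = baseline_odds * odds_multiplier(helpings)
    print(helpings, round(odds, 3), round(probability_from_odds(odds), 3))
```

Note that the odds multiply with each helping, so two helpings multiply the odds by 2.35 squared, not by 2 × 2.35.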

__________________________________________________________

**Ordinal Logistic Regression**: The categories of your response can be ordered from least to greatest.

**Example:** You record how many notches each person loosened their belt after dinner (0-4 notches).

On average, each additional helping of mashed potatoes results in a 36% increase in the odds that you will expand your belt by another notch.

__________________________________________________________

**Nominal Logistic Regression**: The categories of your response do not follow an order.

**Example:** You ask each person which of these animals they most feel like after Thanksgiving dinner: A beached whale, a sea elephant, or an anaconda after swallowing a wild pig.

With each additional serving of turkey, the odds that a person feels like an anaconda after swallowing a wild pig, rather than like a beached whale, increase by a factor of 4.37.

__________________________________________________________

## Regression with a Continuous Response

The line at the top of Minitab's Regression menu neatly separates regression analyses that use a continuous response variable.

To use one of these analyses, each response must be a measurement on a continuous scale, such as length, weight, or time.

__________________________________________________________

**Regression**: You have one or more continuous predictors, and a continuous response.

**Example:** You track how many minutes each person spends lying belly-up on the living room floor after Thanksgiving dinner.

Each additional helping of chestnut stuffing results in an increase of 4.28 minutes, on average, of lying belly-up on the living room floor after dinner (when the servings of all of the other dishes are held constant).

__________________________________________________________

**General Regression**: You have a mix of categorical and continuous predictors, and a continuous response.

**Example:** Besides the continuous predictors for the helpings of each dish, your model for belly-up time also includes a categorical predictor to indicate whether each person ate snacks before the Thanksgiving meal (Yes or No).

Eating snacks before the Thanksgiving meal increases the time spent belly-up on the floor by about 17 minutes, on average, when the helpings of all of the other dishes are held constant.
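
To make "held constant" concrete, here is a small Python sketch of how such a fitted equation might be used for prediction. The 4.28 and 17 come from the examples above; the intercept of 10 minutes is a made-up value, and the other dishes are left out to keep the sketch short.

```python
STUFFING_SLOPE = 4.28  # minutes per helping, from the Regression example
SNACK_EFFECT = 17.0    # minutes, from the General Regression example ("about 17")
INTERCEPT = 10.0       # hypothetical baseline minutes, for illustration only


def predicted_belly_up_minutes(stuffing_helpings, ate_snacks):
    """Predict belly-up time from one continuous and one categorical predictor."""
    # The categorical predictor enters the equation as a 0/1 indicator.
    snack_indicator = 1 if ate_snacks else 0
    return (INTERCEPT
            + STUFFING_SLOPE * stuffing_helpings
            + SNACK_EFFECT * snack_indicator)


# Same number of helpings, different snack status: the gap is the snack effect.
print(predicted_belly_up_minutes(2, ate_snacks=False))
print(predicted_belly_up_minutes(2, ate_snacks=True))
```

Whatever number of helpings you plug in, the snackers and non-snackers differ by the same 17 minutes, which is what the coefficient for the categorical predictor describes.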

__________________________________________________________

**Stepwise Regression**: Minitab identifies a useful subset of predictors based on the statistical significance of the predictors (using stepwise, forward selection, or backward elimination).

**Example**: You want Minitab to tell you which dishes have a statistically significant effect on the number of minutes people spend belly-up on the floor after dinner.

Of the 7 dishes on the table, Minitab determines that chestnut stuffing and corn bread are the statistically significant predictors for evaluating time spent belly-up on the floor.

__________________________________________________________

**Best Subsets Regression**: Minitab identifies a useful subset of predictors based on how much variation the model explains (the maximum R-squared criterion).

**Example:** You want Minitab to tell you which combination of dishes explains most of the variation in the number of minutes people spend belly-up on the floor after dinner.

Of the 7 predictors in the model, cranberry sauce (CR), chestnut stuffing (CS), and corn bread (CB) explain the most variation in time spent belly-up on the floor. Adding more predictors (dishes) doesn’t increase the R-squared value significantly.
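
For the curious, the idea behind best subsets can be sketched in plain Python: fit every combination of predictors and keep the one with the highest R-squared at each model size. This is only a sketch of the maximum R-squared criterion, not Minitab's implementation, and the helpings and minutes below are invented data for three dishes.

```python
from itertools import combinations


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(columns, y):
    """R-squared of a least squares fit of y on the given columns plus a constant."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def best_subsets(predictors, y):
    """Return {model size: (R-squared, predictor names)} for the best model of each size."""
    names = sorted(predictors)
    best = {}
    for size in range(1, len(names) + 1):
        scored = [(r_squared([predictors[name] for name in combo], y), combo)
                  for combo in combinations(names, size)]
        best[size] = max(scored)
    return best


# Hypothetical helpings per person for three dishes, and belly-up minutes.
helpings = {
    "CS": [0, 1, 2, 3, 1, 2, 0, 3],
    "CB": [1, 0, 2, 1, 3, 0, 2, 1],
    "CR": [2, 1, 0, 1, 0, 3, 1, 2],
}
minutes = [12.0, 15.0, 24.0, 27.0, 22.0, 18.0, 15.0, 28.0]

for size, (r2, combo) in best_subsets(helpings, minutes).items():
    print(size, combo, round(r2, 3))
```

R-squared never decreases when you add a predictor, so the useful question is where the gains level off, which is how the three-dish model above was chosen.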

__________________________________________________________

**Fitted Line Plot:** Display a fitted line and perform regression for only one continuous predictor and a continuous response.

**Example:** You want to visualize the association between servings of stuffing and time spent belly-up on the floor.

There’s a weak but statistically significant quadratic association between servings of stuffing and time spent belly-up on the floor.

__________________________________________________________

**Nonlinear Regression:** Specify a nonlinear function to model the relationship between continuous predictors and a continuous response.

**Example:** Uncle Alfred, a brilliant Ph.D. chemist, has conducted experiments on the chemical properties of chestnuts and their effect on metabolic enzyme reactions that induce fatigue. Based on his research, he knows he can model the relationship between servings of chestnut stuffing and post-prandial prostrate position using a Gompertz growth curve with three parameters.

Uncle Alfred’s theoretical exponential function of a negative exponential function describes the relationship between chestnut stuffing and belly-up time. However, Uncle Alfred is the only one who understands his complex nonlinear model. When he tries to explain it after dinner, everyone falls asleep on the floor. Belly-up.
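
In case you managed to stay awake, here is what "an exponential function of a negative exponential function" looks like in Python. This uses one common three-parameter form of the Gompertz curve; the parameter names and values are assumptions for illustration, since Uncle Alfred never shared his.

```python
import math


def gompertz(x, asymptote, displacement, rate):
    """One common three-parameter Gompertz curve: a * exp(-b * exp(-c * x))."""
    return asymptote * math.exp(-displacement * math.exp(-rate * x))


# Hypothetical parameters: belly-up time levels off at 60 minutes.
ASYMPTOTE, DISPLACEMENT, RATE = 60.0, 3.0, 0.8

for servings in range(7):
    minutes = gompertz(servings, ASYMPTOTE, DISPLACEMENT, RATE)
    print(servings, round(minutes, 1))
```

The curve rises slowly at first, climbs steeply, then flattens as it approaches the asymptote, a shape that a straight line or a simple polynomial cannot reproduce.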

## Specialized Regression Analyses

Two analyses on the Regression menu each form their own category. These analyses model a continuous response and continuous predictors, but their applications are specialized.

**Orthogonal Regression:** Test whether two instruments or methods provide comparable measurements.

**Example:** Grandma got a new digital turkey thermometer for a gift, but she’s suspicious that it doesn’t work as well as her trusty old metal thermometer. Before Thanksgiving, she tests it by using both thermometers to measure the temperature in a pot of water as she chills it in the fridge or heats it on the stove, recording the temperature measured on each instrument.

Despite Grandma’s suspicions, the digital thermometer is equivalent to her tried-and-true metal thermometer. (The confidence interval for the slope includes 1 and the confidence interval for the constant includes 0.)
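
Here is a rough Python sketch of the fitted line behind a comparison like Grandma's. It assumes both thermometers have the same measurement error variance (an error variance ratio of 1) and uses invented readings. It computes only the slope and constant, not the confidence intervals you would need for the actual equivalence check.

```python
import math


def orthogonal_fit(x, y):
    """Fit y = intercept + slope * x by orthogonal regression (error variance ratio 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Closed-form slope that minimizes perpendicular distances to the line.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired readings (degrees Fahrenheit) from the two thermometers.
metal = [35.0, 50.0, 75.0, 100.0, 150.0, 212.0]
digital = [35.4, 49.7, 75.3, 99.8, 150.5, 211.6]

slope, intercept = orthogonal_fit(metal, digital)
print(round(slope, 4), round(intercept, 4))
```

Unlike ordinary regression, this fit allows for measurement error in both instruments, which is why it suits method comparisons. A slope near 1 and a constant near 0 suggest the two thermometers agree.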

__________________________________________________________

**Partial Least Squares Regression:** You have few observations relative to the number of predictors, or your predictors are highly associated with each other, making a standard regression analysis problematic.

**Example:** Suppose your Thanksgiving study sampled only 10 subjects, instead of 100 subjects, but still included all 7 predictor variables. Your small sample causes high standard errors for the coefficient estimates. Also, the people who took many helpings of chestnut stuffing also took many helpings of mashed potatoes and turkey, and had similar responses with more helpings (*zzzzzzz...*), causing these predictors to be correlated.

By using a partial least squares model with 6 components, each formed by taking a linear combination of the predictor variables, you can explain about 84% of the variation in belly-up time. Adding another component doesn’t increase R-Sq much.

__________________________________________________________

Whew! Are you feeling stuffed after sampling all that regression? It may take a while to digest everything on that menu. If it's too much, I suggest you loosen your belt a notch and lie on the floor for a while.

Belly-up.

Name: Alex • Monday, November 19, 2012

Splendid! I have only one question for this cool blog on regression: is it really possible in MINITAB to mark statistically significant predictors in color automatically?

Name: Patrick • Monday, November 19, 2012

Hey Alex,

Thanks for reading and asking that perceptive question.

Sorry if my post gave the misimpression that the significant results in the Minitab output were highlighted automatically. I used the ReportPad in Minitab to edit the Session window output and change the font color of the significant results to red for the blog post, so readers of the post could quickly home in on the key results.

Currently Minitab does not automatically change the color of results based on statistical significance. One of the issues with automatically formatting statistically significant results directly in the Session window is that the highlighting would depend on the alpha level used for the analysis. For regression, alpha levels for keeping predictors in the model can range from 0.01 to 0.25, depending on the application. Also, depending on the application, a researcher may decide that certain predictors are important to keep in the model (i.e., are “significant”) regardless of their p-values, because of prior process knowledge. So automatic formatting of significant results would delight some users, but annoy others. Hence the ReportPad, which allows each user to format and edit the output according to his or her needs (and tastes: some folks detest using red font to highlight important results!).

In general, users who want more automated output and guided interpretation use the Minitab Assistant. There is a regression analysis in the Assistant, but currently it’s designed for a simple model with only one continuous X and one continuous Y.

Thanks again for your comment—and Happy Thanksgiving!

Name: Luana • Tuesday, August 20, 2013

Minitab is my favorite tool for statistics! It is helping me a lot with regression, and I have a question.....

Is there a way to automatically calculate R^2 in nonlinear regression?

Name: Patrick • Tuesday, August 20, 2013

Hi Luana,

Glad to hear Minitab is your favorite tool for regression and other statistical analyses!

You asked a great question about R-squared and nonlinear regression.

R-squared is not calculated for nonlinear regression in Minitab. The reason for this is that nonlinear regression does not have the specific properties of linear regression that make the R-squared calculation accurate and appropriate.

Here's a peer-reviewed paper that demonstrates why R-squared is not considered appropriate for nonlinear regression: http://www.biomedcentral.com/1471-2210/10/6

When evaluating the fit for nonlinear regression models, use prior knowledge of the shape or behavior of the response based on known physical or chemical/biological properties of the application.

In Minitab’s nonlinear regression output, you can use the fitted line plot to visually examine how the nonlinear model fits the data across the range of values. You can also compare the value of S, the standard deviation of the residuals, for different nonlinear models. Smaller values of S (closer to 0) indicate a better fit.
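
As a rough illustration of that comparison (a sketch, not Minitab output), here is how S could be computed by hand in Python. It assumes the usual definition, the square root of the error sum of squares divided by the error degrees of freedom, and the observed and fitted values are invented.

```python
import math


def residual_standard_deviation(observed, fitted, n_parameters):
    """S = sqrt(SSE / (n - p)), where p is the number of estimated parameters."""
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(sse / (len(observed) - n_parameters))


# Hypothetical data and fitted values from two candidate nonlinear models.
observed = [3.1, 7.9, 19.5, 38.2, 52.0, 57.8]
fitted_model_a = [3.0, 8.5, 20.1, 37.0, 51.5, 58.3]  # three-parameter model
fitted_model_b = [5.0, 10.0, 17.0, 33.0, 55.0, 61.0]  # two-parameter model

s_a = residual_standard_deviation(observed, fitted_model_a, n_parameters=3)
s_b = residual_standard_deviation(observed, fitted_model_b, n_parameters=2)
print(round(s_a, 3), round(s_b, 3))
```

S is in the same units as the response, so you can read it as the typical distance between the data and the fitted curve.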

Thanks for the question. If you're interested in understanding this issue in more detail, check my future posts. I want to tackle this topic in more detail, with a longer explanation that uses visuals for clarity.

Name: Luana • Thursday, September 12, 2013

Thanks a lot! The S value was helpful to evaluate my models ^^

Name: ruthshaw • Monday, December 9, 2013

Hi

I am using PLSR, and I can't understand how to generate forecasts into the future. It seems to suggest that I should enter data into the future and that it will overwrite the response variable column, because it doesn't allow one to enter an empty column that one wants forecasts to be placed in... can you give a dummy step by step instruction set to make forecasts based on the analysis, using the exact menu options that appear?

Name: Patrick • Monday, December 9, 2013

Hi,

If I understand your comment correctly, you're trying to use your Partial Least Squares Regression model to predict new observations. To do that, in Minitab choose Stat > Regression > Partial Least Squares. Make sure your model is entered in the main dialog box, then click the Prediction button. Now enter the values or columns of values in the continuous and/or categorical predictor fields for which you want to predict the response. For guidance on how to enter the values, click the Help button on the Partial Least Squares: Prediction dialog box.

If the Help topic doesn't answer your question, or if I'm not understanding your comment correctly, then you might want to contact Minitab Technical Support by phone or email for more assistance:

http://www.minitab.com/en-US/support/

Hope this helps. Good luck!