In the process of adding our new Nonlinear Regression analysis to Minitab 16, I had the opportunity to learn a lot about it.

As you've probably noticed, the field of statistics is a strange beast. Need more evidence? Linear regression can produce curved lines, and nonlinear regression is not named for its curved lines.

So, when should you use Nonlinear Regression over one of our linear methods, such as General Regression, Best Subsets, or Stepwise Regression?

Generally speaking, you should try linear regression first. It’s easier to use and easier to interpret. However, if you simply aren’t able to get a good fit with linear regression, then it might be time to try nonlinear regression.

Let’s look at a case where linear regression doesn’t work. Often the problem is that, while linear regression can model curves, it might not be able to model the specific curve that exists in your data. The graphs below illustrate this with a linear model that contains a cubed predictor.

The fitted line plot shows that the raw data follow a nice tight function and the R-squared is 98.5%, which looks pretty good. Look closer, though, and the regression line systematically over- and under-predicts the data at different points along the curve. If you check the residual plots (which you *always* do, right?), you'll see patterns in the Residuals versus Fits plot rather than the randomness you want. This indicates a biased fit, but it's the best that linear regression can do.
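If you'd like to see this effect outside of Minitab, here's a minimal sketch in Python on made-up data (a hypothetical saturating curve, not the Mobility data). A model with a cubed predictor is still linear in the coefficients, and one quick diagnostic is to count sign changes in the residuals, since systematic over- and under-prediction produces long runs of same-sign residuals:

```python
# Minimal sketch: a "linear" model with polynomial terms can still miss the
# true curve. The generating function below is a made-up saturating curve.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 50)
y = 100 * x / (1.0 + x) + rng.normal(scale=2.0, size=x.size)

# Linear regression: linear in the coefficients, even with a cubed predictor.
coefs = np.polyfit(x, y, deg=3)      # fits b0 + b1*x + b2*x^2 + b3*x^3
fitted = np.polyval(coefs, x)
residuals = y - fitted

# Systematic over/under-prediction shows up as long runs of same-sign
# residuals; purely random scatter changes sign roughly half the time.
sign_changes = np.count_nonzero(np.diff(np.sign(residuals)) != 0)
print(f"sign changes in residuals: {sign_changes} out of {x.size - 1}")
```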

Let's try it again, but using nonlinear regression. It's important to note that because nonlinear regression allows a nearly infinite number of possible functions, it can be more difficult to set up. In this case, it took considerable effort to find the function that provides the optimal fit for the specific curve in these data. But since my main point is to explain when you want to use nonlinear regression instead of linear regression, I won't relate all of those details here. (Just like on a cooking show, on the blog we can jump from the raw ingredients to a great outcome in the graphs below without showing all of the work in between!)
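For comparison, here's what the nonlinear side might look like using SciPy's `curve_fit` rather than Minitab. The expectation function below (a two-parameter saturation curve) is purely an assumption for illustration; choosing the right function for your own data is exactly the effort-intensive part described above:

```python
# Minimal nonlinear regression sketch on the same made-up saturating data.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 50)
y = 100 * x / (1.0 + x) + rng.normal(scale=2.0, size=x.size)

def saturation(x, theta1, theta2):
    """Nonlinear in the parameters: theta1 * x / (theta2 + x)."""
    return theta1 * x / (theta2 + x)

# Starting values matter in nonlinear regression; poor ones can keep the
# algorithm from converging at all.
params, _ = curve_fit(saturation, x, y, p0=[80.0, 1.0])
print("estimated theta1, theta2:", params)
```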

The fitted line plot shows that the regression line follows the data almost exactly; there are no systematic deviations. It's impossible to calculate R-squared for nonlinear regression, but the S value (roughly speaking, the average absolute distance from the data points to the regression line) improves from 72.4 for the linear model to just 13.7 for the nonlinear model. You want a lower S value because it means the data points fall closer to the fitted line. What's more, the Residuals versus Fits plot shows the randomness that you want to see. It's a good fit!
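Under the hood, S is the square root of SS Error divided by the error degrees of freedom: sqrt(SSE / (n - p)), where p is the number of estimated parameters. Here's a small sketch that computes S for both fits in the toy example above (the 72.4 and 13.7 come from the Mobility data, not from this sketch):

```python
# Compute S = sqrt(SSE / (n - p)) for a linear and a nonlinear fit to the
# same made-up data; the model with the smaller S fits more closely.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 50)
y = 100 * x / (1.0 + x) + rng.normal(scale=2.0, size=x.size)

def s_value(y, fitted, n_params):
    sse = np.sum((y - fitted) ** 2)
    return np.sqrt(sse / (y.size - n_params))

cubic = np.polyval(np.polyfit(x, y, deg=3), x)  # 4 estimated coefficients
p, _ = curve_fit(lambda x, a, b: a * x / (b + x), x, y, p0=[80.0, 1.0])
nonlin = p[0] * x / (p[1] + x)                  # 2 estimated parameters

print("S, cubic linear model:", s_value(y, cubic, 4))
print("S, nonlinear model:   ", s_value(y, nonlin, 2))
```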

Nonlinear regression can be a powerful alternative to linear regression, but there are a few drawbacks. In addition to the aforementioned difficulty in setting up the analysis and the lack of R-squared, be aware that:

• The effect each predictor has on the response can be less intuitive to understand.

• P-values are impossible to calculate for the predictors.

• Confidence intervals may or may not be calculable.

If you're using Minitab 16 now, you can play with these data yourself by going to **File -> Open Worksheet**, then clicking the **Look in Minitab Sample Data folder** icon and choosing **Mobility.MTW**. These are the same data that I used in the Nonlinear Regression Help example in Minitab 16, which contains a fuller interpretation of the Nonlinear Regression output.

If you'd like to try it, you can download the free 30-day trial of Minitab 17 Statistical Software. If you're learning about regression, read my regression tutorial!

Time: Monday, February 17, 2014

Why is it impossible to calculate R-squared for nonlinear regression, while Excel does calculate the R-squared?

Time: Thursday, February 20, 2014

Hi Nabil,

That's a very timely question. In a couple of weeks I'll publish a blog post about this very topic. So, in the meantime, I'll provide a brief explanation.

For linear models, the sums of squares always add up in a specific manner: SS Regression + SS Error = SS Total.

This seems quite logical. The variability that the regression model accounts for plus the error variability add up to equal the total variability. Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%.

In nonlinear regression, SS Regression + SS Error does not add up to SS Total! This completely invalidates R-squared for nonlinear models, because it no longer has to fall between 0 and 100%.
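You can verify this numerically. Here's a quick sketch on made-up data (the models are illustrative stand-ins, not the Mobility example): for a least-squares linear fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values, so the cross-product term vanishes and the identity holds. Neither property is guaranteed for a nonlinear fit, so the two sides drift apart:

```python
# Check SS Regression + SS Error against SS Total for a linear fit
# (identity holds) and a nonlinear fit (identity generally fails).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 50)
y = 100 * x / (1.0 + x) + rng.normal(scale=2.0, size=x.size)

def sums_of_squares(y, fitted):
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    ss_error = np.sum((y - fitted) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return ss_reg, ss_error, ss_total

linear_fit = np.polyval(np.polyfit(x, y, deg=3), x)  # includes an intercept
p, _ = curve_fit(lambda x, a, b: a * x / (b + x), x, y, p0=[80.0, 1.0])
nonlinear_fit = p[0] * x / (p[1] + x)

for name, fit in (("linear", linear_fit), ("nonlinear", nonlinear_fit)):
    ss_reg, ss_error, ss_total = sums_of_squares(y, fit)
    print(f"{name:9s}: SS Reg + SS Error = {ss_reg + ss_error:.2f}, "
          f"SS Total = {ss_total:.2f}")
```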

It's true that some software packages calculate R-squared for nonlinear regression. However, academic studies have shown that this approach is invalid. Using R-squared to evaluate nonlinear models will generally lead you astray. You don't want this! That's why Minitab doesn't offer R-squared for nonlinear regression.

Instead, compare S values and go with the model that has the smaller S.

Again, check back in a couple of weeks for a complete post about this!

Thanks for reading and the great question!

Jim

Time: Monday, March 17, 2014

Hi Jim,

So can I conclude that a regression model with a high R-sq(adj) is not necessarily accurate? How do I determine whether a regression model is reliable despite a low R-sq(adj)?

Regards,

Shasha

Time: Friday, March 21, 2014

Hi Shasha,

A high adjusted R-squared (or even the regular R-squared) doesn't necessarily mean that the model is a good fit. You should always check the residual plots to be sure that the model is not biased. If the residual plots look good, then you can trust the goodness-of-fit measures, such as R-squared and adjusted R-squared.

You would have different interpretations of a low adjusted R-squared depending on how it compares to your regular R-squared.

If the regular R-squared is high and the adjusted R-squared is low, you probably have too many predictors in your model.
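To make that concrete, here's a small sketch on made-up data. Adjusted R-squared equals 1 - (1 - R-sq)(n - 1)/(n - p - 1) for p predictors, so while every junk predictor you add nudges the regular R-squared up, the adjusted version penalizes the clutter:

```python
# Show regular vs adjusted R-squared as junk predictors are added to a
# simple one-predictor model; the data and junk columns are made up.
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def r_squareds(X, y):
    X1 = np.column_stack([np.ones(n), X])       # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    p = X.shape[1]                              # number of predictors
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

X = x.reshape(-1, 1)
junk = rng.normal(size=(n, 15))                 # predictors unrelated to y
for extra in (0, 5, 10, 15):
    r2, adj = r_squareds(np.hstack([X, junk[:, :extra]]), y)
    print(f"{1 + extra:2d} predictors: R-sq = {r2:.3f}, R-sq(adj) = {adj:.3f}")
```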

Read this post for more details about using adjusted R-squared:

http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables

If both types of R-squared are low, it's not necessarily bad if you have significant predictors and your residual plots are good. However, it depends on what you want to do with your model.

Read this blog post for more details about this scenario:

http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis

Jim