Why Is There No R-Squared for Nonlinear Regression?

Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do.

So, what’s going on?

Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. There are bad consequences if you use it in this context.

Why Is It Impossible to Calculate a...

Opening Ceremonies for Bubble Plots and Poisson Regression

By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot.

This exploratory tool is great for visualizing the relationships among three variables on a single plot.

To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there might be a possible association between the number of medals a country won and its maximum elevation. For that, I could use a simple scatterplot, right?

But say I want to throw a third variable into the mix, such as...

Unleash the Power of Linear Models with Minitab 17

We released Minitab 17 Statistical Software a couple of days ago. Certainly every new release of Minitab is a reason to celebrate. However, I am particularly excited about Minitab 17 from a data analyst’s perspective. 

If you read my blogs regularly, you’ll know that I’ve extensively used and written about linear models. Minitab 17 has a ton of new features that expand and enhance many types of linear models. I’m thrilled!

In this post, I want to share with my fellow analysts the new linear model features and the benefits that they provide.

New Linear Model Analyses in Minitab 17

We’ve added...

Regression Analysis: How to Interpret S, the Standard Error of the Regression

R-squared gets all of the attention when it comes to determining how well a linear model fits the data. However, I've stated previously that R-squared is overrated. Is there a different goodness-of-fit statistic that can be more helpful? You bet!

Today, I’ll highlight a sorely underappreciated regression statistic: S, or the standard error of the regression. S provides important information that R-squared does not.

What is the Standard Error of the Regression (S)?

S becomes smaller when the data points are closer to the line.
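
To make that concrete, here is a minimal Python sketch (not Minitab output) that fits a simple least-squares line and computes S as the square root of the residual sum of squares divided by the error degrees of freedom, which is n - 2 for a line with one predictor. The two data sets and the function names are made up for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def standard_error_of_regression(x, y):
    """S = sqrt(SSE / (n - 2)) for a simple linear regression."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data: same predictor values, different scatter around the line.
x = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]   # points hug the line
loose = [3.0, 2.5, 7.5, 6.5, 11.5, 11.0]  # points scatter widely

s_tight = standard_error_of_regression(x, tight)
s_loose = standard_error_of_regression(x, loose)
print(s_tight < s_loose)
```

Because S is expressed in the units of the response, the smaller value for the tight data set can be read directly as a smaller typical distance between the points and the line.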

In the regression output for Minitab statistical software, you can find...

How High Should R-squared Be in Regression Analysis?

Just how high should R² be in regression analysis? I hear this question asked quite frequently.

Previously, I showed how to interpret R-squared (R²). I also showed how it can be a misleading statistic because a low R-squared isn’t necessarily bad and a high R-squared isn’t necessarily good.

Clearly, the answer for “how high should R-squared be” is . . . it depends.

In this post, I’ll help you answer this question more precisely. However, bear with me, because my premise is that if you’re asking this question, you’re probably asking the wrong question. I’ll show you which questions you should...

Fix Problems in Regression Analysis with Partial Least Squares

Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R² and predicted R², and make predictions. Yes, regression really is quite wonderful.

Except when it’s not. Dark, seedy corners of the data world exist, lying in wait to make regression confusing or impossible. Good old ordinary least squares regression, to be specific.

For instance, sometimes you have a lot of detail in your data, but not...

Regression Analysis Tutorial and Examples

I’ve written a number of blog posts about regression analysis and I think it’s helpful to collect them in this post to create a regression tutorial. I’ll supplement my own posts with some from my colleagues.

This tutorial covers many aspects of regression analysis including: choosing the type of regression analysis to use, specifying the model, interpreting the results, determining how well the model fits, making predictions, and checking the assumptions. At the end, I include examples of different types of regression analyses.

If you’re learning regression analysis right now, you might want to...

See How Easily You Can Do a Box-Cox Transformation in Regression

For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response variable so that the data do meet the assumptions. Minitab makes the transformation simple by including the Box-Cox button. Try it for yourself and see how easy it is!
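
For readers who want to see what the transformation does outside of Minitab, here is a minimal Python sketch of the textbook Box-Cox formula: (y^lambda - 1) / lambda for a nonzero lambda, and the natural log of y when lambda is 0. The function name and the sample data are hypothetical, and Minitab's exact formula and its method for choosing lambda may differ from this sketch.

```python
import math

def box_cox(values, lam):
    """Textbook Box-Cox transform of strictly positive response values.

    lam == 0 -> natural log; otherwise (y**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed response: each value doubles the previous one.
response = [1, 2, 4, 8, 16, 32]

# With lam = 0 (log transform), the gaps between successive values become equal.
logged = box_cox(response, 0)
gaps = [b - a for a, b in zip(logged, logged[1:])]
print(gaps)
```

The choice of lambda is the real work in practice; this sketch only applies a lambda you supply and does not estimate one from the data.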

The government in Queensland,...

How Data Analysis Can Help Us Predict This Year's Champions League

by Laerte de Araujo Lima, guest blogger

A few weeks ago, my football friends and I were talking about the football in the UEFA Champions League (UEFA CL), and what we could expect for the 2013-14 season.

Some of us believe that the quality of the football played in the UEFA CL has improved in the last few years, as evidenced by more goals per match, more teams with attack-based strategies and, finally, more show games. Others disagree, arguing that teams have pursued defensive strategies, with consequently fewer goals per match, more fouls per game, and less effective use of game time...

Applied Regression Analysis: How to Present and Use the Results to Avoid Costly Mistakes, part 1

Imagine that you’ve studied an empirical problem using linear regression analysis and have settled on a well-specified, actionable model to present to your boss. Or perhaps you’re the boss, using applied regression models to make decisions.

In either case, there’s a good chance a costly mistake is about to occur!

How regression results are presented can lead decision-makers to make bad choices. Emre Soyer and Robin M. Hogarth*, who study behavioral decision-making, found that even experts are frequently tripped up when making decisions based on applied regression models.

In this post, I'll look...

Size Matters: Metabolic Rate and Longevity

John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!

I frequently enjoy reading and watching science-related material. This invariably raises questions, involving other "backyards," that I can better understand using statistics. For instance, see my post about the statistical analysis of dolphin sounds.

The latest topic that grabbed my attention was an apparent error in the BBC program Wonders of Life. In the episode “Size Matters,” Professor Brian Cox presents a graph with a linear regression line that...

Curve Fitting with Linear and Nonlinear Regression

We often think of a relationship between two variables as a straight line. That is, if you increase the predictor by 1 unit, the response always increases by X units. However, not all data have a linear relationship, and your model must fit the curves present in the data.

This fitted line plot shows the folly of using a line to fit a curved relationship!

How do you fit a curve to your data? Fortunately, Minitab statistical software includes a variety of curve-fitting methods in both linear regression and nonlinear regression.

To compare these methods, I’ll fit models to the somewhat tricky curve...

Regression Analysis: How to Interpret the Constant (Y Intercept)

The constant term in linear regression analysis seems to be such a simple thing. Also known as the y intercept, it is simply the value at which the fitted line crosses the y-axis.

While the concept is simple, I’ve seen a lot of confusion about interpreting the constant. That’s not surprising because the value of the constant term is almost always meaningless!

Paradoxically, while the value is generally meaningless, it is crucial to include the constant term in most regression models!

In this post, I’ll show you everything you need to know about the constant in linear regression analysis.

I'll use...

How to Interpret Regression Analysis Results: P-values and Coefficients

Regression analysis generates an equation to describe the statistical relationship between one or more predictor variables and the response variable. After you use Minitab Statistical Software to fit a regression model, and verify the fit by checking the residual plots, you’ll want to interpret the results. In this post, I’ll show you how to interpret the p-values and coefficients that appear in the output for linear regression analysis.

How Do I Interpret the P-Values in Linear Regression Analysis?

The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no...

Using Design of Experiments to Minimize Noise Effects

All processes are affected by various sources of variation over time. Products that are designed around optimal settings will, in reality, tend to drift away from their ideal settings during the manufacturing process.

Environmental fluctuations and process variability often cause major quality problems. Focusing only on cost and performance is not enough. Sensitivity to deterioration and process imperfections is an important issue. It is often not possible to completely eliminate variation due to uncontrollable factors (such as temperature changes, contamination, humidity, dust, etc.).

Fo...

Multiple Regression Analysis: Use Adjusted R-Squared and Predicted R-Squared to Include the Correct Number of Variables

Multiple regression can be a beguiling, temptation-filled analysis. It’s so easy to add more variables as you think of them, or just because the data are handy. Some of the predictors will be significant. Perhaps there is a relationship, or is it just by chance? You can add higher-order polynomials to bend and twist that fitted line as you like, but are you fitting real patterns or just connecting the dots? All the while, the R-squared (R²) value increases, teasing you, and egging you on to add more variables!
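
As a rough illustration of why adjusted R-squared resists that temptation, here is a small Python sketch of the standard adjustment, 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictors (not counting the constant). The R² values, sample size, and predictor counts below are invented for the example and do not come from a real data set.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for a model with p predictors and n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated terms")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison: seven extra predictors nudge R-squared up slightly.
small_model = adjusted_r_squared(r2=0.80, n=20, p=3)
big_model = adjusted_r_squared(r2=0.82, n=20, p=10)

print(round(small_model, 4), round(big_model, 4))
```

In this made-up case the plain R² rises from 0.80 to 0.82, yet the adjusted value drops, because the penalty for the added terms outweighs the small gain in fit.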

Previously, I showed how R-squared can be misleading when you assess the...

Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit?

After you have fit a linear model using regression analysis, ANOVA, or design of experiments (DOE), you need to determine how well the model fits the data. To help you out, Minitab statistical software presents a variety of goodness-of-fit statistics. In this post, we’ll explore the R-squared (R²) statistic and some of its limitations, and uncover some surprises along the way. For instance, low R-squared values are not always bad and high R-squared values are not always good!

What Is Goodness-of-Fit for a Linear Model?

Definition: Residual = Observed value - Fitted value
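
The definition above translates directly into code. This is a minimal Python sketch, assuming you already have observed values and the fitted values from some linear model; it computes the residuals and then R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function names and numbers are illustrative only.

```python
def residuals(observed, fitted):
    """Residual = observed value - fitted value, for each data point."""
    return [o - f for o, f in zip(observed, fitted)]

def r_squared(observed, fitted):
    """R-squared = 1 - SSE / SST for a linear model's fitted values."""
    mean_obs = sum(observed) / len(observed)
    sse = sum(r ** 2 for r in residuals(observed, fitted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst

# Hypothetical observed responses and fitted values from a linear model.
observed = [3.0, 5.0, 7.0, 10.0]
fitted = [3.2, 5.0, 7.1, 9.7]

print(residuals(observed, fitted))
print(r_squared(observed, fitted))
```

Smaller residuals shrink the residual sum of squares, which pushes R-squared toward 1.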

Linear regression...

No Matter How Strong, Correlation Still Doesn't Imply Causation

There's been a really interesting conversation about correlation and causation going on in the LinkedIn Statistics and Analytics Consultants group. 

This is a group with a pretty advanced appreciation of statistical nuances and data analysis, and they've been focusing on how the understanding of causation and correlation can be very field-dependent. For instance, evidence supporting causation might be very different if we're looking at data from a clinical trial conducted under controlled conditions as opposed to observational economic data.

Contributors also have been citing some pretty...

What Are the Effects of Multicollinearity and When Can I Ignore Them?

Multicollinearity is a problem that you can run into when you’re fitting a regression model, or other linear model. It refers to predictors that are correlated with other predictors in the model. Unfortunately, the effects of multicollinearity can feel murky and intangible, which makes it unclear whether it’s important to fix.

My goal in this blog post is to bring the effects of multicollinearity to life with real data! Along the way, I’ll show you a simple tool that can remove multicollinearity in some cases.


When Should I Use Confidence Intervals, Prediction Intervals, and Tolerance Intervals?

In statistics, we use a variety of intervals to characterize the results. The most well-known of these are confidence intervals. However, confidence intervals are not always appropriate. In this post, we’ll take a look at the different types of intervals that are available in Minitab, their characteristics, and when you should use them.

I’ll cover confidence intervals, prediction intervals, and tolerance intervals. Because tolerance intervals are the least-known, I’ll devote extra time to explaining how they work and when you’d want to use them.

What are Confidence Intervals?

A confidence...