Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates that parameter for an entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate.... Continue Reading

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab 17. With the... Continue Reading

We like to host webinars, and our customers and prospects
like to attend them. But when our webinar vendor moved from a
pay-per-person pricing model to a pay-per-webinar pricing model, we
wanted to find out how to maximize registrations and thereby
minimize our costs.
We collected webinar data on the following variables:
Webinar topic
Day of week
Time of day – 11 a.m. or 2 p.m.
Newsletter promotion –... Continue Reading

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are... Continue Reading

Previously, I showed why there is no R-squared for nonlinear regression. Anyone
who uses nonlinear regression will also notice that there are no P
values for the predictor variables. What’s going on?
Just like there are good reasons not to calculate R-squared for
nonlinear regression, there are also good reasons not to calculate
P values for the coefficients.
Why not—and what to use instead—are the... Continue Reading

In
Blind Wine Part I, we introduced our experimental setup, which
included some survey questions that we asked each participant
ahead of time. The four questions asked were:
On a scale of 1 to 10, how would you rate your knowledge of
wine?
How much would you typically spend on a bottle of wine in a
store?
How many different types of wine (merlot, riesling, cabernet,
etc.) would you buy regularly (not as... Continue Reading

Previously,
I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the... Continue Reading

In regression analysis, you'd like your regression model to have
significant variables and to produce a high R-squared value. This
low P value / high R² combination indicates that changes
in the predictors are related to changes in the response variable
and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if
your regression model... Continue Reading

In Minitab, the Assistant menu is your interactive guide to choosing
the right tool, analyzing data correctly, and interpreting the
results. If you’re feeling a bit rusty with choosing and using a
particular analysis, the Assistant is your friend!
Previously, I’ve written about the new linear model features in Minitab 17. In
this post, I’ll work through a multiple regression analysis example
and... Continue Reading

If
betting weren't allowed on horse racing, the Kentucky Derby would
likely be a little-known event of interest only to a small group of
horse racing enthusiasts. But like the Tour de France, the World
Cup, and the Masters Tournament, even those with little or no
knowledge of the sport in general seem drawn to the excitement over
its premier event—the mint juleps, the hats...and of course,... Continue Reading

In April 2012, I wrote a short paper on
binary logistic regression to analyze wine tasting data. At
that time, François Hollande was about to get elected as French
president and in the U.S., Mitt Romney was winning the Republican
primaries. That seems like a long time ago…
Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012,
would I have... Continue Reading

Nonlinear regression is a very powerful
analysis that can fit virtually any curve. However, it's not
possible to calculate a valid R-squared for nonlinear regression.
This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for
nonlinear regression, some other packages do.
So, what’s going on?
Minitab doesn't calculate R-squared for nonlinear models... Continue Reading

We released Minitab 17 Statistical Software a couple of days ago.
Certainly every new release of Minitab is a reason to celebrate.
However, I am particularly excited about Minitab 17 from a data
analyst’s perspective.
If you read my blogs regularly, you’ll know that I’ve
extensively used and written about linear models. Minitab 17 has a
ton of new features that expand and enhance many types of... Continue Reading

If
you regularly perform regression analysis, you know that
R² is a statistic used to evaluate the fit of your
model. You may even know the standard definition of R²:
the percentage of variation in the response that is explained
by the model.
Fair enough. With Minitab Statistical Software doing all the heavy
lifting to calculate your R² values, that may be all you
ever need to know.
But if you’re... Continue Reading

R-squared gets
all of the attention when it comes to determining how well a linear
model fits the data. However, I've stated previously that R-squared is overrated. Is there a different
goodness-of-fit statistic that can be more helpful? You bet!
Today, I’ll highlight a sorely underappreciated regression
statistic: S, or the standard error of the regression. S provides
important information that... Continue Reading

“Turnovers are like ex-wives. The more you have, the more they
cost you.” – Dave Widell, former Dallas Cowboys
lineman
It doesn’t take witty insight from a former NFL player to
realize how big an impact turnovers can have in a football game.
Every time an announcer talks about “Keys to the Game,” winning the
turnover battle is one of them. And as Cowboys fans know all too
well, an ill-timed... Continue Reading

Just
how high should R² be in regression analysis? I hear
this question asked quite frequently.
Previously, I showed how to interpret R-squared (R²). I
also showed how it can be a misleading statistic because a low
R-squared isn’t necessarily bad and a high R-squared isn’t
necessarily good.
Clearly, the answer for “how high should R-squared be” is . . .
it depends.
In this post, I’ll help you answer... Continue Reading

I’ve
written a number of blog posts about regression analysis and I
think it’s helpful to collect them in this post to create a
regression tutorial. I’ll supplement my own posts with some from my
colleagues.
This tutorial covers many aspects of regression analysis
including: choosing the type of regression analysis to use,
specifying the model, interpreting the results, determining how
well the... Continue Reading

Face it, you love regression analysis as much as I do.
Regression is one of the most satisfying analyses in Minitab:
get some predictors that should have a relationship to a response,
go through a model selection process, interpret fit statistics like
adjusted R² and predicted R², and make
predictions. Yes, regression really is quite wonderful.
Except when it’s not. Dark, seedy corners of the data... Continue Reading

For
one reason or another, the response variable in a regression
analysis might not satisfy one or more of
the assumptions of ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading