Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

When you run a
regression in Minitab, you receive a huge batch of output, and
often it can be hard to know where to start. A lot of times, we get
overwhelmed and just go straight to p-values, ignoring a lot of
valuable information in the process. This post will give you an
introduction to one of the other statistics Minitab displays for
you, the VIF, or Variance Inflation Factor.
To start, let's... Continue Reading
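As a quick illustration of what the VIF measures, here is a minimal sketch for the simplest case of two predictors. It relies on the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others; with only two predictors, that R² is just the squared correlation between them. The function names and the data are made up for this example and are not Minitab output.

```python
from math import fsum

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values (illustration only).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)  # r = 0.8, so VIF is about 2.78
```

A VIF of 1 means the predictor is uncorrelated with the others; the larger the value, the more the variance of that coefficient is inflated by correlation among the predictors.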

Previously, I’ve written about
how to interpret regression coefficients and their individual P
values.
I’ve also written about
how to interpret R-squared to assess the strength of the
relationship between your model and the response variable.
Recently I've been asked, how does the F-test of the overall
significance and its P value fit in with these other statistics?
That’s the topic of this post!
In... Continue Reading
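One way to see how the overall F-test relates to R-squared is to compute the F statistic directly from it. The sketch below assumes an ordinary least squares model with an intercept and k predictors fit to n observations; the function name and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic for a model with an intercept and k predictors.

    Returns the statistic with its numerator and denominator degrees of freedom.
    """
    df_model = k
    df_error = n - k - 1
    f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_stat, df_model, df_error

# Hypothetical model: R-squared of 0.5 with 2 predictors and 23 observations.
f_stat, df_model, df_error = overall_f(0.5, n=23, k=2)  # F = 10 on (2, 20) df
```

The P value for the test comes from comparing this statistic to an F distribution with those degrees of freedom, which is what the software reports alongside the coefficients and R-squared.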

In my previous post, I showed
you that the
coefficients are different when choosing (-1,0,1) vs (1,0) coding
schemes for General Linear Model (or
Regression).
We used the two different
equations to calculate the same fitted values. Here I will focus on
showing what the different coefficients represent.
Let's use the data and models from the last blog post:
We can display the means for
each level... Continue Reading
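To make the idea concrete, here is a small sketch of how the two coding schemes can give different coefficients yet identical fitted values. It covers only the simplest case, a model with a single categorical factor, where the fitted value for each level is that level's mean. The data are invented for illustration, and the choice of the first level as the (1, 0) reference and the last level as the one coded -1 is an assumption, not a statement about any particular Minitab setting.

```python
from statistics import mean

# Hypothetical responses for a three-level factor (not data from the post).
data = {"A": [10.0, 12.0], "B": [15.0, 17.0], "C": [20.0, 22.0]}
levels = list(data)
level_means = {lvl: mean(vals) for lvl, vals in data.items()}

# (1, 0) coding: the intercept is the reference level's mean, and each
# coefficient is a level mean minus the reference mean.
ref = levels[0]  # assumed reference level
b0_ref = level_means[ref]
coef_ref = {lvl: level_means[lvl] - b0_ref for lvl in levels if lvl != ref}

def fit_ref(lvl):
    return b0_ref + coef_ref.get(lvl, 0.0)

# (-1, 0, 1) coding: the intercept is the unweighted mean of the level means,
# and each coefficient is a level mean minus that overall mean.
b0_eff = mean(level_means.values())
coef_eff = {lvl: level_means[lvl] - b0_eff for lvl in levels[:-1]}

def fit_eff(lvl):
    if lvl in coef_eff:
        return b0_eff + coef_eff[lvl]
    # The last level is coded -1 in every column, so its effect is the
    # negative sum of the other coefficients.
    return b0_eff - sum(coef_eff.values())

for lvl in levels:
    print(lvl, fit_ref(lvl), fit_eff(lvl))
```

Under (1, 0) coding each coefficient is a comparison with the reference level, while under (-1, 0, 1) coding each coefficient is a comparison with the overall mean. Both sets of coefficients reproduce the same level means.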

Since Minitab 17 Statistical
Software launched in February 2014, we've gotten
great feedback from many people who have been using the General Linear
Model and Regression tools.
But in speaking with people as part of Minitab's Technical
Support team, I've found many are noticing that there are two
coding schemes available with each. We frequently get calls from
people asking how the coding scheme you... Continue Reading

Imagine that you are watching a race and that you are located
close to the finish line. When the first and fastest runners
complete the race, the differences in times between them will
probably be quite small.
Now wait until the last runners arrive and consider their
finishing times. For these slowest runners, the differences in
completion times will be extremely large. This is due to the fact
that... Continue Reading

The
NCAA Tournament is right around the corner, and you know what that
means: It’s time to start thinking about how you’re going to fill
out your bracket! For the last two years I’ve used the Sagarin
Predictor Ratings to predict the tournament. However, there is
a problem with that strategy this year. The old method uses a
regression model that calculates the probability one team has
of beating... Continue Reading

By Lion "Ari" Ondiappan Arivazhagan, guest
blogger.
An alarming number of borewell accidents, especially involving
little children, have occurred across India in the recent past.
This is the second of a series of articles on borewell accidents in
India. In the first installment of the series, I used the
G-chart in Minitab Statistical Software to predict the
probabilities of innocent children... Continue Reading

In part 1 of this post, I covered how Six Sigma students at
Rose-Hulman Institute of Technology cleaned up and
prepared project data for a regression analysis. Now we're
ready to start our analysis. We’ll detail the steps in that process
and what we can learn from our results.
What Factors Are Important?
We collected data about 11 factors we believe could be
significant:
Whether the date of... Continue Reading

By Peter Olejnik, guest blogger.
Previous posts on the Minitab Blog have discussed the work of
the Six Sigma students at Rose-Hulman Institute of Technology
to reduce the quantities of recyclables that wind up in the trash.
Led by Dr. Diane Evans, these students continue to make an
important impact on their community.
As
with any Six Sigma process, the results of the work need to be
evaluated. A... Continue Reading

If you wanted to figure out the probability that your favorite
football team will win their next game, how would you do it?
My colleague
Eduardo Santiago and I recently looked at this question, and in
this post we'll share how we approached the solution. Let’s start
by breaking down this problem:
There are only two possible outcomes: your favorite team wins,
or they lose. Ties are a possibility,... Continue Reading

Choosing
the correct linear regression model can be difficult. After all,
the world and how it works is complex. Trying to model it with only
a sample doesn’t make it any easier. In this post, I'll review some
common statistical methods for selecting models, describe complications you
may face, and provide some practical advice for choosing the best
regression model.
It starts when a researcher wants to... Continue Reading

Stepwise regression and best subsets regression are both
automatic tools that help you identify useful predictors during the
exploratory stages of model building for linear regression. These
two procedures use different methods and present you with different
output.
An obvious question arises. Does one procedure pick the true
model more often than the other? I’ll tackle that question in this
post.
Fi... Continue Reading

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates that parameter for an entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate.... Continue Reading

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab 17. With the... Continue Reading

We like to host webinars, and our customers and prospects
like to attend them. But when our webinar vendor moved from a
pay-per-person pricing model to a pay-per-webinar pricing model, we
wanted to find out how to maximize registrations and thereby
minimize our costs.
We collected webinar data on the following variables:
Webinar topic
Day of week
Time of day – 11 a.m. or 2 p.m.
Newsletter promotion –... Continue Reading

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are... Continue Reading

Previously, I showed why there is no R-squared for nonlinear regression. Anyone
who uses nonlinear regression will also notice that there are no P
values for the predictor variables. What’s going on?
Just like there are good reasons not to calculate R-squared for
nonlinear regression, there are also good reasons not to calculate
P values for the coefficients.
Why not—and what to use instead—are the... Continue Reading

In
Blind Wine Part I, we introduced our experimental setup, which
included some survey questions asked ahead of time of each
participant. The four questions asked were:
On a scale of 1 to 10, how would you rate your knowledge of
wine?
How much would you typically spend on a bottle of wine in a
store?
How many different types of wine (merlot, riesling, cabernet,
etc.) would you buy regularly (not as... Continue Reading

Previously,
I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the... Continue Reading

In regression analysis, you'd like your regression model to have
significant variables and to produce a high R-squared value. This
low P value / high R² combination indicates that changes
in the predictors are related to changes in the response variable
and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if
your regression model... Continue Reading