Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

Overfitting a model is a real problem you need to beware of when
performing regression analysis. An overfit model results in
misleading regression coefficients, p-values,
and R-squared statistics. Nobody wants that,
so let's examine what overfit models are, and how to avoid falling
into the overfitting trap.
Put simply, an overfit model is too complex for the data you're
analyzing. Rather than... Continue Reading
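
To make "too complex for the data" concrete, here is a minimal sketch in plain Python. It is not taken from the post: the six data points, the fixed noise values, and the helper names (`fit_line`, `fit_interpolating_polynomial`, `mse`) are all assumptions made for illustration. It compares a straight-line fit with a degree-5 polynomial that passes through every training point.

```python
# Hypothetical data: the true relationship is y = 2x, observed with fixed noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.4, 0.3, -0.6, 0.4, -0.2]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

# New points from the noise-free relationship, used to check predictions.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x for x in test_x]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x


def fit_interpolating_polynomial(xs, ys):
    """Degree n-1 polynomial through all n points (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

train_mse_line = mse(line, train_x, train_y)
train_mse_poly = mse(poly, train_x, train_y)
test_mse_line = mse(line, test_x, test_y)
test_mse_poly = mse(poly, test_x, test_y)

print("training MSE: line", train_mse_line, "polynomial", train_mse_poly)
print("new-data MSE: line", test_mse_line, "polynomial", test_mse_poly)
```

The polynomial reproduces the training data perfectly because it has as many coefficients as there are observations, yet it predicts the new points worse than the simple line does. That pattern, a flattering fit on the sample and poor predictions elsewhere, is the symptom to watch for.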

Maybe you're just getting started with analyzing data. Maybe
you're reasonably knowledgeable about statistics, but it's been a
long time since you did a particular analysis and you feel a little
bit rusty. In either case, the Assistant menu in Minitab Statistical Software
gives you an interactive guide from start to finish. It will help
you choose the right tool quickly, analyze your data... Continue Reading

In April 2017, overbooking of flight seats hit the headlines
when a United Airlines customer was dragged off a flight. A TED
talk by Nina Klietsch gives a good but simplistic explanation of
why overbooking is so attractive to airlines.
Overbooking is not new to the airlines; these strategies were
officially sanctioned by the American Civil Aeronautics Board in
1965, and since that time complex... Continue Reading

Previously, I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the... Continue Reading

One of the biggest pieces of international news last year was
the so-called "Brexit" referendum, in which a majority of voters in
the United Kingdom cast their ballots to leave the European Union
(EU).
That outcome shocked the world. Follow-up media coverage has asserted
that the younger generation prefers to remain in the EU since that
means more opportunities on the continent. The older... Continue Reading

If you regularly perform regression analysis, you know that
R² is a statistic used to evaluate the fit of your
model. You may even know the standard definition of R²:
the percentage of variation in the response that is explained
by the model.
Fair enough. With Minitab Statistical Software doing all the heavy
lifting to calculate your R² values, that may be all you
ever need to know.
But if you’re... Continue Reading
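
For reference, the standard definition quoted above can be written as a formula. The notation is an assumption for illustration and does not come from the post: y_i are the observed responses, \hat{y}_i the fitted values, and \bar{y} the mean response. The two forms agree for a least squares model that includes a constant term.

```latex
R^2 \;=\; \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} \times 100\%
    \;=\; \left( 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \right) \times 100\%
```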

Did you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible... Continue Reading

Regardless of who you support in the upcoming U.S. election, we
can all agree that it’s been a very bumpy ride! It’s been a
particularly chaotic election cycle. Wouldn’t it be nice if we
could peek into the future and see potential election results right
now? That’s what we'll do in this post!
In 2012, I used binary logistic regression to predict that President Obama would be reelected for
a second... Continue Reading

Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select
variables.
In my previous post, we used data mining to settle on
the following... Continue Reading

Face it, you love regression analysis as much as I do.
Regression is one of the most satisfying analyses in Minitab:
get some predictors that should have a relationship to a response,
go through a model selection process, interpret fit statistics like
adjusted R² and predicted R², and make
predictions. Yes, regression really is quite wonderful.
Except when it’s not. Dark, seedy corners of the data... Continue Reading

Data mining uses algorithms to explore correlations in data sets. An
automated procedure sorts through large numbers of variables and
includes them in the model based on statistical significance alone.
No thought is given to whether the variables and the signs and
magnitudes of their coefficients make theoretical sense.
We tend to think of data mining in the context of big data, with
its huge... Continue Reading
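
The risk of selecting variables on significance alone can be sketched with a small simulation. This is an illustrative assumption, not the procedure from the post or any Minitab routine: it uses a simple one-predictor-at-a-time correlation screen, and the sample size, the number of predictors, the seed, and the names (`pearson_r`, `selected`) are made up. Every predictor here is pure noise, so any that pass the screen are false positives.

```python
import math
import random

rng = random.Random(0)      # fixed seed so the run is repeatable
N_ROWS = 30                 # hypothetical number of observations
N_PREDICTORS = 100          # hypothetical number of candidate predictors

# Two-sided 5% critical value of t with N_ROWS - 2 = 28 degrees of freedom,
# converted to the equivalent cutoff for a correlation coefficient.
T_CRIT = 2.048
R_CRIT = T_CRIT / math.sqrt(T_CRIT ** 2 + (N_ROWS - 2))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# The response and all predictors are independent random noise.
response = [rng.gauss(0, 1) for _ in range(N_ROWS)]
predictors = [[rng.gauss(0, 1) for _ in range(N_ROWS)]
              for _ in range(N_PREDICTORS)]

# "Mine" the data: keep every predictor that looks significant at the 5% level.
selected = [i for i, column in enumerate(predictors)
            if abs(pearson_r(column, response)) > R_CRIT]

print("correlation cutoff:", round(R_CRIT, 3))
print("noise predictors flagged as significant:", len(selected))
```

At a 5% significance level, roughly 5 of 100 unrelated predictors would be expected to clear the cutoff by chance, which is why a variable that survives an automated screen still needs a theoretical justification.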

You’ve performed multiple linear regression and have settled on a model
which contains several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is
most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect... Continue Reading

There may be huge potential benefits waiting in the data in your
servers. These data may be used for many different purposes. Better
data allows better decisions, of course. Banks, insurance firms,
and telecom companies already own a large amount of data about
their customers. These resources are useful for building a more
personal relationship with each customer.
Some organizations already use... Continue Reading

In regression, "sums of squares" are used to represent
variation. In this post, we’ll use some sample data to walk through
these calculations.
The sample data used in this post is available within Minitab by
choosing Help > Sample Data,
or File > Open Worksheet >
Look in Minitab Sample Data folder (depending on
your version of Minitab). The dataset is called
ResearcherSalary.MTW, and contains data... Continue Reading
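
As a rough companion to the sums-of-squares walkthrough, here is a minimal Python sketch of the three quantities for a simple linear regression. It does not use ResearcherSalary.MTW: the five data points are hypothetical, and the variable names (`ss_total`, `ss_regression`, `ss_error`) are labels chosen for illustration.

```python
# Hypothetical data, not the Minitab sample worksheet.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept for y = b0 + b1 * x.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * xi for xi in x]

# Total variation of the response around its mean.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
# Variation captured by the fitted line.
ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
# Variation left over in the residuals.
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print("SS total:     ", round(ss_total, 4))
print("SS regression:", round(ss_regression, 4))
print("SS error:     ", round(ss_error, 4))
print("R-squared:    ", round(ss_regression / ss_total, 4))
```

For a least squares fit with an intercept, the regression and error sums of squares add up to the total, so the share of the total taken by the regression term is the R-squared value.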

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab Statistical... Continue Reading

In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest...the marginal
plot.
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with... Continue Reading

Technology is very much part of
our lives nowadays. We use our smartphones to have video calls with
our friends and family, and watch our favourite TV shows on
tablets. Technology has also transformed the fitness industry with
the increasing popularity of fitness trackers.
Recently, I got myself a fitness watch and it's becoming my
favourite gadget. It can track how many steps I’ve taken, my... Continue Reading

Suppose you’ve collected data on cycle time, revenue, the
dimension of a manufactured part, or some other metric that’s
important to you, and you want to see what other variables may be
related to it. Now what?
When I graduated from college with my first statistics degree,
my diploma was bona fide proof that I'd endured hours and hours of
classroom lectures on various statistical topics, including
l... Continue Reading

For one reason or another, the response variable in a regression
analysis might not satisfy one or more of
the assumptions of ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading

I’ve written about R-squared before and I’ve concluded that it’s
not as intuitive as it seems at first glance. It can be a
misleading statistic because a high R-squared is not always good and a low
R-squared is not always bad. I’ve even said that R-squared is overrated and that the standard error of the estimate (S) can be
more useful.
Even though I haven’t always been enthusiastic about... Continue Reading