Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

If you regularly perform regression analysis, you know that
R² is a statistic used to evaluate the fit of your
model. You may even know the standard definition of R²:
the percentage of variation in the response that is explained
by the model.
Fair enough. With Minitab Statistical Software doing all the heavy
lifting to calculate your R² values, that may be all you
ever need to know.
But if you’re... Continue Reading
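
To make the standard definition above concrete, here is a minimal Python sketch (not Minitab output) that computes R-squared as one minus the ratio of unexplained variation to total variation. The data values and the function name r_squared are made up for illustration, and the straight-line fit via NumPy is an assumption, not the method used in the post.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variation in y that the fitted values y_hat explain."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_error = np.sum((y - y_hat) ** 2)      # variation left unexplained
    ss_total = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_error / ss_total

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line by ordinary least squares.
slope, intercept = np.polyfit(x, y, 1)
r2 = r_squared(y, slope * x + intercept)
print(f"R-squared: {r2:.1%}")
```

A perfect fit gives 1 (100%), and a model that does no better than predicting the mean gives 0.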

Did you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible... Continue Reading

Regardless of who you support in the upcoming U.S. election, we
can all agree that it’s been a very bumpy ride! It’s been a
particularly chaotic election cycle. Wouldn’t it be nice if we
could peek into the future and see potential election results right
now? That’s what we'll do in this post!
In 2012, I used binary logistic regression to predict that President Obama would be reelected for
a second... Continue Reading

Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select
variables.
In my previous post, we used data mining to settle on
the following... Continue Reading

Face it, you love regression analysis as much as I do.
Regression is one of the most satisfying analyses in Minitab:
get some predictors that should have a relationship to a response,
go through a model selection process, interpret fit statistics like
adjusted R² and predicted R², and make
predictions. Yes, regression really is quite wonderful.
Except when it’s not. Dark, seedy corners of the data... Continue Reading

Data mining uses algorithms to explore correlations in data sets. An
automated procedure sorts through large numbers of variables and
includes them in the model based on statistical significance alone.
No thought is given to whether the variables and the signs and
magnitudes of their coefficients make theoretical sense.
We tend to think of data mining in the context of big data, with
its huge... Continue Reading

You’ve performed multiple linear regression and have settled on a model
that contains several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is
most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect... Continue Reading

There may be huge potential benefits waiting in the data in your
servers. These data may be used for many different purposes. Better
data allow better decisions, of course. Banks, insurance firms,
and telecom companies already own a large amount of data about
their customers. These resources are useful for building a more
personal relationship with each customer.
Some organizations already use... Continue Reading

In regression, "sums of squares" are used to represent
variation. In this post, we’ll use some sample data to walk through
these calculations.
The sample data used in this post is available within Minitab by
choosing Help > Sample Data, or File > Open Worksheet >
Look in Minitab Sample Data folder (depending on
your version of Minitab). The dataset is called
ResearcherSalary.MTW, and contains data... Continue Reading
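
As a rough companion to the idea that sums of squares represent variation, the Python sketch below splits the total sum of squares into a regression part and an error part for a simple straight-line fit. The numbers are hypothetical and are not taken from ResearcherSalary.MTW, and the variable names are my own.

```python
import numpy as np

# Hypothetical illustration data (NOT the ResearcherSalary.MTW worksheet).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 4.5, 4.0, 6.5, 7.0, 8.5])

# Ordinary least squares fit of a straight line with an intercept.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept

ss_total = np.sum((y - y.mean()) ** 2)            # total variation in the response
ss_regression = np.sum((fitted - y.mean()) ** 2)  # variation explained by the model
ss_error = np.sum((y - fitted) ** 2)              # variation left in the residuals

print(f"SS Total      = {ss_total:.4f}")
print(f"SS Regression = {ss_regression:.4f}")
print(f"SS Error      = {ss_error:.4f}")
```

For a least squares fit that includes an intercept, SS Regression plus SS Error equals SS Total, which is why the three quantities can share one table.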

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab Statistical... Continue Reading

In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest...the marginal
plot.
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with... Continue Reading

Technology is very much part of
our lives nowadays. We use our smartphones to have video calls with
our friends and family, and watch our favourite TV shows on
tablets. Technology has also transformed the fitness industry with
the increasing popularity of fitness trackers.
Recently, I got myself a fitness watch and it's becoming my
favourite gadget. It can track how many steps I’ve taken, my... Continue Reading

Suppose you’ve collected data on cycle time, revenue, the
dimension of a manufactured part, or some other metric that’s
important to you, and you want to see what other variables may be
related to it. Now what?
When I graduated from college with my first statistics degree,
my diploma was bona fide proof that I'd endured hours and hours of
classroom lectures on various statistical topics, including
l... Continue Reading

For one reason or another, the response variable in a regression
analysis might not satisfy one or more of
the assumptions of ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading

I’ve written about R-squared before and I’ve concluded that it’s
not as intuitive as it seems at first glance. It can be a
misleading statistic because a high R-squared is not always good and a low
R-squared is not always bad. I’ve even said that R-squared is overrated and that the standard error of the estimate (S) can be
more useful.
Even though I haven’t always been enthusiastic about... Continue Reading

When running a binary logistic regression and many other
analyses in Minitab, we estimate parameters for a specified model
based on the sample data that has been collected. Most of the time,
we use what is called Maximum Likelihood Estimation. However, based
on specifics within your data, sometimes these estimation methods
fail. What happens then?
Specifically, during binary logistic regression, an... Continue Reading

What is an interaction? It’s when the effect of one factor
depends on the level of another factor. Interactions are important
when you’re performing ANOVA, DOE, or a regression analysis.
Without them, your model may be missing an important term that
helps explain variability in the response!
For example, let’s consider 3-point shooting in the NBA. We
previously saw that the number of 3-point... Continue Reading
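
The definition of an interaction given above can be sketched with a 2x2 table of cell means: if the effect of factor A is the same at both levels of factor B, the "difference of differences" is zero. The Python below is only an illustration. The function name interaction_contrast and all of the numbers are hypothetical and are not the NBA data.

```python
def interaction_contrast(cell_means):
    """Difference between A's effect at high B and A's effect at low B.

    cell_means maps (level_of_A, level_of_B) to the mean response in that cell.
    """
    effect_of_a_at_low_b = cell_means[("high", "low")] - cell_means[("low", "low")]
    effect_of_a_at_high_b = cell_means[("high", "high")] - cell_means[("low", "high")]
    return effect_of_a_at_high_b - effect_of_a_at_low_b

# Hypothetical cell means: A adds 5 regardless of B, so there is no interaction.
additive = {("low", "low"): 10, ("high", "low"): 15,
            ("low", "high"): 20, ("high", "high"): 25}

# Hypothetical cell means: A adds 5 at low B but 15 at high B.
interacting = {("low", "low"): 10, ("high", "low"): 15,
               ("low", "high"): 20, ("high", "high"): 35}

print(interaction_contrast(additive))
print(interaction_contrast(interacting))
```

A contrast far from zero suggests the model needs an A*B term. Whether it is statistically significant is a separate question that this sketch does not address.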

In statistics, there are things you need to do so you can trust
your results. For example, you should check the sample size, the
assumptions of the analysis, and so on. In regression analysis, I
always urge people to check their residual plots.
In this blog post, I present one more thing you should do so you
can trust your regression results in certain
circumstances—standardize the continuous... Continue Reading

In the world of linear models, a hierarchical model contains all
lower-order terms that comprise the higher-order terms that also
appear in the model. For example, a model that includes the
interaction term A*B*C is hierarchical if it includes these terms:
A, B, C, A*B, A*C, and B*C.
Fitting the correct regression model can be as
much of an art as it is a science. Consequently, there's not always
a... Continue Reading
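
The hierarchy rule described above is mechanical enough to check in a few lines. Here is a minimal Python sketch, assuming terms are written as factor names joined by "*" (as in A*B*C). The function names are hypothetical and this is not how Minitab implements the check.

```python
from itertools import combinations

def required_lower_order_terms(term):
    """All lower-order terms implied by a term such as 'A*B*C'."""
    factors = sorted(term.split("*"))
    return {
        "*".join(subset)
        for size in range(1, len(factors))
        for subset in combinations(factors, size)
    }

def is_hierarchical(terms):
    """True if every term's lower-order terms also appear in the model."""
    # Sort factors inside each term so 'B*A' and 'A*B' count as the same term.
    normalized = {"*".join(sorted(t.split("*"))) for t in terms}
    return all(required_lower_order_terms(t) <= normalized for t in normalized)

full_model = ["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]
missing_terms = ["A", "B", "A*B*C"]

print(is_hierarchical(full_model))
print(is_hierarchical(missing_terms))
```

The first model includes every lower-order term for A*B*C and passes. The second omits C and all of the two-way interactions, so it fails.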

If you perform linear regression analysis, you might need to
compare different regression lines to see if their constants and
slope coefficients are different. Imagine there is an established
relationship between X and Y. Now, suppose you want to determine
whether that relationship has changed. Perhaps there is a new
context, process, or some other qualitative change, and you want to
determine... Continue Reading