Blog posts and articles about the statistical method called Linear Regression and its use in real-world quality projects.

Since the release of Minitab
Express in 2014, we’ve often received questions in technical
support about the differences between Express and Minitab 17.
In this post, I’ll attempt to provide a comparison between these
two Minitab products.
What Is Minitab 17?
Minitab 17 is an all-in-one graphical and statistical analysis
package that includes basic analysis tools such as hypothesis
testing,... Continue Reading

Face it, you love regression analysis as much as I do.
Regression is one of the most satisfying analyses in Minitab:
get some predictors that should have a relationship to a response,
go through a model selection process, interpret fit statistics like
adjusted R² and predicted R², and make
predictions. Yes, regression really is quite wonderful.
Except when it’s not. Dark, seedy corners of the data... Continue Reading
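
As a rough illustration of the fit statistics named in this excerpt, the sketch below computes R², adjusted R², and predicted R² for a least squares fit with a single predictor. The function name, the sample data, and the restriction to one predictor are assumptions made for this example; it is not Minitab's implementation. Predicted R² is derived here from the PRESS statistic, which rescales each residual by the leverage of its observation.

```python
def fit_statistics(x, y):
    """Return R-sq, adjusted R-sq, and predicted R-sq for a one-predictor fit."""
    n = len(x)
    p = 1  # number of predictors (assumption: simple linear regression only)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)      # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

    # Leverage of each observation, then PRESS (leave-one-out prediction error)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverage))

    return {
        "r_sq": 1 - sse / sst,
        "adjusted_r_sq": 1 - (sse / (n - p - 1)) / (sst / (n - 1)),
        "predicted_r_sq": 1 - press / sst,
    }


# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
for name, value in fit_statistics(x, y).items():
    print(f"{name}: {value:.4f}")
```

Because PRESS can never be smaller than the error sum of squares, predicted R² never exceeds R²; a large gap between the two is one common warning sign of an overfit model.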

You’ve
performed multiple linear regression and have settled on a model
which contains several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is
most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect... Continue Reading

Design of Experiments (DOE) is the perfect tool to efficiently
determine if key inputs are related to key outputs. Behind the
scenes, DOE is simply a regression analysis. What’s not simple,
however, is all of the choices you have to make when planning your
experiment. What X’s should you test? What ranges should you select
for your X’s? How many replicates should you use? Do you need
center... Continue Reading

In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest...the marginal
plot.
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with... Continue Reading

Suppose you’ve collected data on cycle time, revenue, the
dimension of a manufactured part, or some other metric that’s
important to you, and you want to see what other variables may be
related to it. Now what?
When I graduated from college with my first statistics degree,
my diploma was bona fide proof that I'd endured hours and hours of
classroom lectures on various statistical topics, including
l... Continue Reading

For
one reason or another, the response variable in a regression
analysis might not satisfy one or more of
the assumptions of ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading

In my last post, I looked at
viewership data for the five seasons of HBO’s hit series Game of
Thrones. I
created a time series plot in Minitab that showed how
viewership rose season by season, and how it varied episode by
episode within each season.
My next step is to fit a statistical model to the data, which
I hope will allow me to predict the viewing numbers for future
episodes.
I am going to... Continue Reading

In this post, I’ll address some common questions we’ve received
in technical support about
the difference between fitted and data means, where to find each
option within Minitab, and how Minitab calculates each.
First,
let’s look at some definitions. It’s useful to have an example, so
I’ll be using the Light Output data set from Minitab’s Data Set
Library, which includes a description of the sample... Continue Reading

In the world of linear models, a hierarchical model contains all
lower-order terms that comprise the higher-order terms that also
appear in the model. For example, a model that includes the
interaction term A*B*C is hierarchical if it includes these terms:
A, B, C, A*B, A*C, and B*C.
Fitting the correct regression model can be as
much of an art as it is a science. Consequently, there's not always
a... Continue Reading
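
The definition of a hierarchical model above can be sketched in a few lines of Python. The helper names and the use of `*` as a term separator are assumptions chosen for this illustration, not part of any Minitab interface.

```python
from itertools import combinations


def lower_order_terms(term, sep="*"):
    """List every lower-order term contained in a higher-order term."""
    factors = term.split(sep)
    implied = []
    for order in range(1, len(factors)):
        for combo in combinations(factors, order):
            implied.append(sep.join(combo))
    return implied


def is_hierarchical(model_terms, sep="*"):
    """Check that each term's lower-order terms also appear in the model."""
    # Compare terms as sets of factors so that A*B and B*A count as the same term
    present = {frozenset(t.split(sep)) for t in model_terms}
    for term in model_terms:
        for lower in lower_order_terms(term, sep):
            if frozenset(lower.split(sep)) not in present:
                return False
    return True


print(lower_order_terms("A*B*C"))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C']
print(is_hierarchical(["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]))
# True
print(is_hierarchical(["A", "A*B"]))
# False, because B is missing
```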

How deeply has statistical content from Minitab blog posts (or
other sources) seeped into your brain tissue? Rather than submit a
biopsy specimen from your temporal lobe for analysis, take this
short quiz to find out. Each question may have more than one
correct answer. Good luck!
Which
of the following are famous figure skating pairs, and which are
methods for testing whether your data follow a... Continue Reading

If you perform linear regression analysis, you might need to
compare different regression lines to see if their constants and
slope coefficients are different. Imagine there is an established
relationship between X and Y. Now, suppose you want to determine
whether that relationship has changed. Perhaps there is a new
context, process, or some other qualitative change, and you want to
determine... Continue Reading

With
Speaker John Boehner resigning, Kevin McCarthy quitting before the
vote for him to be Speaker, and a possible government shutdown in
the works, the Freedom Caucus has certainly been in the news
frequently! Depending on your political bent, the Freedom Caucus
has caused quite a disruption for either good or bad.
Who are these politicians? The Freedom Caucus is a group of
approximately 40... Continue Reading

Step
3 in our DOE problem solving methodology is to determine how many
times to replicate the base experiment plan. The discussion in Part 3
ended with the conclusion that our
4 factors could best be studied using all 16 combinations of the
high and low settings for each factor, a full factorial. Each
golfer will perform half of the sixteen possible combinations and
each golfer’s data could stand as... Continue Reading
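
To make the run count concrete, here is a minimal sketch that enumerates a two-level full factorial for four factors. The factor names A through D and the level labels are placeholders, since the excerpt does not name them, and the split of runs between golfers is not modeled.

```python
from itertools import product


def full_factorial(factors, levels=("low", "high")):
    """Return every combination of the given levels across all factors."""
    return [
        dict(zip(factors, combo))
        for combo in product(levels, repeat=len(factors))
    ]


# Placeholder factor names; the actual factors are not given in the excerpt
runs = full_factorial(["A", "B", "C", "D"])
print(len(runs))  # 16
for run in runs[:3]:
    print(run)
```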

Step
2 in our DOE problem-solving methodology is to design the data
collection plan you will use to study the factors in your
experiment. Of course, you will have to incorporate blocking and
covariates in your experiment design, as well as calculate the
number of replications of run conditions needed in order to be
confident in your results.
We will address these topics in future posts, but for... Continue Reading

If
you use ordinary linear regression with a response of count data,
it may work out fine (Part
1), or you may run into some problems (Part
2).
Given that a count response could be problematic, why not use a
regression procedure developed to handle a response of counts?
A Poisson regression analysis is designed to analyze a
regression model with a count response.
First, let's try using Poisson... Continue Reading
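
For readers curious about what a Poisson regression fit involves, the following is a minimal sketch that assumes a single predictor and a log link, fitted with Newton-Raphson steps on the Poisson log-likelihood. The function name and the data are made up for illustration; this is not how Minitab implements the analysis, and real work should rely on a tested statistical package.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(mean count) = b0 + b1 * x by Newton-Raphson."""
    # Start from an intercept-only model: every fitted mean equals the sample mean
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]

        # Gradient of the log-likelihood (the score equations)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))

        # Entries of the 2x2 information matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))

        # Newton step: solve the 2x2 linear system in closed form
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up count data, for illustration only
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(x, y)
print(f"intercept: {b0:.3f}, slope: {b1:.3f}")
print("fitted means:", [round(math.exp(b0 + b1 * xi), 2) for xi in x])
```

Unlike an ordinary linear fit, the fitted means here are always positive, which is one reason a model of this kind suits count responses.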

My previous post showed an example of using
ordinary linear regression to model a count response. For that particular count data, shown by the blue
circles on the dot plot below, the model assumptions for linear
regression were adequately satisfied.
But frequently, count data may contain many values equal or
close to 0. Also, the distribution of the counts may be
right-skewed. In the quality field,... Continue Reading

Ever use dental floss to cut soft cheese? Or Alka Seltzer to
clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects
online. Some are more peculiar than others.
Ever use ordinary linear regression to evaluate a response
(outcome) variable of counts?
Technically, ordinary linear regression was designed to evaluate
a continuous response variable. A continuous... Continue Reading

In regression
analysis, overfitting a model is a real problem. An overfit model
can cause the regression coefficients, p-values, and R-squared to be misleading. In this post,
I explain what an overfit model is and how to detect and avoid this
problem.
An overfit model is one that is too complicated for your data
set. When this happens, the regression model becomes tailored to
fit the quirks and... Continue Reading

Previously, I’ve written about
how to interpret regression coefficients and their individual P
values.
I’ve also written about
how to interpret R-squared to assess the strength of the
relationship between your model and the response variable.
Recently I've been asked, how does the F-test of the overall
significance and its P value fit in with these other statistics?
That’s the topic of this post!
In... Continue Reading
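
One way to see how the overall F-test relates to R-squared is through the standard identity that expresses the F statistic in terms of R-squared, the number of observations, and the number of predictors. The sketch below assumes a least squares model with an intercept; the function and argument names are illustrative. It returns only the statistic, because a P value requires the F distribution, which the Python standard library does not provide.

```python
def overall_f_statistic(r_sq, n_obs, n_predictors):
    """F statistic for overall significance, computed from R-squared.

    Assumes a least squares model that includes an intercept term.
    """
    df_regression = n_predictors
    df_error = n_obs - n_predictors - 1
    if df_error <= 0:
        raise ValueError("Need more observations than model terms.")
    if not 0 <= r_sq < 1:
        raise ValueError("R-squared must be at least 0 and less than 1.")
    return (r_sq / df_regression) / ((1 - r_sq) / df_error)


# Illustrative numbers only: R-squared of 0.5, 12 observations, 1 predictor
print(overall_f_statistic(0.5, n_obs=12, n_predictors=1))
```

With R-squared held fixed, the statistic grows as observations are added and shrinks as predictors are added, which is why a modest R-squared can still be statistically significant in a large sample.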