Blog posts and articles about regression analysis methods applied to Lean and Six Sigma projects.

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates that parameter for an entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate.... Continue Reading
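
To make the idea concrete, here is a minimal sketch (not taken from the post itself) that draws repeated random samples from one simulated population and computes R-squared for each. The population model, sample size, and function names are all assumptions chosen for illustration. Because every sample gives a different value, R-squared behaves like any other sample estimate.

```python
import random
import statistics


def r_squared(x, y):
    """R-squared of a simple least-squares line fitted to (x, y)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # With a single predictor, R-squared equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)


def simulate_r_squared(n_samples=200, sample_size=15, seed=1):
    """Draw many samples from one hypothetical population; return each R-squared."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        x = [rng.uniform(0, 10) for _ in range(sample_size)]
        # Hypothetical population: y = 2 + 0.5x plus normal noise (assumed values).
        y = [2.0 + 0.5 * xi + rng.gauss(0, 2.0) for xi in x]
        results.append(r_squared(x, y))
    return results


values = simulate_r_squared()
print("smallest sample R-squared:", min(values))
print("average sample R-squared: ", statistics.fmean(values))
print("largest sample R-squared: ", max(values))
```

The spread between the smallest and largest values shows how much a single sample's R-squared can differ from the population value when the sample is small.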

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab 17. With the... Continue Reading

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are... Continue Reading

In my
previous post, I described how I was asked to weigh in on the
ethics of researchers (DeStefano et al. 2004) who reportedly
discarded data and potentially set scientific knowledge back a
decade. I assessed the study in question and found that no data was
discarded and that the researchers used good statistical
practices.
In
this post, I assess a study by Brian S. Hooker that was... Continue Reading

The other day I received a request from a friend to look into a
new study in a peer-reviewed journal that found a link between MMR
vaccinations and an increased risk of autism in African American
boys. To draw this conclusion, the new study reanalyzed data that
was discarded a decade ago by a previous study.
My
friend wanted to know, from a statistical perspective, was it
unethical for the... Continue Reading

Previously, I showed why there is no R-squared for nonlinear regression. Anyone
who uses nonlinear regression will also notice that there are no P
values for the predictor variables. What’s going on?
Just like there are good reasons not to calculate R-squared for
nonlinear regression, there are also good reasons not to calculate
P values for the coefficients.
Why not—and what to use instead—are the... Continue Reading

I caught the end of Toy Story over the weekend, which is
definitely one of my all-time favorite children’s movies.
Now—unfortunately or fortunately—I can’t get Randy Newman's theme
song, “You’ve Got a Friend in Me,” out of my head!
It's also got me thinking about the nature of friendship, and
how "best friends forever" are supposed to always be there when you
need them. And, not to get too maudlin... Continue Reading

The
current Ebola outbreak in Guinea, Liberia, and Sierra Leone is
making headlines around the world, and rightfully so: it's a
frightening disease, and last week the World Health Organization
reported its spread is outpacing their response. Nearly 900
of the more than 1,600 people infected during this outbreak
have died, including some leading medical professionals trying to
stanch the... Continue Reading

There’s
a lot going on in the world, so you might not have noticed that the
Organisation for Economic Co-operation and Development (OECD) released their new set
of health statistics for member nations. On the OECD website, you
can now download the free data series for 2014. (Be aware that “for
2014” means that the organization has a pretty good idea about what
happened in 2012.)
Of course, there’s nothing more fun... Continue Reading

We received the following question via social media
recently:
I am using Minitab 17 for ANOVA.
I calculated the mean and standard deviation for these 15 values,
but the standard deviation is very high. If I delete some values, I
can reduce the standard deviation. Is there an option in Minitab
that will automatically indicate values that are out of range and
delete them so that the standard... Continue Reading

Previously,
I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the... Continue Reading
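
As a small illustration of the point that a linear model can follow a curve, the sketch below fits a quadratic with ordinary least squares. The data, the function name `fit_quadratic`, and the choice to solve the normal equations by hand are assumptions made for this example; it is not Minitab's procedure.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Design-matrix columns: 1, x, x^2. The fitted relationship curves in x,
    # but the fit itself is ordinary linear least squares.
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (a[i][3] - known) / a[i][i]
    return coef


# Hypothetical curved data generated from y = 1 + 2x - 0.5x^2 (no noise).
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]
b0, b1, b2 = fit_quadratic(x, y)
print(b0, b1, b2)
```

Since the example data contain no noise, the fit should recover the generating coefficients up to floating-point rounding.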

The
2014 World Cup has gotten off to a high-scoring start. Through the
first week of the tournament, an average of 2.9 goals have been
scored per game, the highest since 1970. And if that average climbs
to over 3 goals per game, this’ll be the highest-scoring World Cup
since 1958!
So is this year’s World Cup actually bucking a trend of the low-scoring
scoring tournaments that came before it, or can we... Continue Reading

There is more than just the p value in a probability plot—the
overall graphical pattern also provides a great deal of useful
information. Probability plots are a powerful tool to better
understand your data.
In this post, I intend to present the main principles of
probability plots and focus on their visual interpretation using
some real data.
In probability plots, the data density distribution... Continue Reading

In regression analysis, you'd like your regression model to have
significant variables and to produce a high R-squared value. This
low P value / high R² combination indicates that changes
in the predictors are related to changes in the response variable
and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if
your regression model... Continue Reading

In Minitab, the Assistant menu is your interactive guide to choosing
the right tool, analyzing data correctly, and interpreting the
results. If you’re feeling a bit rusty with choosing and using a
particular analysis, the Assistant is your friend!
Previously, I’ve written about the new linear model features in Minitab 17. In
this post, I’ll work through a multiple regression analysis example
and... Continue Reading

Last time I posted, I showed you
how to divide a data set into training and validation samples in
Minitab with the promise that next time I would show you a way
to use the validation sample. Regression is a good analysis for
this, because a validation data set can help you to verify that
you’ve selected the best model. I’m going to use a hypothetical
example so that you can see how it works when... Continue Reading
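
For readers who want to see the general idea outside Minitab, here is a rough sketch of using a validation sample with regression. Everything in it is an assumption for illustration: the simulated data, the 40/20 split, and the helper names `fit_line` and `rmse`. The model is fitted on the training rows only, and its prediction error is then checked on the validation rows it has never seen.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(x, y, intercept, slope):
    """Root mean squared prediction error of the line on (x, y)."""
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical data set: 60 rows from y = 3 + 1.5x plus normal noise.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [3.0 + 1.5 * xi + rng.gauss(0, 1.0) for xi in x]

# Randomly assign 40 rows to training and the remaining 20 to validation.
indices = list(range(60))
rng.shuffle(indices)
train_idx, valid_idx = indices[:40], indices[40:]
train_x = [x[i] for i in train_idx]
train_y = [y[i] for i in train_idx]
valid_x = [x[i] for i in valid_idx]
valid_y = [y[i] for i in valid_idx]

# Fit on the training sample only, then score both samples.
intercept, slope = fit_line(train_x, train_y)
train_rmse = rmse(train_x, train_y, intercept, slope)
valid_rmse = rmse(valid_x, valid_y, intercept, slope)
print("training RMSE:  ", train_rmse)
print("validation RMSE:", valid_rmse)
```

A validation error far above the training error would suggest the model fits the training sample better than it generalizes; comparing candidate models on the validation RMSE is one simple way to choose among them.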

The P
value is used all over statistics, from t-tests to regression analysis. Everyone knows that you
use P values to determine statistical significance in a hypothesis
test. In fact, P values often determine what studies get published
and what projects get funding.
Despite being so important, the P value is a slippery concept
that people often interpret incorrectly. How do you
interpret P values?
In... Continue Reading

In April 2012, I wrote a short paper on
binary logistic regression to analyze wine tasting data. At
that time, François Hollande was about to get elected as French
president and in the U.S., Mitt Romney was winning the Republican
primaries. That seems like a long time ago…
Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012,
would I have... Continue Reading

When you're evaluating a dataset, graphical analysis can be very
important. While an analysis like a regression or ANOVA can be
backed up by numbers, being able to visualize how your dataset is
behaving can be even more convincing than a group of
p-values—especially to those who aren’t trained in statistics.
For example, let’s look at a few variables we think may be
correlated. In this specific... Continue Reading