Choosing the correct linear regression model can be difficult. After all,
the world and how it works is complex. Trying to model it with only
a sample doesn’t make it any easier. In this post, I'll review some
common statistical methods for selecting models, describe complications
you may face, and provide some practical advice for choosing the best
regression model.
It starts when a researcher wants to... Continue Reading

Last fall I had a birthday. It wasn’t one of those tougher
birthdays where the number ends in a zero. Still, the birthday got
me thinking. In response, I told myself, age is just a number. Then
I did a mental double-take. Can a statistician say that? After all,
numbers are how I understand the world and the way it works.
Can age just be a number? After some musing, I concluded that
age is just a... Continue Reading

Stepwise regression and best subsets regression are both
automatic tools that help you identify useful predictors during the
exploratory stages of model building for linear regression. These
two procedures use different methods and present you with different
output.
An obvious question arises. Does one procedure pick the true
model more often than the other? I’ll tackle that question in this
post.
Fi... Continue Reading

Analysis of variance (ANOVA) is great when you want to compare the
differences between group means. For example, you can use ANOVA to
assess how three different alloys are related to the mean strength
of a product. However, most ANOVA tests assess one response
variable at a time, which can be a big problem in certain
situations. Fortunately, Minitab statistical software offers a... Continue Reading
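To make the one-response-at-a-time idea concrete, here is a minimal sketch of the classic one-way ANOVA F statistic computed by hand in Python. The alloy strength values are invented purely for illustration, and the function name `one_way_anova_f` is my own; this is not how Minitab computes or reports the test.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for classic one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical strength measurements for three alloys (made-up numbers)
alloy_a = [82.1, 79.5, 85.3, 80.8]
alloy_b = [88.4, 90.2, 86.7, 89.9]
alloy_c = [81.0, 83.6, 79.9, 82.4]

f_stat, df1, df2 = one_way_anova_f([alloy_a, alloy_b, alloy_c])
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large F suggests that at least one group mean differs from the others. Note that the function takes a single response variable, which is exactly the limitation described above.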

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates that parameter for an entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate.... Continue Reading
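A quick simulation can show why R-squared behaves like any other sample estimate. The sketch below is my own illustration in Python, not taken from the post: it assumes a simple population where y = x + noise with equal variances, so the population R-squared is 0.5, and then draws many small samples to see how much the sample R-squared moves around. The function names and the sample size of 20 are arbitrary choices.

```python
import random
from statistics import mean

def r_squared(x, y):
    """R-squared for a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def simulate_r_squared(n, n_samples, seed=1):
    """Draw repeated samples of size n and return each sample's R-squared."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Signal variance 1 plus noise variance 1: population R-squared is 0.5
        y = [xi + rng.gauss(0, 1) for xi in x]
        results.append(r_squared(x, y))
    return results

estimates = simulate_r_squared(n=20, n_samples=1000)
print(f"lowest sample R-squared:  {min(estimates):.2f}")
print(f"average sample R-squared: {mean(estimates):.2f}")
print(f"highest sample R-squared: {max(estimates):.2f}")
```

Every sample comes from the same population, yet the sample R-squared values spread out around the population value. Larger samples should narrow that spread.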

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are... Continue Reading

Astronomy is cool! And, it’s gotten even more exciting with the
search for exoplanets. You’ve probably heard about newly discovered
exoplanets that are extremely different from Earth. These include
hot Jupiters, super-cold iceballs, super-heated hellholes,
very-low-density puffballs, and ultra-speedy planets that orbit
their star in just hours. And then there is PSR J1719-1438 which
has the mass... Continue Reading

In my previous post, I described how I was asked to weigh in on the
ethics of researchers (DeStefano et al. 2004) who reportedly
discarded data and potentially set scientific knowledge back a
decade. I assessed the study in question and found that no data was
discarded and that the researchers used good statistical
practices.
In this post, I assess a study by Brian S. Hooker that was... Continue Reading

The other day I received a request from a friend to look into a
new study in a peer-reviewed journal that found a link between MMR
vaccinations and an increased risk of autism in African American
boys. To draw this conclusion, the new study reanalyzed data that
was discarded a decade ago by a previous study.
My friend wanted to know, from a statistical perspective, was it
unethical for the... Continue Reading

Previously, I showed why there is no R-squared for nonlinear regression. Anyone
who uses nonlinear regression will also notice that there are no P
values for the predictor variables. What’s going on?
Just like there are good reasons not to calculate R-squared for
nonlinear regression, there are also good reasons not to calculate
P values for the coefficients.
Why not—and what to use instead—are the... Continue Reading

Previously, I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the... Continue Reading

In regression analysis, you'd like your regression model to have
significant variables and to produce a high R-squared value. This
low P value / high R² combination indicates that changes
in the predictors are related to changes in the response variable
and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if
your regression model... Continue Reading

In Minitab, the Assistant menu is your interactive guide to choosing
the right tool, analyzing data correctly, and interpreting the
results. If you’re feeling a bit rusty with choosing and using a
particular analysis, the Assistant is your friend!
Previously, I’ve written about the new linear model features in Minitab 17. In
this post, I’ll work through a multiple regression analysis example
and... Continue Reading

There is high pressure to find low P values. Obtaining a low P
value for a hypothesis test is make or break because it can lead to
funding, articles, and prestige. Statistical significance is
everything!
My two previous posts looked at several issues related to P
values:
P values have a higher than expected false positive
rate.
The same P value from different studies can
correspond to different false... Continue Reading

The interpretation of P values would seem to be fairly standard between
different studies. Even if two hypothesis tests study different
subject matter, we tend to assume that you can interpret a P value
of 0.03 the same way for both tests. A P value is a P value,
right?
Not so fast! While Minitab statistical software can correctly calculate all P
values, it can’t factor in the larger context of the... Continue Reading

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you
use P values to determine statistical significance in a hypothesis test. In fact, P values often
determine what studies get published and what projects get
funding.
Despite being so important, the P value is a slippery concept
that people often interpret incorrectly. How do you
interpret P values?
In... Continue Reading

One-way ANOVA can detect differences between the means of three or more
groups. It’s such a classic statistical analysis that it’s hard to
imagine it changing much.
However, a revolution has been under way for a while now.
Fisher's classic one-way ANOVA, which is taught in Stats 101
courses everywhere, may well be obsolete thanks to Welch’s
ANOVA.
In this post, I not only want to introduce you to... Continue Reading
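For readers who want to see what Welch's version changes, here is a rough Python sketch of the Welch F statistic as it is usually given in textbooks. It weights each group by its sample size divided by its sample variance, so it does not assume equal variances. The function name `welch_anova` and the data are my own inventions for illustration, and the sketch stops at the statistic and degrees of freedom without computing a P value.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n - 1)

    # Each group is weighted by n / s^2, so noisy groups count for less
    weights = [n / v for n, v in zip(sizes, variances)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)  # usually not a whole number
    return f_stat, df1, df2

# Made-up groups with clearly unequal spread
group_1 = [10.1, 9.8, 10.3, 10.0, 9.9]
group_2 = [12.5, 8.0, 15.2, 9.1, 13.7]
group_3 = [11.0, 11.4, 10.7, 11.2, 10.9]

f_stat, df1, df2 = welch_anova([group_1, group_2, group_3])
print(f"Welch F = {f_stat:.2f} on ({df1}, {df2:.1f}) degrees of freedom")
```

The denominator degrees of freedom shrink when the group variances differ a lot, which is how the test accounts for unequal variances.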

Nonlinear regression is a very powerful
analysis that can fit virtually any curve. However, it's not
possible to calculate a valid R-squared for nonlinear regression.
This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for
nonlinear regression, some other packages do.
So, what’s going on?
Minitab doesn't calculate R-squared for nonlinear models... Continue Reading

We released Minitab 17 Statistical Software a couple of days ago.
Certainly every new release of Minitab is a reason to celebrate.
However, I am particularly excited about Minitab 17 from a data
analyst’s perspective.
If you read my blogs regularly, you’ll know that I’ve
extensively used and written about linear models. Minitab 17 has a
ton of new features that expand and enhance many types of... Continue Reading

Atlanta was a mess on January 28th, 2014. Thousands were
trapped on the roads overnight while others managed to get to
roadside stores to camp out. Thousands of students were forced to
spend the night in their schools and the National Guard was called
in to get them home. Many wondered how less than three inches of
snow could cripple the city, particularly when Atlanta had
experienced a similar... Continue Reading