An exciting new study sheds light on the relationship between P
values and the replication of experimental results. This study
highlights issues that I've emphasized repeatedly—it is crucial to
interpret P values correctly, and significant
results must be replicated to be trustworthy.
The study also supports my disagreement with the decision
by the Journal of Basic and Applied Social Psychology to
b... Continue Reading

Repeated measures designs don’t fit our impression of a typical
experiment in several key ways. When we think of an experiment, we
often think of a design that has a clear distinction between the
treatment and control groups. Each subject is in one, and only one,
of these non-overlapping groups. Subjects who are in a treatment
group are exposed to only one type of treatment. This is the... Continue Reading

In regression
analysis, overfitting a model is a real problem. An overfit model
can cause the regression coefficients, p-values, and R-squared to be misleading. In this post,
I explain what an overfit model is and how to detect and avoid this
problem.
An overfit model is one that is too complicated for your data
set. When this happens, the regression model becomes tailored to
fit the quirks and... Continue Reading
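To make the idea of an overly complicated model concrete, here is a minimal sketch in plain Python. It is not taken from the post: the population relationship, noise level, and sample size are all made-up assumptions. It compares a straight-line fit with a polynomial that passes through every training point, and averages each model's squared error on the training data and on fresh data.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x

def fit_interpolant(xs, ys):
    """Degree n-1 polynomial through every point (Lagrange form).

    This is a deliberately overfit model: it reproduces the sample exactly.
    """
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    weight *= (x - xm) / (xj - xm)
            total += yj * weight
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared prediction error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def compare(trials=200, seed=1):
    """Average training and new-data error for both models over many samples."""
    rng = random.Random(seed)
    true_mean = lambda x: 1.0 + 2.0 * x      # hypothetical population relationship
    noise_sd = 3.0                           # hypothetical noise level
    train_x = [float(i) for i in range(10)]  # 10 observations per sample
    new_x = [i + 0.5 for i in range(9)]      # fresh points between the originals
    totals = {"line_train": 0.0, "line_new": 0.0, "poly_train": 0.0, "poly_new": 0.0}
    for _ in range(trials):
        train_y = [true_mean(x) + rng.gauss(0, noise_sd) for x in train_x]
        new_y = [true_mean(x) + rng.gauss(0, noise_sd) for x in new_x]
        line = fit_line(train_x, train_y)
        poly = fit_interpolant(train_x, train_y)
        totals["line_train"] += mse(line, train_x, train_y)
        totals["line_new"] += mse(line, new_x, new_y)
        totals["poly_train"] += mse(poly, train_x, train_y)
        totals["poly_new"] += mse(poly, new_x, new_y)
    return {name: value / trials for name, value in totals.items()}

if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.2f}")
```

The polynomial has no error on the data used to fit it, yet it predicts new observations worse than the simple line, because it has fitted the noise in each sample rather than the underlying relationship.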

Previously, I’ve written about
how to interpret regression coefficients and their individual P
values.
I’ve also written about
how to interpret R-squared to assess the strength of the
relationship between your model and the response variable.
Recently I've been asked, how does the F-test of the overall
significance and its P value fit in with these other statistics?
That’s the topic of this post!
In... Continue Reading
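One way to see how these statistics connect: for a linear model with an intercept, the overall F statistic can be written in terms of R-squared. The sketch below is an illustration rather than anything from the post, and the function name and example numbers are hypothetical.

```python
def overall_f(r_squared, n_obs, n_predictors):
    """Overall F statistic for a linear model with an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of
    predictors and n is the number of observations.
    Returns the statistic with its numerator and denominator degrees of freedom.
    """
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    if df_model <= 0 or df_error <= 0:
        raise ValueError("need at least one predictor and n_obs > n_predictors + 1")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_value, df_model, df_error

if __name__ == "__main__":
    # Hypothetical model: 2 predictors, 23 observations, R-squared of 0.50.
    f_value, df1, df2 = overall_f(0.50, n_obs=23, n_predictors=2)
    print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The P value is the upper-tail probability of this statistic under an F distribution with those degrees of freedom. The Python standard library has no F distribution, so that step needs statistical software; if SciPy is installed, `scipy.stats.f.sf(f_value, df1, df2)` returns it.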

Scientists who use the Hubble Space Telescope to explore the
galaxy receive a stream of digitized images in the form of binary
code. In this state, the information is essentially worthless:
these 1s and 0s must first be converted into pictures before the
scientists can learn anything from them.
The same is true of statistical distributions and parameters that are used to describe sample data. They... Continue Reading

In
my previous post, I wrote about the hypothesis testing ban in
the Journal of Basic and Applied Social Psychology. I
showed how P values and confidence intervals provide important
information that descriptive statistics alone don’t provide. In
this post, I'll cover the editors’ concerns about hypothesis
testing and how to avoid the problems they describe.
The editors describe hypothesis testing... Continue Reading

Banned! In February 2015, editor David Trafimow and associate
editor Michael Marks of the Journal of Basic and Applied Social
Psychology declared that the null hypothesis statistical
testing procedure is invalid. They promptly banned P values,
confidence intervals, and hypothesis testing from the journal.
The journal now requires descriptive statistics and effect
sizes. They also encourage large... Continue Reading

The
2016 presidential race is becoming more real. We’ve had several
announcements with Ted Cruz, Rand Paul, Hillary Clinton, and Marco
Rubio officially entering the race to be President. While the
prospective Democratic candidates are down to one, or at most a
few, the Republican field is extra-large this election cycle. The
first order of business for a GOP candidate is to survive the
nomination... Continue Reading

In this series of posts, I show how hypothesis tests and
confidence intervals work by focusing on concepts and graphs rather
than equations and numbers.
Previously, I used graphs to show what statistical significance really
means. In this post, I’ll explain both confidence intervals and
confidence levels, and how they’re closely related to P values and
significance levels.
How to Correctly... Continue Reading
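The link between confidence levels and significance levels can be checked numerically. The sketch below is a simplification, not the method used in the series: it uses a two-sided z-test with an assumed known standard deviation instead of a t-test, so that it runs on the Python standard library alone. The data values and the standard deviation are made up.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_and_ci(sample, null_mean, sigma, alpha=0.05):
    """Two-sided z-test P value and the matching (1 - alpha) confidence interval.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    xbar = mean(sample)
    se = sigma / sqrt(n)                       # standard error of the mean
    z = (xbar - null_mean) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return p_value, ci

# Hypothetical sample of eight measurements; sigma = 20 is an assumption.
energy_costs = [330.6, 290.1, 310.4, 325.8, 298.7, 341.2, 305.5, 318.9]

if __name__ == "__main__":
    alpha = 0.05
    for null_mean in (260.0, 300.0, 310.0, 330.0, 350.0):
        p, (lo, hi) = z_test_and_ci(energy_costs, null_mean, sigma=20.0, alpha=alpha)
        significant = p < alpha
        outside_ci = not (lo <= null_mean <= hi)
        print(f"null={null_mean}: p={p:.4f}, significant={significant}, "
              f"outside {1 - alpha:.0%} CI={outside_ci}")
```

For every null value tried, the test is significant at the 0.05 level exactly when that value falls outside the 95% confidence interval. The interval does not depend on the null value at all, which is why a single interval summarizes the outcome of many possible tests.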

This is a companion post for a series of blog posts about
understanding hypothesis tests. In this series, I create a
graphical equivalent to a 1-sample t-test and confidence interval
to help you understand how it works more intuitively.
This post focuses entirely on the steps required to create the
graphs. It’s a fairly technical and task-oriented post designed for
those who need to create the... Continue Reading

What do significance levels and P values mean in hypothesis
tests? What is statistical significance anyway? In this
post, I’ll continue to focus on concepts and graphs to help you
gain a more intuitive understanding of how hypothesis tests work in
statistics.
To bring it to life, I’ll add the significance level and P value
to the graph in my previous post in order to perform a graphical
version of... Continue Reading

Hypothesis testing is an essential procedure in statistics. A
hypothesis test evaluates two mutually exclusive statements about a
population to determine which statement is best supported by the
sample data. When we say that a finding is statistically
significant, it’s thanks to a hypothesis test. How do these tests
really work and what does statistical significance actually
mean?
In this series of... Continue Reading

It’s safe to say that most people who use statistics are more
familiar with parametric analyses than nonparametric analyses.
Nonparametric tests are also called distribution-free tests because
they don’t assume that your data follow a specific
distribution.
You may have heard that you should use nonparametric tests when
your data don’t meet the assumptions of the parametric test,
especially the... Continue Reading

As someone who has
collected and analyzed real data for a living, I find the idea of
using simulated data for a Monte Carlo simulation a bit odd.
How can you improve a real product with simulated data? In this
post, I’ll help you understand the methods behind Monte Carlo
simulation and walk you through a simulation example using
Devize.
What is Devize, you ask? Devize is Minitab's
exciting new,... Continue Reading
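For readers who want to see the general method in code, here is a minimal Monte Carlo sketch in plain Python. It does not use Devize or reproduce the example from the post. The process is hypothetical: a shaft must fit inside a housing, both diameters are assumed to vary normally, and the means and standard deviations are invented for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

# Hypothetical process inputs (all values are assumptions, in millimetres).
HOUSING_MEAN, HOUSING_SD = 10.05, 0.02
SHAFT_MEAN, SHAFT_SD = 10.00, 0.02

def simulate_interference(n_runs=100_000, seed=42):
    """Estimate the fraction of assemblies where the shaft is too big for the housing."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_runs):
        housing = rng.gauss(HOUSING_MEAN, HOUSING_SD)  # draw one simulated housing
        shaft = rng.gauss(SHAFT_MEAN, SHAFT_SD)        # draw one simulated shaft
        if housing - shaft < 0:                        # negative gap: parts do not fit
            failures += 1
    return failures / n_runs

def exact_interference():
    """Closed-form answer, available here only because the model is so simple."""
    gap = NormalDist(HOUSING_MEAN - SHAFT_MEAN, sqrt(HOUSING_SD**2 + SHAFT_SD**2))
    return gap.cdf(0.0)

if __name__ == "__main__":
    print(f"simulated: {simulate_interference():.4f}")
    print(f"exact:     {exact_interference():.4f}")
```

The simulated data stand in for parts that have not been built yet. Once the inputs are described by distributions estimated from real measurements, you can change a mean or tighten a standard deviation and rerun the simulation to see how the defect rate would respond. In this toy case a formula gives the same answer, but simulation keeps working when the transfer function or the input distributions are too complex for one.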

Choosing
the correct linear regression model can be difficult. After all,
the world and how it works is complex. Trying to model it with only
a sample doesn’t make it any easier. In this post, I'll review some
common statistical methods for selecting models, describe complications
you may face, and provide some practical advice for choosing the best
regression model.
It starts when a researcher wants to... Continue Reading

Last fall I had a birthday. It wasn’t one of those tougher
birthdays where the number ends in a zero. Still, the birthday got
me thinking. In response, I told myself, age is just a number. Then
I did a mental double-take. Can a statistician say that? After all,
numbers are how I understand the world and the way it works.
Can age just be a number? After some musing, I concluded that
age is just a... Continue Reading

Stepwise regression and best subsets regression are both
automatic tools that help you identify useful predictors during the
exploratory stages of model building for linear regression. These
two procedures use different methods and present you with different
output.
An obvious question arises. Does one procedure pick the true
model more often than the other? I’ll tackle that question in this
post.
Fi... Continue Reading

Analysis
of variance (ANOVA) is great when you want to compare the
differences between group means. For example, you can use ANOVA to
assess how three different alloys are related to the mean strength
of a product. However, most ANOVA tests assess one response
variable at a time, which can be a big problem in certain
situations. Fortunately, Minitab statistical software offers a... Continue Reading

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates that parameter for an entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate.... Continue Reading
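A small simulation shows what it means for R-squared to be an estimate. This sketch is an illustration under stated assumptions, not code from the post: the predictor and response are drawn independently, so the true population R-squared is zero, and each sample R-squared comes from a simple one-predictor regression.

```python
import random

def sample_r_squared(n, rng):
    """R-squared from a one-predictor regression on n points of pure noise."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]  # independent of xs: true R-squared is 0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)             # squared sample correlation

def average_r_squared(n, trials=2000, seed=7):
    """Mean sample R-squared across many simulated samples of size n."""
    rng = random.Random(seed)
    return sum(sample_r_squared(n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        print(f"n={n}: average sample R-squared = {average_r_squared(n):.3f}")
```

Even though the true value is zero, the sample R-squared is positive on average, and the gap is largest for small samples. A single sample R-squared therefore varies around the population value and tends to overstate it when the sample is small.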

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are... Continue Reading