Overfitting a model is a real problem you need to beware of when
performing regression analysis. An overfit model results in
misleading regression coefficients, p-values,
and R-squared statistics. Nobody wants that,
so let's examine what overfit models are, and how to avoid falling
into the overfitting trap.
Put simply, an overfit model is too complex for the data you're
analyzing. Rather than...
Maybe you're just getting started with analyzing data. Maybe
you're reasonably knowledgeable about statistics, but it's been a
long time since you did a particular analysis and you feel a little
bit rusty. In either case, the Assistant menu in Minitab Statistical Software
gives you an interactive guide from start to finish. It will help
you choose the right tool quickly, analyze your data...
In April 2017, overbooking of flight seats hit the headlines
when a United Airlines customer was dragged off a flight. A TED
talk by Nina Klietsch gives a good but simplistic explanation of
why overbooking is so attractive to airlines.
Overbooking is not new to the airlines; these strategies were
officially sanctioned by the American Civil Aeronautics Board in
1965, and since that time complex...
One of the biggest pieces of international news last year was
the so-called "Brexit" referendum, in which a majority of voters in
the United Kingdom cast their ballots to leave the European Union.
The outcome shocked the world. Follow-up media coverage has asserted
that the younger generation prefers to remain in the EU since that
means more opportunities on the continent. The older...
If you regularly perform regression analysis, you know that
R² is a statistic used to evaluate the fit of your
model. You may even know the standard definition of R²:
the percentage of variation in the response that is explained
by the model.
Fair enough. With Minitab Statistical Software doing all the heavy
lifting to calculate your R² values, that may be all you
ever need to know.
But if you’re...
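To make the standard definition concrete, here is a minimal sketch in plain Python (not Minitab) that fits a least-squares line to a small made-up dataset and computes R² as 1 - SSE/SST: one minus the share of total variation left unexplained by the model. The data values and the function names `fit_line` and `r_squared` are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x with a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y):
    """R² = 1 - SSE/SST for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation left unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in the response
    return 1 - sse / sst


# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"R² = {r_squared(x, y):.4f}")
```

For a least-squares fit that includes an intercept, 1 - SSE/SST equals SSR/SST. That is why R² can be read as the fraction of variation "explained by the model."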
Did you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible...
Regardless of who you support in the upcoming U.S. election, we
can all agree that it’s been a very bumpy ride! It’s been a
particularly chaotic election cycle. Wouldn’t it be nice if we
could peek into the future and see potential election results right
now? That’s what we'll do in this post!
In 2012, I used binary logistic regression to predict that President Obama would be reelected for
a second...
Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select variables.
In my previous post, we used data mining to settle on
the following...
Face it, you love regression analysis as much as I do.
Regression is one of the most satisfying analyses in Minitab:
get some predictors that should have a relationship to a response,
go through a model selection process, interpret fit statistics like
adjusted R² and predicted R², and make
predictions. Yes, regression really is quite wonderful.
Except when it’s not. Dark, seedy corners of the data...
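Adjusted R² and predicted R² are easiest to appreciate next to the formulas behind them. The following is a minimal NumPy sketch, not Minitab's implementation, that assumes the usual textbook definitions: adjusted R² = 1 - [SSE/(n - p - 1)] / [SST/(n - 1)] for p predictors, and predicted R² = 1 - PRESS/SST. PRESS is computed from the leave-one-out residuals e_i / (1 - h_ii), where h_ii are the leverages. The function name `fit_stats` and the simulated data are hypothetical.

```python
import numpy as np


def fit_stats(X, y):
    """Return (R², adjusted R², predicted R²) for an OLS fit with an intercept.

    X: (n, p) array of predictor values (no intercept column).
    y: (n,) response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()

    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - p - 1)) / (sst / (n - 1))  # penalizes extra terms

    # Leverages h_ii: diagonal of the hat matrix H = A (A'A)^-1 A' = A pinv(A)
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    press = ((resid / (1 - h)) ** 2).sum()        # sum of squared leave-one-out residuals
    pred_r2 = 1 - press / sst
    return r2, adj_r2, pred_r2


# Hypothetical data: one real predictor, plus five pure-noise predictors
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 3 + 2 * x + rng.normal(0, 2, size=30)
noise = rng.normal(size=(30, 5))  # unrelated to y

print(fit_stats(x[:, None], y))
print(fit_stats(np.column_stack([x, noise]), y))
```

Adding predictors never lowers ordinary R² for nested least-squares models. Adjusted and predicted R² do not share that guarantee, which makes them more useful for spotting a model that has grown too complex.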
Data mining uses algorithms to explore correlations in data sets. An
automated procedure sorts through large numbers of variables and
includes them in the model based on statistical significance alone.
No thought is given to whether the variables and the signs and
magnitudes of their coefficients make theoretical sense.
We tend to think of data mining in the context of big data, with
its huge...
You’ve performed multiple linear regression and have settled on a model
that contains several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is
most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect...
There may be huge potential benefits waiting in the data in your
servers. These data may be used for many different purposes. Better
data allows better decisions, of course. Banks, insurance firms,
and telecom companies already own a large amount of data about
their customers. These resources are useful for building a more
personal relationship with each customer.
Some organizations already use...
In regression, "sums of squares" are used to represent
variation. In this post, we’ll use some sample data to walk through...
The sample data used in this post is available within Minitab by
choosing Help > Sample Data,
or File > Open Worksheet >
Look in Minitab Sample Data folder (depending on
your version of Minitab). The dataset is called
ResearcherSalary.MTW, and contains data...
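As a rough illustration of how sums of squares partition variation, here is a minimal NumPy sketch, not Minitab's implementation, that splits the total sum of squares (SST) of a response into a regression part (SSR) and an error part (SSE). It further splits SSR into sequential sums of squares, crediting each predictor with the drop in SSE when it enters the model in the column order given. The helper names `sse_of` and `sequential_ss` are hypothetical, and the data are simulated rather than taken from ResearcherSalary.MTW.

```python
import numpy as np


def sse_of(A, y):
    """Error sum of squares from a least-squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)


def sequential_ss(X, y):
    """Partition SST into SSR (split by predictor, in column order) and SSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.ones((n, 1))              # start with the intercept-only (mean) model
    sst = sse_of(A, y)               # SSE of the mean-only model equals SST
    prev = sst
    seq = []
    for j in range(p):
        A = np.column_stack([A, X[:, j]])  # add the next predictor
        cur = sse_of(A, y)
        seq.append(prev - cur)             # variation newly explained by predictor j
        prev = cur
    return {"SST": sst, "SeqSS": seq, "SSR": sum(seq), "SSE": prev}


# Hypothetical simulated data with two predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 5 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=40)

parts = sequential_ss(X, y)
print(parts)
print(parts["SSR"] + parts["SSE"], parts["SST"])  # the two totals agree
```

Reversing the column order generally changes the individual sequential values when predictors are correlated, but their total (SSR) stays the same.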
You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab Statistical...
In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest: the marginal plot.
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with...
Technology is very much part of
our lives nowadays. We use our smartphones to have video calls with
our friends and family, and watch our favourite TV shows on
tablets. Technology has also transformed the fitness industry with
the increasing popularity of fitness trackers.
Recently, I got myself a fitness watch and it's becoming my
favourite gadget. It can track how many steps I’ve taken, my...
Suppose you’ve collected data on cycle time, revenue, the
dimension of a manufactured part, or some other metric that’s
important to you, and you want to see what other variables may be
related to it. Now what?
When I graduated from college with my first statistics degree,
my diploma was bona fide proof that I'd endured hours and hours of
classroom lectures on various statistical topics, including...
Minitab is the leading provider of software and services for quality
improvement and statistics education. More than 90% of Fortune 100 companies
use Minitab Statistical Software, our flagship product, and more students
worldwide have used Minitab to learn statistics than any other package.
Minitab Inc. is a privately owned company headquartered in State College,
Pennsylvania, with subsidiaries in the United Kingdom, France, and
Australia. Our global network of representatives serves more than 40
countries around the world.