Angst Over ANOVA Assumptions? Ask the Assistant.

Do you suffer from PAAA (Post-Analysis Assumption Angst)? You’re not alone.

Checking the required assumptions for a statistical analysis is critical. But if you don’t have a Ph.D. in statistics, it can feel more complicated and confusing than the primary analysis itself.

How does the cuckoo egg data, a common sample data set often used to teach analysis of variance, satisfy the following formal assumptions for a classical one-way ANOVA (F-test)?

  • Normality
  • Homoscedasticity
  • Independence
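As a rough illustration of screening one of these assumptions, the sketch below compares the largest and smallest group standard deviations. A ratio well above about 2 is a common rule-of-thumb warning sign for unequal variances; it is not a formal test and not necessarily what the Assistant does. The function name `sd_ratio` and the measurements are hypothetical, and the numbers are not the cuckoo egg data.

```python
from statistics import stdev

def sd_ratio(groups):
    """Return the largest group standard deviation divided by the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

# Hypothetical measurements for three groups (NOT the cuckoo egg data).
groups = [
    [22.0, 23.9, 20.9, 23.8, 25.0],
    [21.1, 22.3, 23.0, 21.8, 22.6],
    [19.8, 22.1, 21.5, 20.9, 22.0],
]

ratio = sd_ratio(groups)
print(f"largest SD / smallest SD = {ratio:.2f}")
```

Normality and independence need their own checks; this ratio says nothing about either.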

Are My Data (Kinda Sorta) Normal?

To check the normality of each group of data, a common strategy is to display...

A Fun ANOVA: Does Milk Affect the Fluffiness of Pancakes?

by Iván Alfonso, guest blogger

I'm a huge fan of hot cakes—they are my favorite dessert ever. I’ve been cooking them for over 15 years, and over that time I’ve noticed many variations in texture, flavor, and thickness. Personally, I like fluffy pancakes.

There are many brands of hotcake mix on the market, all with very similar formulations. So I decided to investigate which ingredients and inputs may influence the fluffiness of my pancakes.

Potential factors could include the type of mix used, the type of milk used, the use of margarine or butter (of many brands), the amount of mixing time, the...

Cuckoo for Quality: A Birdseye View of a Classic ANOVA Example

If you teach statistics or quality statistics, you’re probably already familiar with the cuckoo egg data set.

The common cuckoo has decided that raising baby chicks is a stressful, thankless job. It has better things to do than fill the screeching, gaping maws of cuckoo chicks, day in and day out.

So the mother cuckoo lays her eggs in the nests of other bird species. If the cuckoo egg is similar enough to the eggs of the host bird, in size and color pattern, the host bird may be tricked into incubating the egg and raising the hatchling. (The cuckoo can then fly off to the French Riviera, or...

“You’ve got a friend” in Minitab Support

I caught the end of Toy Story over the weekend, which is definitely one of my all-time favorite children’s movies. Now—unfortunately or fortunately—I can’t get Randy Newman's theme song, “You’ve Got a Friend in Me,” out of my head!

It's also got me thinking about the nature of friendship, and how "best friends forever" are supposed to always be there when you need them. And, not to get too maudlin about it, but just like Woody and Buzz eventually realize their friendship, all of us hope the professionals who use our software also realize that “you’ve got a friend” in Minitab.

Now what do I mean...

Two-Way ANOVA in Minitab 17

After upgrading to the latest and greatest version of our statistical software, Minitab 17, some users have contacted tech support to ask "Wait a minute, where is that Two-Way ANOVA option in Minitab 17?" 

The answer is that it’s not there. That’s right! The 2-Way ANOVA option that was available in Minitab 16 and prior versions was removed from Minitab 17. Why would this feature be removed from the new version? Shouldn’t the new version have more features instead of fewer?

Two-Way ANOVA was removed from Minitab 17 because you can get the same output by using the General Linear Model option in...

Can I Just Delete Some Values to Reduce the Standard Variation in My ANOVA?

We received the following question via social media recently:

I am using Minitab 17 for ANOVA. I calculated the mean and standard deviation for these 15 values, but the standard deviation is very high. If I delete some values, I can reduce the standard deviation. Is there an option in Minitab that will automatically indicate values that are out of range and delete them so that the standard deviation is low?

In other words, this person wanted a way to automatically eliminate certain values to lower the standard deviation.

Fortunately, Minitab 17 does not have the functionality that this person was...

Using Probability Plots to Understand Laser Games Scores

There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data.

In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data.

In probability plots, the distribution of the data is transformed into a linear plot. To do this, the cumulative distribution function (the so-called CDF, cumulating all probabilities below a given threshold) is used (see the graph below). For a normal...
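The idea of turning a cumulative probability into a straight-line plot can be sketched in a few lines. This is a minimal illustration, assuming the plotting position (i - 0.5) / n; statistical packages, Minitab included, may use a different convention. The function name `normal_probability_points` and the scores are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each sorted value with the standard normal quantile
    of its estimated cumulative probability."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # assumed plotting position, one of several in use
        points.append((std_normal.inv_cdf(p), x))
    return points

# Hypothetical scores, for illustration only.
scores = [48, 52, 55, 61, 47, 58, 50]
for z, x in normal_probability_points(scores):
    print(f"{z:6.2f}  {x}")
```

If the data are roughly normal, plotting the values against these quantiles gives points that fall close to a straight line.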

Did Welch’s ANOVA Make Fisher's Classic One-Way ANOVA Obsolete?

One-way ANOVA can detect differences between the means of three or more groups. It’s such a classic statistical analysis that it’s hard to imagine it changing much.

However, a revolution has been under way for a while now. Fisher's classic one-way ANOVA, which is taught in Stats 101 courses everywhere, may well be obsolete thanks to Welch’s ANOVA.

In this post, I not only want to introduce you to Welch’s ANOVA, but also highlight some interesting research that we perform here at Minitab that guides the implementation of features in our statistical software.
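For readers who want to see the arithmetic, here is a minimal sketch of the Welch test statistic and its degrees of freedom, written from the standard textbook formulas rather than from Minitab's implementation. The function name `welch_anova` and the data are hypothetical, and the p-value is left out because it needs an F distribution, which the Python standard library does not provide.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Each group is weighted by n / s^2, so equal variances are not assumed.
    Every group needs at least two values and a nonzero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)

    # Variance-weighted grand mean.
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)

    # Correction term shared by the denominator and the error df.
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical groups with visibly different spreads.
f_stat, df1, df2 = welch_anova([
    [10.1, 10.4, 9.8, 10.0],
    [12.5, 9.0, 14.2, 8.1],
    [11.0, 11.3, 10.9, 11.2],
])
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.2f}")
```

The error degrees of freedom are usually fractional, which is one visible difference from the classic F-test.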

One-Way ANOVA Assumptions

Like any...

The Best European Football League: What the CTQ’s and Minitab Can Tell Us

by Laerte de Araujo Lima, guest blogger

In a previous post (How Data Analysis Can Help Us Predict This Year's Champions League), I shared how I used Minitab Statistical Software to predict the 2013-2014 season of the UEFA Champions League. This involved the regression analysis of the main critical-to-quality (CTQ) factors, which I identified using the “voice of the customer” suggestions of some friends.

Since that post was published, my friends have stopped discussing the UEFA Champions League—they were convinced by the results I shared.

But now they’ve challenged me to use Six Sigma tools to...

Introducing the Bubble Plot

When you're evaluating a dataset, graphical analysis can be very important. While an analysis like a regression or ANOVA can be backed up by numbers, being able to visualize how your dataset is behaving can be even more convincing than a group of p-values—especially to those who aren’t trained in statistics.

For example, let’s look at a few variables we think may be correlated. In this specific example, we will take the Unemployment Rate and the Crime Rate for each state in the U.S. We have 3 columns of data in Minitab: C1, which contains the State Name; C2, which contains the Crime Rate; and...

How to Handle Extreme Outliers in Capability Analysis

Transformations and non-normal distributions are typically the first approaches considered when the Normality test fails in a capability analysis. These approaches do not work when there are extreme outliers because they both assume the data come from a single common-cause variation distribution. But because extreme outliers typically represent special-cause variation, transformations and non-normal distributions are not good approaches for data that contain extreme outliers.

As an example, the four graphs below show distribution fits for a dataset with 99 values simulated from a...

(We Just Got Rid of) Three Reasons to Fear Data Analysis

Today our company is introducing Minitab 17 Statistical Software, the newest version of the leading software used for quality improvement and statistics education. So, why should you care? Because important people in your life -- your co-workers, your students, your kids, your boss, maybe even you -- are afraid to analyze data. There's no shame in that. In fact, there are pretty good reasons for people to feel some trepidation (or even outright panic) at the prospect of making sense of a set of data.

I know how it feels to be intimidated by statistics. Not long ago, I would do almost...

Fix Problems in Regression Analysis with Partial Least Squares

Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R² and predicted R², and make predictions. Yes, regression really is quite wonderful.

Except when it’s not. Dark, seedy corners of the data world exist, lying in wait to make regression confusing or impossible. Good old ordinary least squares regression, to be specific.

For instance, sometimes you have a lot of detail in your data, but not...

Understanding ANOVA by Looking at Your Household Budget

by Arun Kumar, guest blogger

One of the most commonly used statistical methods is ANOVA, short for “Analysis of Variance.” Whether you’re analysing data for Six Sigma-style quality improvement projects, or perhaps just taking your first statistics course, a good understanding of how this technique works is important.

A lot of concepts are involved in any analysis using ANOVA and its subsequent interpretation. You’re going to have to grapple with terms such as Sources of Variation, Sum of Squares, Mean Squares, Degrees of Freedom, and F-ratio—and you’ll need to understand what statistical...
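To make those terms concrete, the sketch below computes the entries of a classic one-way ANOVA table by hand. It is an illustration only: the function name `one_way_anova_table` and the monthly spending figures are hypothetical and do not come from the post.

```python
from statistics import mean

def one_way_anova_table(groups):
    """Return the sums of squares, degrees of freedom, mean squares,
    and F-ratio for a classic one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)

    # Source of variation 1: differences between the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Source of variation 2: scatter of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }

# Hypothetical monthly spending in three budget categories.
table = one_way_anova_table([
    [310, 295, 330, 305],
    [120, 140, 110, 135],
    [210, 190, 225, 200],
])
for name, value in table.items():
    print(f"{name:>10}: {value:.2f}")
```

Each mean square is a sum of squares divided by its degrees of freedom, and the F-ratio compares the between-group mean square to the within-group mean square.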

Making a Difference in How People Use Data

A colleague of mine at Minitab, Cheryl Pammer, was recently featured in "A Statistician's Journey," a monthly feature that appears in the print and online versions of the American Statistical Association's AMSTAT News magazine.  

Each month, the magazine asks ASA members to talk about the paths they took to get to where they are today. Cheryl is a "user experience designer" at Minitab. In other words, she's one of the people who help determine how our statistical software does what it does, and who try to make it as helpful, useful, and beneficial as possible. Cheryl is always looking for ways to...

Use Analysis of Means to Classify Baseball Parks

When I first got interested in looking at baseball park factors, I only wanted to know which parks benefited hitters and which benefited pitchers. Once I got started, I got interested in the difference between ESPN's published formula and its results and whether there were obvious reasons for the variation in park factors from year-to-year.

But today I’m returning to the original question: which parks are hitters’ parks, and which are pitchers’ parks?

We already know that the mean and median are inadequate by themselves. For example, consider AT&T Park, where the mean suggests a pitchers’...

Using Multi-Vari Charts to Analyze Families of Variations

When trying to solve complex problems, you should first list all the suspected variables, then identify the few critical factors and separate them from the trivial many, which are not essential to understanding the cause.

Many statistical tools enable you to efficiently identify the effects that are statistically significant in order to converge on the root cause of a problem (for example ANOVA, regression, or even designed experiments (DOEs)). In this post though, I am going to focus on a very simple graphical tool, one that is very intuitive, can be used by virtually anyone, and does not...

Coach Bill Belichick: A Statistical "Hoodie" Analysis, Part 2

by Bob Yoon, guest blogger

Yesterday's post shared how an analysis of Bill Belichick's hoodie-wearing patterns found no statistically significant difference in New England Patriots wins whether he wore sleeved or sleeveless hoodies, or whether the hoodie was from Reebok or Nike.

Since these hypothesis tests failed to reject the null hypothesis, I combined these factors under “grey hoodie” and started a new Minitab worksheet.

But when I took a look at all the different outfits Belichick wore, there were still too many variables for a good analysis. I then decided to split this category into two: Type and...

Using Design of Experiments to Minimize Noise Effects

All processes are affected by various sources of variation over time. Products that are designed based on optimal settings will, in reality, tend to drift away from their ideal settings during the manufacturing process.

Environmental fluctuations and process variability often cause major quality problems. Focusing only on cost and performance is not enough. Sensitivity to deterioration and process imperfections is an important issue. It is often not possible to completely eliminate variations due to uncontrollable factors (such as temperature changes, contamination, humidity, dust, etc.).