Why Kurtosis is Like Liposuction. And Why it Matters.

The word kurtosis sounds like a painful, festering disease of the gums. But the term actually describes the shape of a data distribution.

Frequently, you'll see kurtosis defined as how sharply "peaked" the data are. The three main types of kurtosis are shown below.

Lepto means "thin" or "slender" in Greek. In leptokurtosis, the kurtosis value is high.

Platy means "broad" or "flat"—as in duck-billed platypus. In platykurtosis, the kurtosis value is low.


Meso means "middle" or "between." The normal distribution is mesokurtic.

Mesokurtosis can be defined with a value of 0 (called its "excess"...
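To make the "excess" idea concrete, here is a minimal sketch that computes excess kurtosis as the fourth standardized moment minus 3, so a normal distribution lands at 0. It assumes the plain population-moment formula; statistical packages often apply sample-size adjustments, so their values may differ slightly. The function name and both data sets are made up for illustration.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population-moment version, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical data: evenly spread values versus mostly identical values with two extremes.
flat = [1, 2, 3, 4, 5]
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

print(excess_kurtosis(flat))   # negative: platykurtic
print(excess_kurtosis(heavy))  # positive: leptokurtic
```

A negative result points toward platykurtosis and a positive result toward leptokurtosis.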

Angst Over ANOVA Assumptions? Ask the Assistant.

Do you suffer from PAAA (Post-Analysis Assumption Angst)? You’re not alone.

Checking the required assumptions for a statistical analysis is critical. But if you don’t have a Ph.D. in statistics, it can feel more complicated and confusing than the primary analysis itself.

How does the cuckoo egg data, a common sample data set often used to teach analysis of variance, satisfy the following formal assumptions for a classical one-way ANOVA (F-test)?

  • Normality
  • Homoscedasticity
  • Independence
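For readers who like to see the arithmetic, the sketch below computes the classical one-way ANOVA F statistic along with each group's sample variance. Comparing those variances is only an informal first look at homoscedasticity, not a formal test. The function names and the three small groups are hypothetical; they are not the cuckoo egg data.

```python
from statistics import mean, variance


def one_way_f(groups):
    """Classical one-way ANOVA F statistic: between-group MS over within-group MS."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def group_variances(groups):
    """Sample variance of each group, for an informal look at equal spread."""
    return [variance(g) for g in groups]


# Hypothetical data, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_f(groups))
print(group_variances(groups))
```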

Are My Data (Kinda Sorta) Normal?

To check the normality of each group of data, a common strategy is to display...

Cuckoo for Quality: A Birdseye View of a Classic ANOVA Example

If you teach statistics or quality statistics, you’re probably already familiar with the cuckoo egg data set.

The common cuckoo has decided that raising baby chicks is a stressful, thankless job. It has better things to do than fill the screeching, gaping maws of cuckoo chicks, day in and day out.

So the mother cuckoo lays her eggs in the nests of other bird species. If the cuckoo egg is similar enough to the eggs of the host bird, in size and color pattern, the host bird may be tricked into incubating the egg and raising the hatchling. (The cuckoo can then fly off to the French Riviera, or...

Why Is this Yorkie So Irritated? Oversimplified Statistical Models

You know what really gets on my nerves? A lot of things.

That slow, slinky way that cats walk by. Grrrr.

The rude, abrupt arrival of delivery persons in their obnoxiously loud trucks. (Why do they always pull up just as I’m settling down for a nap?) Grrrr.

Total strangers who reach down and poke me with fat, clumsy fingers that reek of antibacterial soap. Grrrr.

And this one always gets my dander up: Me and the human are out on a walk when some passerby stops and points at me.

“What a cutie. How old is she?”

"What insolence!" I'll yap back. "I’m a he! And how old are YOU!!?"

Then I’m told to shut up.


Exponential: How a Poor Memory Helps to Model Failure Data

These days, my memory isn't what it used to be. Besides that, my memory isn't what it used to be. 

But my incurable case of CRS (Can't Remember Stuff) is not nearly as bad as that of the exponential distribution.

When used to model failure data in reliability analysis, the exponential distribution is completely memoryless. It retains no record of how long an item has already been in service.

That might sound like a bad thing. But this special characteristic makes the distribution extremely useful for modelling the behavior of items that have a constant failure rate.
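Here is a small sketch of what "memoryless" means in practice. For an exponential distribution with a constant failure rate, the chance of surviving another t hours is the same no matter how long the item has already run. The rate and the times below are made-up values, and the function names are my own.

```python
import math


def survival(rate, t):
    """P(T > t) for an exponential distribution with constant failure rate `rate`."""
    return math.exp(-rate * t)


def conditional_survival(rate, age, t):
    """P(T > age + t | T > age): chance of lasting t more hours at a given age."""
    return survival(rate, age + t) / survival(rate, age)


rate = 0.01  # hypothetical failure rate per hour
for age in (0, 50, 500):
    # The printed value does not change with age; that is the memoryless property.
    print(age, conditional_survival(rate, age, 100))
```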

Using the Exponential Distribution to Model...

What Can Classical Chinese Poetry Teach Us About Graphical Analysis?

A famous classical Chinese poem from the Song dynasty describes the views of a mist-covered mountain called Lushan.

The poem was inscribed on the wall of a Buddhist monastery by Su Shi, a renowned poet, artist, and calligrapher of the 11th century.

Deceptively simple, the poem captures the illusory nature of human perception.

   Written on the Wall of West Forest Temple

                                      --Su Shi
  From the side, it's a mountain ridge.
  Looking up, it's a single peak.
  Far or near, high or low, it never looks the same.
  You can't know the true face of Lu Mountain

Equivalence Testing for Quality Analysis (Part II): What Difference Does the Difference Make?

My previous post examined how an equivalence test can shift the burden of proof when you perform a hypothesis test of the means. This allows you to more rigorously test whether the process mean is equivalent to a target or to another mean.

Here’s another key difference: To perform the analysis, an equivalence test requires that you first define, upfront, the size of a practically important difference between the mean and the target, or between two means.
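One way to picture that upfront decision is the confidence-interval version of an equivalence test: you claim equivalence only if the whole confidence interval for the difference falls inside the limits you chose in advance. The sketch below assumes the interval has already been computed elsewhere at a suitable confidence level. The function name and all numbers are hypothetical.

```python
def is_equivalent(ci_lower, ci_upper, lower_limit, upper_limit):
    """True when the confidence interval for the difference sits inside the equivalence limits.

    Assumption: (ci_lower, ci_upper) was computed beforehand by your statistics software.
    """
    if lower_limit >= upper_limit:
        raise ValueError("lower_limit must be smaller than upper_limit")
    return lower_limit < ci_lower and ci_upper < upper_limit


# Hypothetical limits: a difference of up to 0.5 units is not practically important.
print(is_equivalent(-0.2, 0.3, -0.5, 0.5))  # interval inside the limits
print(is_equivalent(-0.2, 0.6, -0.5, 0.5))  # interval crosses the upper limit
```

Notice that the limits are arguments you must supply yourself. The data cannot pick them for you.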

Truth be told, even when performing a standard hypothesis test, you should know the value of this difference. Because you can’t really evaluate...

Equivalence Testing for Quality Analysis (Part I): What are You Trying to Prove?

With more options come more decisions.

With equivalence testing added to Minitab 17, you now have more statistical tools to test a sample mean against a target value or another sample mean.

Equivalence testing is extensively used in the biomedical field. Pharmaceutical manufacturers often need to test whether the biological activity of a generic drug is equivalent to that of a brand name drug that has already been through the regulatory approval process.

But in the field of quality improvement, why might you want to use an equivalence test instead of a standard t-test?

Interpreting Hypothesis...

Who's More (or Less) Irish?

B'gosh n' begorrah, it's St. Patrick's Day today!

The day that we Americans lay claim to our Irish heritage by doing all sorts of things that Irish people never do. Like dye your hair green. Or tell everyone what percentage Irish you are.

Despite my given name, I'm only about 15% Irish. So my Irish portion weighs about 25 pounds. It could be the portion that hangs over my belt due to excess potatoes and beer.

Today, many American cities compete for the honor of being "the most Irish." Who deserves to take top honors? Data from the U.S. Census Bureau can help us decide.

The Minitab bar chart below...

Opening Ceremonies for Bubble Plots and Poisson Regression

By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot.

This exploratory tool is great for visualizing the relationships among three variables on a single plot.

To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there might be an association between the number of medals a country won and its maximum elevation. For that, I could use a simple scatterplot, right?

But say I want to throw a third variable into the mix, such as...

R-Squared: Sometimes, a Square is just a Square

If you regularly perform regression analysis, you know that R² is a statistic used to evaluate the fit of your model. You may even know the standard definition of R²: the percentage of variation in the response that is explained by the model.

Fair enough. With Minitab Statistical Software doing all the heavy lifting to calculate your R² values, that may be all you ever need to know.

But if you’re like me, you like to crack things open to see what’s inside. Understanding the essential nature of a statistic helps you demystify it and interpret it more accurately.
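As a first peek under the hood, here is a minimal sketch of R-squared for a simple linear regression: fit a least-squares line, then compare the leftover (error) variation with the total variation in the response. The function name and the tiny data set are hypothetical, and this is the textbook calculation, not a description of Minitab's internals.

```python
def r_squared(x, y):
    """R-squared for a least-squares line y = b0 + b1 * x, as 1 - SSE / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    return 1.0 - sse / sst


# Hypothetical data, for illustration only.
print(r_squared([1, 2, 3], [1, 2, 2]))
```

Multiply the result by 100 to express it as a percentage.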

R-squared: Where Geometry Meets...

Quantum Estimates: Where Angels Fear to Tread

This close to the holidays, it’s hard to stay focused on work.

I should be writing a post about useful estimation tools for quality statistics. But all those yuletide carols about hosts of angels singing from on high have distracted me.

Alas, I’ve fallen into the clutches of one of the world’s oldest estimation problems, posed centuries ago by medieval scholars:

Just how many heavenly angels can dance simultaneously on the point of a pin?

The answer to this question assumes that you believe in the existence of pins, of course.

Estimation in the Middle Ages: Ask Your Doctor

Over the centuries, a...

Doggy DOE Part III: Analyze This!

What factors significantly affect how quickly my couch-potato pooch obeys the “Lay Down” command?

The cushiness of the floor surface? The tone of voice used? The type of reward she gets? How hungry she is?

I created a 1/8 fraction Resolution IV design for 7 factors and collected response data for 16 runs. Now it’s time to analyze the data in Minitab, using Stat > DOE > Factorial > Analyze Factorial Design.
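If you are wondering how 7 factors fit into only 16 runs, the sketch below builds one such design: a full factorial in four base factors (A to D), with the other three factors set equal to products of the base columns. The generators E = ABC, F = BCD, and G = ACD are a common textbook choice and an assumption on my part; the post does not say which generators were used. Low and high levels are coded as -1 and +1.

```python
from itertools import product


def fractional_factorial_7_3():
    """16-run, 7-factor two-level design (a 1/8 fraction of the 128-run full factorial).

    Assumed generators (not taken from the post): E = ABC, F = BCD, G = ACD.
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):  # full factorial in A, B, C, D
        e = a * b * c
        f = b * c * d
        g = a * c * d
        runs.append((a, b, c, d, e, f, g))
    return runs


design = fractional_factorial_7_3()
print(len(design))  # number of runs
for run in design:
    print(run)
```

In a real experiment you would also randomize the run order before collecting data.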

After removing insignificant terms from the model, one at a time, starting with the highest-order interaction, here's the final model:

Of the original 7 factors in the screening experiment,...

Doggy DOE Part II: Create Your Design

Nala, our 6-year-old golden retriever, loves her dogma. That's her sitting in front of church on Sunday morning.

But she's not crazy about her catechism. For example, she doesn't always dutifully follow the "Lay Down" commandment.  

What factors may be influencing her response? We're performing a DOE screening experiment to find out.

In this post, we'll use Minitab Statistical Software to

  • Create the design for the experiment
  • Determine the confounding pattern for this design
  • Set up the data collection worksheet
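To give a feel for the second step, here is a rough sketch of how a confounding (alias) pattern can be worked out by hand. Each generator is written as a "word" of factor letters, multiplying two words cancels any letter they share, and an effect is confounded with every product of itself and a word in the defining relation. The generator words ABCE, BCDF, and ACDG are an assumed example for a 7-factor, 16-run design, not something stated in this post.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Multiply two effect words; letters that appear in both cancel out."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generator_words):
    """All non-empty products of the generator words."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            word = ""
            for w in combo:
                word = multiply(word, w)
            words.add(word)
    return sorted(words, key=lambda w: (len(w), w))


def aliases(effect, relation):
    """Effects confounded with `effect`, shortest words first."""
    return sorted((multiply(effect, w) for w in relation), key=lambda w: (len(w), w))


# Assumed generators: E = ABC, F = BCD, G = ACD, written as defining words.
relation = defining_relation(["ABCE", "BCDF", "ACDG"])
print(relation)
print(aliases("AB", relation))
```

The shortest word in the defining relation gives the resolution of the design.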

Create the Design for the Experiment

In the previous post, we used the Display Design dialog...

Doggy DOE Part I: Design on a Dime

Design of experiments (DOE) is an extremely practical and cost-effective way to study the effects of different factors and their interactions on a response.

But finding your way through DOE-land can be daunting when you're just getting started. So I've enlisted the support of a friendly golden retriever as a guide dog to walk us through a simple DOE screening experiment.

Nala, the golden retriever, is shown at right. Notice how patiently she sits as her picture is being taken. She's a true virtuoso with the "Sit" command.

But "Lay Down" is another story...

Formulate the Objective

Although Nala know...

Optical Illusions, Zen Koans, and Simpson’s Paradox

What do you see when you look at the image at right?

Do you see a bulging sphere that stretches the checkerboard pattern in the center, causing its lines to curve?

Are you sure? Look again. This time, test any “curved” line by holding a straightedge next to it.

The image is actually composed of small squares and straight lines. Yet, when perceived as a composite whole, it creates a completely different impression.

A similar “illusion” can occur when you analyze your data. It’s called the Yule-Simpson effect, or Simpson’s paradox for short.
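A tiny numerical sketch shows how the paradox can arise. In the made-up counts below, option A has the higher success rate within each subgroup, yet option B has the higher rate once the subgroups are pooled, because the two options are spread very unevenly across the subgroups. All names and numbers are hypothetical.

```python
def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


# Hypothetical (successes, trials) counts for two options in two subgroups.
counts = {
    "small": {"A": (9, 10), "B": (80, 100)},
    "large": {"A": (50, 100), "B": (4, 10)},
}


def pooled(option):
    """Total (successes, trials) for one option across all subgroups."""
    successes = sum(counts[group][option][0] for group in counts)
    trials = sum(counts[group][option][1] for group in counts)
    return successes, trials


for group in counts:
    print(group, rate(*counts[group]["A"]), rate(*counts[group]["B"]))
print("overall", rate(*pooled("A")), rate(*pooled("B")))
```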

When you look at the overall results of all your data, you...

Debt Made Beautiful

Does the prospect of a looming U.S. government shutdown depress you?

Are you tired of the ongoing game of Chicken played over our federal budget? The dissonant hysterics of deficit drama queens? The glib arrogance of deficit deniers?

Then it might be a good time to take a break and focus on something more pleasant. Something you can control. Something you can improve and make more beautiful.

Like editing your graphs in Minitab.

In this post, I'll show you how to gussy up a graph with a few simple maneuvers.

(If you want to follow along and you don’t have Minitab 16, download a free trial copy.)


Warning: Failing to Display a Pareto Chart May be Hazardous to Your Health

Defects can cause a lot of pain to your customer.

They can also cause a lot of pain inside your body. The picture at right shows my broken right clavicle. Ouch!

You might think of it as the defective output from my bicycling process, which needs improvement.

Sitting around all summer cinched up in a foam orthopedic brace hasn’t exactly been wild and wacky 50s-style fun at the beach.

But the injury has had its perks (a box of mouth-watering dark chocolate ganaches from kind Minitab coworkers, for example!)

It’s also provided me with a rare commodity in the year 2013: Plenty of time to think.


Hotshot Stats: Evaluating the Service Life of Smokejumper Parachutes

It’s wildfire season out West. Time to be in awe of the destructive power of Nature.

According to active fire maps by the USDA Forest Service, over 300 fires are now burning across a total of 1.5 million acres—including 35 large, uncontained blazes.

Shifting winds, humidity, and terrain can quickly alter a fire's intensity. In extreme conditions, flames can reach over 150 feet, with temperatures exceeding 2000° F.

This ferocious power is matched by only one thing: The incredible strength, courage, and skills of smokejumpers who parachute into remote areas to combat the deadly blazes.

But danger...

Graph Quest: How to Show that Life on Venus Is Safer than Life on Mars

True confession: Nothing fires quickly from the top of my head. At least nothing very lucid or useful.

To come up with a good idea, I have to dredge thoughts slowly from the thick sludge and sediment in my brain. 

It's not always easy—there are deeply encrusted layers in my cerebral cortex that go all the way back to the Paleozoic era.

So coming up with a useful data display—one that uncovers hidden patterns or elucidates interesting relationships—often takes a bit of doing for me.

It's rare that I can nail it on the first shot.

Finding the Graph That's Worth a Thousand Words

After examining...