Adventures in Statistics

Thanks to my desire to understand the deeper mechanics that lie behind what we observe in the world, I suppose it’s natural that I love data analysis. Observation is great, but I can only observe a small slice of reality. I really want to understand the larger picture and know how it all works. Data analysis gives you the keys to do just this, whether you are studying how to manufacture the best product, provide the best services, or answer an academic research question.

I’m Jim Frost and I came to Minitab with a background in a wide variety of academic research. My role was the “data/stat...

Old Love Letters and Changing Times

My grandfather's writing desk, with a train timetable to plan their next meeting!

Have you ever seen your present reflected in an object from the past? This summer I've discovered glimpses of my daily life working with statistical software in words written more than 70 years ago.

I’ve been reading through a treasure trove of old love letters that were sent in 1940-41 by my grandfather to my grandmother, who saved them all of these years. This was an exciting time of changes for them both. My grandfather was a farm boy from a small town in Connecticut, who grew up without electricity. He became...

The Veepstakes: Using the Solution Desirability Matrix to Help Mitt Romney Choose the VP Candidate

The GOP vice-presidential sweepstakes, or veepstakes, is heating up. Rumors are swirling about whether Mitt Romney has picked a running mate and when he’ll announce it. Will he pick a more exciting but riskier candidate? Or, will he play it safe?

Have you ever wondered how a candidate like Romney might go about evaluating all of the possibilities? There are many potential VP candidates and many criteria. Wouldn’t it be nice if there were a tool to help sort this out? There is! And it comes from the world of Six Sigma and process improvement.

People associate Six Sigma with statistics, and it...

Busting the Mythbusters with Statistics: Are Yawns Contagious?

This looks like a typical Mythbusters experiment!

Statistics can be unintuitive. What’s a large difference? What’s a large sample size? When is something statistically significant? You might think you know, based on experience and intuition, but you really don’t know until you actually run the analysis. You have to run the proper statistical tests to know what the data are telling you!

Even experts can get tripped up by their hunches, as we'll see.

In my family, we’re huge fans of the Mythbusters. This fun Discovery Channel show mixes science and experiments to prove or disprove various myths,...

World Travel, Bumpy Roads, and Adjusting Your Graph Scales!

Organic coffee plantation in the highlands

I love to travel. In fact, my family and I make it a point to travel abroad every year. We joke that we have a case of chronic travel itch! The memories and experiences last for a lifetime and are priceless.

We just returned from an amazing trip to beautiful Costa Rica. We had a fantastic time hiking through cloud forests, rain forests, and up a volcano. We saw an amazing array of animals in their native habitats: monkeys, sloths, birds, snakes, reptiles, and frogs. In fact, on a night hike through the rainforest, I saw the most dangerous snake in Costa...

Presidential Politics, Political Polls, and Statistics!

It’s election year, and the Presidential campaign is picking up! These are exciting times for my buddy and me, who are political junkies. Bring on the banners, slogans, rhetoric, and debates. Our TVs will be filled with ads about everything from financial policy to energy prices. Barack Obama and Mitt Romney may be in our living rooms more often than many family members!

We not only follow all of the races, but we also make small, friendly bets about the outcomes. However, the winnings pale in comparison to the bragging rights! Each bet takes on a life of its own and winning the bet almost becomes more...

Regression Smackdown: Stepwise versus Best Subsets!

In my last post, I professed my fondness for regression analysis. This time, I’m going to compare two automatic tools that will help you to create a good regression model.

Imagine a scenario where you have many predictor variables and a response variable. Because there are so many predictor variables, you’d like some help in creating a good regression model. You could try a lot of combinations on your own. But, you’re in luck! Minitab Statistical Software has not one, but two automatic tools that will help you pick a regression model.

These tools are Stepwise Regression and Best Subsets...

A Tribute to Regression Analysis

Shhh. I have a secret. Working at Minitab, I should probably say that I love all of its analyses equally. Perhaps it would be OK to love them differently, but equally. However, I’ve always had a sneaking preference for Minitab’s regression analysis. I just can’t keep it a secret anymore!

If you promise to keep this secret, I’ll give you a special bonus tip at the end of the post. In fact, most of my colleagues here at Minitab don’t even know about this one! Seriously!

I’ve used regression extensively and love it for all of its flexibility. You can use:

  • multiple predictor variables
  • continuous and...

"Do You Feel Lucky, Punk?"

Clint Eastwood, playing Dirty Harry, asked this famous question while confronting a bad guy who was about to reach for his rifle. I’m quite sure that the bad guy carefully pondered the nature of luck, probabilities, and expected outcomes before deciding not to grab his rifle!

A month ago, I did something shocking . . . something that I hadn’t done for several decades. Just like the bad guy in the Dirty Harry movie, I started thinking about luck. Yes, you guessed it:  I bought a lottery ticket for the record-breaking Mega Millions Jackpot. This purchase is shocking for someone like me who knows...

Curing Heteroscedasticity with Weighted Regression in Minitab Statistical Software

In my last post I talked about why you need to check your regression analysis residuals. In a nutshell, your predictors should be so good at explaining (or predicting) the response that only the inherent randomness of any real-world phenomenon is left over for the error portion. If you observe explanatory or predictive power in the error, you know that your predictors are missing some of the predictive information. In this post I'll cover a specific type of pattern that you can see in the residuals and show you how to fix the problem.

Regression residuals should have a constant spread...

Why You Need to Check Your Residual Plots for Regression Analysis: Or, To Err is Human, To Err Randomly is Statistically Divine

Anyone who has performed ordinary least squares (OLS) regression analysis knows that you need to check the residual plots in order to validate your model. Have you ever wondered why? There are mathematical reasons, of course, but I’m going to focus on the conceptual reasons. The bottom line is that randomness and unpredictability are crucial components of any regression model. If you don’t have those, your model is not valid.

Why? To start, let’s break down and define the two basic components of a valid regression model:

Response = (Constant + Predictors) + Error 

Another way we can say this is:

Respo...

The Graphical Benefits of Identifying the Distribution of Your Data

In my previous post, we identified the distribution of the body fat data. Today, we're going to explore several benefits of knowing the distribution, with a special emphasis on creating informative graphs! After all, if you are not sure what a specific distribution with such and such parameters looks like, a graph gives you the picture!

Using the Distribution Information

So far, we have identified the distribution and the parameter values for the body fat data from 14-year-old girls.

3-Parameter Weibull Distribution:

  • Shape = 1.85718
  • Scale = 14.07043
  • Threshold = 16.06038
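To get a feel for what those three numbers describe, here is a minimal Python sketch of the 3-parameter Weibull using the textbook formulas and only the standard library. The function names are my own for illustration, and this is not how Minitab computes or draws anything internally; it just evaluates the density, cumulative probability, mean, and quantiles for the parameter values listed above.

```python
import math

# Parameter values quoted above for the body fat data
SHAPE = 1.85718
SCALE = 14.07043
THRESHOLD = 16.06038


def weibull3_pdf(x, shape=SHAPE, scale=SCALE, threshold=THRESHOLD):
    """Density of the 3-parameter Weibull (returns 0 at or below the threshold)."""
    if x <= threshold:
        return 0.0
    z = (x - threshold) / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull3_cdf(x, shape=SHAPE, scale=SCALE, threshold=THRESHOLD):
    """Probability of observing a value less than or equal to x."""
    if x <= threshold:
        return 0.0
    z = (x - threshold) / scale
    return 1.0 - math.exp(-(z ** shape))


def weibull3_quantile(p, shape=SHAPE, scale=SCALE, threshold=THRESHOLD):
    """Value below which a fraction p (0 <= p < 1) of the distribution falls."""
    return threshold + scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull3_mean(shape=SHAPE, scale=SCALE, threshold=THRESHOLD):
    """Mean of the distribution: threshold + scale * Gamma(1 + 1/shape)."""
    return threshold + scale * math.gamma(1.0 + 1.0 / shape)


if __name__ == "__main__":
    # Evaluate the curve at a few body fat percentages (arbitrary example points)
    for pct in (20, 25, 30, 40):
        print(f"{pct}%: density={weibull3_pdf(pct):.4f}, "
              f"cumulative={weibull3_cdf(pct):.4f}")
    print(f"mean={weibull3_mean():.2f}, median={weibull3_quantile(0.5):.2f}")
```

The threshold shifts the whole curve to the right, so the model assigns no probability to values at or below it.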

How does that help us? What...

How to Identify the Distribution of Your Data using Minitab

I love all data, whether it’s normally distributed or downright bizarre. However, many people are more comfortable with the symmetric, bell-shaped curve of a normal distribution. It is not as intuitive to understand a Gamma distribution, with its shape and scale parameters, as it is to understand the familiar Normal distribution with its mean and standard deviation.

However, it's a fact of life that not all data follow the Normal distribution. Hey, a lot of stuff is just abnormal...er...non-normally distributed. How to understand and present the practical implications of your non-normal...

How to Predict with Minitab: Using BMI to Predict the Body Fat Percentage, Part 2

This is part two in a series. Read part one here.

In part 1, I used Minitab Statistical Software to develop a regression model that describes the relationship between body mass index (BMI) and body fat percentage. In this post, I will use this model to predict body fat percentage and assess the precision of my predictions. I'll also compare Minitab's predictions to those of bathroom scales that use bioelectrical impedance analysis (BIA) to estimate percent body fat.

Assessing Predicted R-Squared

Previously, I used a Fitted Line Plot to assess the model. To generate predictions, I’ll use the same...

How to Predict with Minitab: Using BMI to Predict the Body Fat Percentage, Part 1

Wouldn’t it be nice to be able to predict something that is important to you?  Sure—and it would be extra nice if you knew how accurate your predictions will be. Well, you don’t have to be a psychic to have these powers because Minitab Statistical Software gives them to you! You can find these predictive powers in regression, general linear model (ANOVA), and the design of experiments (DOE).

Prediction with Regression Analysis

We’ll explore prediction with regression analysis by using a person’s body mass index (BMI) to predict their percentage of body fat. For extra fun, we’ll compare Minitab’s...

Reassessing GDP Growth with Data and Statistics, part 3

This is part three in a series where we assess what information we can obtain from the various estimates of quarterly GDP growth using statistical analysis and a control chart. Read part one here and part two here.  You can download the Minitab Statistical Software project file used in this series here.

Today, the U.S. Bureau of Economic Analysis (BEA) announced that the gross domestic product (GDP) grew at an estimated 2.8 percent for the fourth quarter of 2011. This is up from 1.8 in the third quarter. Can we draw any conclusions from this change? Before we can answer that, we need to assess the...

Reassessing GDP Growth with Data and Statistics, part 2

This is part two in a series where we assess what information we can obtain from the various estimates of quarterly GDP growth using statistical analysis and a control chart. Read part one here. You can download the Minitab Statistical Software data file used in this series here.

Understanding the I-MR Chart

An I-MR chart comprises two plots, the individuals (I) chart on the top and the moving range (MR) chart on the bottom. The I chart displays the measurements and provides a means to assess the process center. The MR chart displays the absolute change between successive measurements and provides...
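As a rough illustration of the two plots described above, the Python sketch below computes the moving ranges and the usual textbook control limits for an I-MR chart. The constants 2.66 and 3.267 are the standard values for a moving range of length 2; I am assuming those defaults here rather than reproducing Minitab's exact settings. The function name and the sample numbers are made up for illustration and are not the GDP data from this series.

```python
from statistics import mean


def imr_limits(values):
    """Return moving ranges, center lines, and control limits for an I-MR chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")

    # Absolute change between each pair of successive measurements
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]

    x_bar = mean(values)          # center line of the I chart
    mr_bar = mean(moving_ranges)  # center line of the MR chart

    return {
        "moving_ranges": moving_ranges,
        "i_center": x_bar,
        "i_lcl": x_bar - 2.66 * mr_bar,   # assumed textbook constant (3 / 1.128)
        "i_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_lcl": 0.0,
        "mr_ucl": 3.267 * mr_bar,         # assumed textbook constant (D4 for n = 2)
    }


if __name__ == "__main__":
    # Made-up measurements, for illustration only
    sample = [2.0, 3.0, 1.0, 2.0, 4.0]
    for name, value in imr_limits(sample).items():
        print(name, value)
```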

Reassessing GDP Growth with Data and Statistics, part 1

If you combine tough economic times with a Presidential election year, you get a heightened interest in how the economy is changing. Is it growing faster or slowing down? Unsurprisingly, there are many contradictory predictions about what will happen over the longer term. You'll find countless TV pundits pushing their opinions.

A lot of attention focuses on the quarterly GDP growth. I’m going to explore what you can learn about this particular economic measure from statistical techniques and actual data. By the end of this blog, you’ll understand what you can and cannot infer from these...

The Mysteries of Variability and Power

Variability can make things difficult whether you are performing data analysis for a quality improvement initiative or for an academic study. Recently, I detailed how variability reduces your statistical power. As promised, this will help you solve a mystery.

One of the many things I love about research is the unexpected mysteries. You get to be Sherlock Holmes! When you’re exploring the unknown, you’re bound to run into surprising results that you can’t explain at first. The clues are often buried in the data that you’ve already collected. You just need to put the pieces together.

The...

Variability and Statistical Power

For my last several posts, I’ve been writing about the problems associated with variability. First, I showed how variability is bad for customers. Next, I showed how variability is generally harder to control than the mean. In this post, I’ll show yet one more way that variability causes problems!

Variability can dramatically reduce your statistical power during hypothesis testing. Statistical power is the probability that a test will detect a difference (or effect) that actually exists.
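To make that definition concrete, here is a small Python sketch of the power of a two-sided, one-sample z-test with a known standard deviation. That particular test is my assumption, chosen because its power has a simple closed form; the function name and the numbers in the loop are hypothetical.

```python
from statistics import NormalDist


def z_test_power(difference, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma.

    difference: the true shift in the mean that we hope to detect
    sigma:      the standard deviation (the variability)
    n:          the sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)

    # True shift expressed in standard-error units
    shift = difference / (sigma / n ** 0.5)

    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same difference and sample size, increasing variability (made-up values)
    for sigma in (5, 10, 20):
        power = z_test_power(difference=5, sigma=sigma, n=25)
        print(f"sigma={sigma}: power={power:.3f}")
```

Holding the difference and sample size fixed, the power falls as the standard deviation grows, which is the effect described above.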

It’s always a good practice to understand the variability present in your subject matter and how it impacts...

Quality Improvement: Controlling Variability More Difficult than Controlling the Mean

In my post Assessing Variability for Quality Improvement, I showed how measuring variability is as important as measuring the mean for a product or service in a quality improvement initiative. The mean, by itself, often tells an incomplete story. Additionally, quality management veterans know that controlling the variability is often more difficult than controlling the mean. If you want to change the mean, it often entails adjusting a manufacturing setting or target. However, reducing variability often requires new technology or new procedures. 
   


For example, in the image above, it’s...