How to Use Brushing to Investigate Outliers on a Graph

There’s a lot going on in the world, so you might not have noticed that the Organisation for Economic Co-operation and Development (OECD) released their new set of health statistics for member nations. On the OECD website, you can now download the free data series for 2014. (Be aware that “for 2014” means that the organization has a pretty good idea about what happened in 2012.)

Of course, there’s nothing more fun than sharpening your Minitab skills with real data. Each time the OECD releases their data, we hear about how much money is spent per person on health care compared to how long people live in that...

What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?

Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves.

So, if it’s not the ability to model a curve, what is the difference between a linear and nonlinear regression equation?

Linear Regression Equations

Linear regression requires a linear model. No surprise, right? But what does that really mean?

A model is linear...
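One way to see how a linear regression can still model a curve is to fit a quadratic by ordinary least squares. The sketch below is only an illustration in plain Python, not Minitab output: the helper names and the noise-free data are hypothetical. The fitted curve bends in x, yet every coefficient enters the equation as a simple multiplier, which is why ordinary linear algebra is enough to solve for them.

```python
# Fit y = b0 + b1*x + b2*x**2 by ordinary least squares.
# The curve bends in x, but the equation is linear in b0, b1, and b2,
# so the coefficients come from a linear system (the normal equations).


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(xs, ys):
    """Return [b0, b1, b2] that minimize the sum of squared residuals."""
    design = [[1.0, x, x * x] for x in xs]  # one column per model term
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coefficients = fit_quadratic(xs, ys)
print(coefficients)
```

A model such as y = b0 * exp(b1 * x) could not be fit this way, because b1 sits inside the exponential rather than multiplying a term.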

You Get a Goal! And You Get a Goal! And You Get a Goal! It’s the 2014 FIFA World Cup!

The 2014 World Cup has gotten off to a high-scoring start. Through the first week of the tournament, an average of 2.9 goals have been scored per game, the highest since 1970. And if that average climbs to over 3 goals per game, this’ll be the highest scoring World Cup since 1958!

So is this year’s World Cup actually bucking a trend of the low scoring tournaments that came before it, or can we simply attribute it to random variation? Let’s use a data analysis to find out!

Determining a Trend

I went to FIFA’s website and collected the goals per game in every World Cup. Now that we have the...

How to Interpret a Regression Model with Low R-squared and Low P values

In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value / high R-squared combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability.

This combination seems to go together naturally. But what if your regression model has significant variables but explains little of the variability? It has low P values and a low R-squared.

At first glance, this combination doesn’t make sense. Are the significant predictors still...
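The low P value / low R-squared combination is easy to reproduce with simulated data. The sketch below is plain Python rather than Minitab, and every number in it (sample size, slope, noise level, seed) is a hypothetical choice. It fits a simple regression to data with a real slope buried in a lot of noise. The P value uses a normal approximation to the t distribution, which is an assumption that is only reasonable because the sample is large.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is repeatable

n = 2000          # hypothetical sample size
true_slope = 1.0  # a real relationship exists...
noise_sd = 5.0    # ...but the scatter around it is large

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [true_slope * x + random.gauss(0, noise_sd) for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
syy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares estimates for a simple regression y = intercept + slope * x
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Share of the response variability that the model explains
r_squared = 1 - sse / syy

# Test of slope = 0, using a normal approximation to the t distribution
std_err = math.sqrt(sse / (n - 2) / sxx)
t_stat = slope / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"R-squared = {r_squared:.3f}, P value for slope = {p_value:.2g}")
```

In this setup the slope is estimated precisely enough to be significant, while most of the variation in the response still comes from the noise term, so R-squared stays small.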

Multiple Regression Analysis and Response Optimization Examples using the Assistant in Minitab 17

In Minitab, the Assistant menu is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. If you’re feeling a bit rusty with choosing and using a particular analysis, the Assistant is your friend!

Previously, I’ve written about the new linear model features in Minitab 17. In this post, I’ll work through a multiple regression analysis example and optimize the response variable to highlight the new features in the Assistant.

Choose a Regression Analysis

As part of a solar energy test, researchers measured the total heat flux. They found that heat...

Can I Just Delete Some Values to Reduce the Standard Variation in My ANOVA?

We received the following question via social media recently:

I am using Minitab 17 for ANOVA. I calculated the mean and standard deviation for these 15 values, but the standard deviation is very high. If I delete some values, I can reduce the standard deviation. Is there an option in Minitab that will automatically indicate values that are out of range and delete them so that the standard deviation is low?

In other words, this person wanted a way to automatically eliminate certain values to lower the standard deviation.

Fortunately, Minitab 17 does not have the functionality that this person was...
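To see why this kind of automatic deletion is tempting and misleading, here is a small plain-Python sketch (not a Minitab feature). The 15 values and the `drop_most_extreme` helper are hypothetical. Removing the points farthest from the mean shrinks the standard deviation of this sample, but the smaller number describes the editing, not the process that produced the data.

```python
from statistics import mean, stdev

# 15 hypothetical measurements; none is known to be an error
values = [12.1, 9.8, 10.4, 11.7, 8.9, 10.0, 13.2, 9.1,
          10.9, 11.3, 7.6, 10.6, 12.5, 9.5, 10.2]


def drop_most_extreme(data, k):
    """Return the data without the k points farthest from the mean."""
    center = mean(data)
    by_distance = sorted(data, key=lambda v: abs(v - center))
    return by_distance[:len(data) - k]


trimmed = drop_most_extreme(values, 4)

full_sd = stdev(values)
trimmed_sd = stdev(trimmed)

print(f"All {len(values)} values: standard deviation = {full_sd:.3f}")
print(f"After dropping 4:  standard deviation = {trimmed_sd:.3f}")
```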

Using Probability Plots to Understand Laser Games Scores

There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data.

In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data.

In probability plots, the data density distribution is transformed into a linear plot. To do this, the cumulative distribution function (the so-called CDF, cumulating all probabilities below a given threshold) is used (see the graph below). For a normal...
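The transformation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scores are hypothetical, the plotting position (i - 0.5) / n is one common convention and not necessarily the one Minitab uses, and the function names are made up for this example. Each sorted value is paired with the standard normal quantile for its cumulative probability; if the data are roughly normal, the pairs fall close to a straight line.

```python
import math
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted value with its theoretical standard normal quantile."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th value
        points.append((std_normal.inv_cdf(p), value))
    return points


def straightness(points):
    """Pearson correlation between theoretical quantiles and observed values."""
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    q_bar, v_bar = mean(qs), mean(vs)
    cov = sum((q - q_bar) * (v - v_bar) for q, v in points)
    q_ss = sum((q - q_bar) ** 2 for q in qs)
    v_ss = sum((v - v_bar) ** 2 for v in vs)
    return cov / math.sqrt(q_ss * v_ss)


# Hypothetical game scores
scores = [310, 455, 520, 610, 480, 390, 570, 640, 505, 430]

points = normal_plot_points(scores)
for quantile, score in points:
    print(f"{quantile:6.2f}  {score}")
print(f"correlation with a straight line: {straightness(points):.3f}")
```

The correlation is only a rough numeric summary; the visual pattern of the points (curvature, gaps, isolated values in the tails) carries information that a single number does not.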

Using Predict in Minitab 17 to Validate a Statistical Model

Last time I posted, I showed you how to divide a data set into training and validation samples in Minitab with the promise that next time I would show you a way to use the validation sample. Regression is a good analysis for this, because a validation data set can help you to verify that you’ve selected the best model. I’m going to use a hypothetical example so that you can see how it works when we really know the correct model to use. This will let me show you how Minitab 17’s Predict makes it easy to get the numbers that you need to evaluate your model with the training data set.

(The steps...

How to Correctly Interpret P Values

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis...

The Best European Football League: What the CTQ’s and Minitab Can Tell Us

by Laerte de Araujo Lima, guest blogger

In a previous post (How Data Analysis Can Help Us Predict This Year's Champions League), I shared how I used Minitab Statistical Software to predict the 2013-2014 season of the UEFA Champions League. This involved the regression analysis of main critical-to-quality (CTQ) factors, which I identified using the “voice of the customer” suggestions of some friends.

Since that post was published, my friends have stopped discussing the UEFA Champions League—they were convinced by the results I shared.

But now they’ve challenged me to use Six Sigma tools to...

Introducing the Bubble Plot

When you're evaluating a dataset, graphical analysis can be very important. While an analysis like a regression or ANOVA can be backed up by numbers, being able to visualize how your dataset is behaving can be even more convincing than a group of p-values—especially to those who aren’t trained in statistics.

For example, let’s look at a few variables we think may be correlated. In this specific example, we will take the Unemployment Rate and the Crime Rate for each state in the U.S. We have 3 columns of data in Minitab: C1, which contains the State Name; C2, which contains the Crime Rate; and...

Predicting the 2014 NCAA Tournament

Once again it’s time for the madness of March to begin! Which teams have the best shot of going to the Final Four? Is there a team that might become this year’s Florida Gulf Coast? And do any of the 16 seeds have a realistic shot of beating a 1 seed? Well, sit back, because we’re going to answer all of that and more! Somebody tell Cinderella to get her glass slippers; it’s time to go dancing!

Which Ranking System to Use

Before we get to the bracket, we need to decide on which ranking system to use. Because we want to use these rankings for predicting future outcomes, we want a system that uses...

Using Statistics to Show Your Boss Process Improvements

Ughhh... your process is producing some parts that don't meet your customer's specifications! Fortunately, after a little hard work, you find a way to improve the process.

However, you want to perform the appropriate statistical analysis to back up your findings and make it easier to explain the process improvements to your boss. And it's important to remember that your boss is much like the boss in Eston's posts -- he's not too familiar with statistics, so you'll have to take it slow and show lots of "visual aids" in your explanation. How should you begin? 

Enter before-and-after process...

Re-analyzing Wine Tastes with Minitab 17

In April 2012, I wrote a short paper on binary logistic regression to analyze wine tasting data. At that time, François Hollande was about to get elected as French president and in the U.S., Mitt Romney was winning the Republican primaries. That seems like a long time ago…

Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012, would I have conducted my analysis in a different way? Would the results still look similar? I decided to re-analyze my April 2012 data with Minitab 17 and assess the differences, if there are any.

There were no...

The Stability Report for Control Charts in Minitab 17 includes Example Patterns

Minitab’s Assistant got a lot of splashy upgrades for Minitab 17. The additions of DOE and multiple regression to the Assistant are large feature improvements with obvious advantages. But there are many subtler but still fantastic additions that shouldn't be overlooked.

One of those additions is the example patterns added to the Stability Report for control charts.
 
The Stability Report was excellent in Minitab 16, clearly showing you the out-of-control points in the process:

But the truth is that it’s sometimes hard to move from detecting the out-of-control points to an understanding of what’s...

Why Is There No R-Squared for Nonlinear Regression?

Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do.

So, what’s going on?

Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. There are bad consequences if you use it in this context.

Why Is It Impossible to Calculate a...

Opening Ceremonies for Bubble Plots and Poisson Regression

By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot.

This exploratory tool is great for visualizing the relationships among three variables on a single plot.

To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there might be a possible association between the number of medals a country won and its maximum elevation. For that, I could use a simple scatterplot, right?

But say I want to throw a third variable into the mix, such as...

Histograms are Even Easier to Compare in Minitab 17

Minitab 17 came out yesterday and it’s got quite a few neat features in it. You can check some of them out on the What’s New in Minitab 17 page. But one of my very favorite things is related to one of my previous blog posts that showed how to make histograms that are easy to compare. Turns out, you don’t need those steps anymore. You can do it all with Minitab’s Assistant.

Here’s how to open the data that I’m using if you want to follow along.

  • Choose File > Open Worksheet.
  • Click Look in Minitab Sample Data Folder.
  • Select Cap.MTW and click Open.

You can still rearrange a paneled histogram to make...

How to Handle Extreme Outliers in Capability Analysis

Transformations and non-normal distributions are typically the first approaches considered when the normality test fails in a capability analysis. These approaches do not work when there are extreme outliers because they both assume the data come from a single common-cause variation distribution. But because extreme outliers typically represent special-cause variation, transformations and non-normal distributions are not good approaches for data that contain extreme outliers.

As an example, the four graphs below show distribution fits for a dataset with 99 values simulated from a...