
How Deadly Is this Ebola Outbreak?

The current Ebola outbreak in Guinea, Liberia, and Sierra Leone is making headlines around the world, and rightfully so: it's a frightening disease, and last week the World Health Organization reported that the epidemic is outpacing its response. Nearly 900 of the more than 1,600 people infected during this outbreak have died, including some leading medical professionals trying to stanch the outbreak's spread. And yesterday, one of the American doctors who contracted the disease arrived back in the U.S. for treatment.

Many sources state that Ebola virus outbreaks have a case fatality rate of up to...
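The headline figures above allow a rough calculation. Here is a minimal sketch, assuming the approximate counts quoted (about 900 deaths among about 1,600 cases). The crude case fatality ratio is simply deaths divided by reported cases, roughly 56% here. During an ongoing outbreak this crude ratio tends to understate lethality, because many recent patients do not have an outcome yet. One common alternative divides deaths by cases with a known outcome (deaths plus recoveries). The recovery count is a hypothetical input and is not a figure from this post.

```python
def crude_cfr(deaths: int, cases: int) -> float:
    """Crude case fatality ratio: deaths divided by all reported cases."""
    return deaths / cases


def resolved_cfr(deaths: int, recoveries: int) -> float:
    """Fatality ratio among cases with a known outcome (deaths + recoveries).

    `recoveries` is a hypothetical input; the post does not report it.
    """
    return deaths / (deaths + recoveries)


# Approximate figures from the text: nearly 900 deaths, more than 1,600 cases.
print(f"Crude CFR: {crude_cfr(900, 1600):.0%}")  # Crude CFR: 56%
```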

“You’ve got a friend” in Minitab Support

I caught the end of Toy Story over the weekend, which is definitely one of my all-time favorite children’s movies. Now, unfortunately or fortunately, I can’t get Randy Newman's theme song, “You’ve Got a Friend in Me,” out of my head!

It's also got me thinking about the nature of friendship, and how "best friends forever" are supposed to always be there when you need them. Without getting too maudlin about it, just as Woody and Buzz eventually realize they are friends, all of us hope the professionals who use our software also realize that “you’ve got a friend” in Minitab.

Now what do I mean...

How to Use Brushing to Investigate Outliers on a Graph

There’s a lot going on in the world, so you might not have noticed that the Organisation for Economic Co-operation and Development (OECD) released its new set of health statistics for member nations. On the OECD website, you can now download the free data series for 2014. (Be aware that “for 2014” means that the organization has a pretty good idea about what happened in 2012.)

Of course, there’s nothing more fun than sharpening your Minitab skills with real data. Each time the OECD releases its data, we hear about how much money is spent per person on health care compared to how long people live in that...

What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?

Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves.

So, if it’s not the ability to model a curve, what is the difference between a linear and nonlinear regression equation?

Linear Regression Equations

Linear regression requires a linear model. No surprise, right? But what does that really mean?

A model is linear...
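The key idea is that "linear" refers to the parameters, not to the shape of the curve. Here is a minimal sketch in Python with NumPy. This is an assumption: the post itself uses Minitab. A quadratic model y = b0 + b1*x + b2*x² draws a curve, yet it is linear in b0, b1, and b2, so ordinary least squares can solve for them directly from a design matrix. A model such as y = a*exp(b*x) is nonlinear in b and would need an iterative fitting procedure instead.

```python
import numpy as np

# Noise-free data from a curved relationship: y = 2 - 1*x + 0.5*x^2
x = np.linspace(0, 10, 25)
y = 2.0 - 1.0 * x + 0.5 * x**2

# Each column is one term of the model. The model is a linear combination of
# these columns, so it is a linear model even though x enters as x^2.
design = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # approximately [2.0, -1.0, 0.5]
```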

You Get a Goal! And You Get a Goal! And You Get a Goal! It’s the 2014 FIFA World Cup!

The 2014 World Cup has gotten off to a high-scoring start. Through the first week of the tournament, teams have averaged 2.9 goals per game, the highest rate since 1970. And if that average climbs above 3 goals per game, this’ll be the highest-scoring World Cup since 1958!

So is this year’s World Cup actually bucking the trend of the low-scoring tournaments that came before it, or can we simply attribute it to random variation? Let’s use data analysis to find out!

Determining a Trend

I went to FIFA’s website and collected the goals per game in every World Cup. Now that we have the...
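One simple way to look for a trend is to regress goals per game on the tournament year and examine the slope. Here is a minimal sketch in plain Python, assuming you have already collected the year and goals-per-game series. The values in the usage line are toy placeholders, not FIFA data.

```python
def trend_line(years, goals_per_game):
    """Ordinary least squares fit of goals_per_game = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(goals_per_game) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, goals_per_game))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy placeholder series, for illustration only.
intercept, slope = trend_line([2000, 2004, 2008], [1.0, 2.0, 3.0])
print(f"Change in goals per game per year: {slope:+.4f}")  # +0.2500
```

A slope by itself does not settle whether one high-scoring tournament is just random variation. You still need to compare the latest value with the scatter of past tournaments around the fitted line.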

How to Interpret a Regression Model with Low R-squared and Low P values

In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value / high R-squared combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability.

This combination seems to go together naturally. But what if your regression model has significant variables but explains little of the variability? It has low P values and a low R-squared.

At first glance, this combination doesn’t make sense. Are the significant predictors still...
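A quick simulation makes the combination less mysterious. Here is a minimal sketch, assuming NumPy. It generates a large sample in which the predictor has a real but small effect buried in lots of noise, then fits a simple regression. The slope comes out highly significant, while R-squared stays near 1%. The true slope of 0.5, the noise level, and the sample size are arbitrary assumptions. The P value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math

import numpy as np


def simple_regression(x, y):
    """OLS fit of y = b0 + b1*x; returns (slope, R-squared, two-sided P value for the slope).

    The P value uses a normal approximation to the t distribution (large-sample assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_c = x - x.mean()
    b1 = np.sum(x_c * (y - y.mean())) / np.sum(x_c**2)
    b0 = y.mean() - b1 * x.mean()
    ss_res = np.sum((y - (b0 + b1 * x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    se_b1 = math.sqrt(ss_res / (n - 2) / np.sum(x_c**2))
    t = b1 / se_b1
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal tail probability
    return b1, r2, p


# Real effect (slope 0.5) but very noisy response (noise SD 5), large sample.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = 10 + 0.5 * x + rng.normal(scale=5.0, size=n)
slope, r2, p = simple_regression(x, y)
print(f"slope={slope:.3f}  R-squared={r2:.4f}  P={p:.2g}")
```

The low P value indicates that the relationship is real. The low R-squared indicates that the predictor leaves most of the response variability unexplained, so individual predictions will be imprecise.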

Multiple Regression Analysis and Response Optimization Examples using the Assistant in Minitab 17

In Minitab, the Assistant menu is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. If you’re feeling a bit rusty with choosing and using a particular analysis, the Assistant is your friend!

Previously, I’ve written about the new linear model features in Minitab 17. In this post, I’ll work through a multiple regression analysis example and optimize the response variable to highlight the new features in the Assistant.

Choose a Regression Analysis

As part of a solar energy test, researchers measured the total heat flux. They found that heat...
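Outside the Assistant, the same two steps can be sketched in a few lines: fit a multiple regression, then search the predictor space for settings that hit a target response. This is a minimal sketch assuming NumPy and made-up, noise-free data. It is not the solar heat flux data. The brute-force grid search is only a crude stand-in for Minitab's Response Optimizer, which uses desirability functions.

```python
import numpy as np

# Hypothetical, noise-free data: response = 3 + 2*x1 - 1*x2
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 2, 30)
x2 = rng.uniform(0, 2, 30)
y = 3 + 2 * x1 - 1 * x2

# Step 1: fit y = b0 + b1*x1 + b2*x2 by least squares.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 2: evaluate the fitted model on a grid inside the observed ranges and
# keep the setting whose prediction is closest to the target.
target = 5.0
g1, g2 = np.meshgrid(np.linspace(0, 2, 21), np.linspace(0, 2, 21))
pred = b[0] + b[1] * g1 + b[2] * g2
idx = np.unravel_index(np.argmin(np.abs(pred - target)), pred.shape)
best_x1, best_x2, best_pred = g1[idx], g2[idx], pred[idx]
print(f"x1={best_x1:.1f}, x2={best_x2:.1f}, predicted response={best_pred:.3f}")
```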

Can I Just Delete Some Values to Reduce the Standard Deviation in My ANOVA?

We received the following question via social media recently:

I am using Minitab 17 for ANOVA. I calculated the mean and standard deviation for these 15 values, but the standard deviation is very high. If I delete some values, I can reduce the standard deviation. Is there an option in Minitab that will automatically indicate values that are out of range and delete them so that the standard deviation is low?

In other words, this person wanted a way to automatically eliminate certain values to lower the standard deviation.

Fortunately, Minitab 17 does not have the functionality that this person was...
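A quick simulation shows why deleting values to shrink the standard deviation backfires. Here is a minimal sketch using only Python's standard library. It repeatedly draws 15 values from the same normal distribution, with a true standard deviation of 1. It then compares the sample standard deviation of all 15 values with that of the 13 values left after deleting the smallest and largest. The trimmed samples come from exactly the same process, yet their standard deviation is systematically smaller, so it no longer estimates the real process variation. The sample size of 15 mirrors the question. The normal distribution and the choice to drop the two extremes are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(1)
full_sds, trimmed_sds = [], []
for _ in range(2000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(15)]
    full_sds.append(statistics.stdev(sample))
    # "Clean" the data by deleting the smallest and largest observations.
    trimmed = sorted(sample)[1:-1]
    trimmed_sds.append(statistics.stdev(trimmed))

avg_full = statistics.mean(full_sds)
avg_trimmed = statistics.mean(trimmed_sds)
print(f"average SD, all 15 values:    {avg_full:.3f}")
print(f"average SD, extremes deleted: {avg_trimmed:.3f}")
```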

Using Probability Plots to Understand Laser Games Scores

There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data.

In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data.

In probability plots, the data density distribution is transformed into a linear plot. To do this, the cumulative distribution function (the CDF, which accumulates all probabilities below a given threshold) is used (see the graph below). For a normal...
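The construction can be sketched directly in three steps. First, sort the data. Second, assign each point a cumulative probability. Third, convert that probability to the matching quantile of a normal distribution. If the data are normal, plotting the sorted values against these quantiles gives a straight line. Here is a minimal sketch using Python's standard library. Benard's median-rank approximation, (i − 0.3)/(n + 0.4), is one common choice of plotting position. It is an assumption here, not necessarily what the post's graph used. The `straightness` helper is hypothetical and summarizes how straight the plot is by a correlation.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Return (theoretical normal quantile, sorted value) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Benard's approximation to the median rank of the i-th smallest value.
    probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    return [(std_normal.inv_cdf(p), x) for p, x in zip(probs, ordered)]


def straightness(points):
    """Pearson correlation of the plot points; values near 1 mean a nearly straight line."""
    qs, xs = zip(*points)
    mq, mx = mean(qs), mean(xs)
    sqx = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sqx / (sqq * sxx) ** 0.5


# Usage: a right-skewed sample bends away from a straight line.
print(straightness(normal_plot_points([1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21])))
```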

Using Predict in Minitab 17 to Validate a Statistical Model

Last time I posted, I showed you how to divide a data set into training and validation samples in Minitab, with the promise that next time I would show you a way to use the validation sample. Regression is a good analysis for this, because a validation data set can help you verify that you’ve selected the best model. I’m going to use a hypothetical example so that you can see how it works when we really know the correct model to use. This will let me show you how Minitab 17’s Predict makes it easy to get the numbers that you need to evaluate the model you fit with the training data set.

(The steps...
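The general workflow can be sketched outside Minitab too. Fit each candidate model on the training sample, predict the held-out validation sample, and compare prediction error there. Here is a minimal sketch assuming NumPy and a hypothetical data set whose true model is quadratic, mirroring the idea of knowing the correct model. The 70/30 split and the use of mean squared error as the yardstick are illustrative choices, not Minitab's Predict output.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 5, 200)
# Hypothetical "true" model is quadratic, plus random noise.
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Randomly split the rows into training (70%) and validation (30%) samples.
order = rng.permutation(x.size)
train, valid = order[:140], order[140:]

val_mse = {}
for degree in (1, 2, 5):
    # Fit each candidate polynomial on the training rows only...
    coefs = np.polyfit(x[train], y[train], degree)
    # ...then judge it by how well it predicts the validation rows.
    resid = y[valid] - np.polyval(coefs, x[valid])
    val_mse[degree] = float(np.mean(resid**2))

for degree, mse in val_mse.items():
    print(f"degree {degree}: validation MSE = {mse:.3f}")
```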

How to Correctly Interpret P Values

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis...
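A P value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. Here is a minimal worked example with a hypothetical coin: 60 heads in 100 flips, tested against the null hypothesis that the coin is fair. It uses an exact two-sided binomial calculation from Python's standard library.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided P value for H0: p = 0.5, summing the probabilities of
    all outcomes at least as far from trials/2 as the observed count."""
    observed_distance = abs(successes - trials / 2)
    total = 0
    for k in range(trials + 1):
        if abs(k - trials / 2) >= observed_distance:
            total += comb(trials, k)
    return total / 2**trials


p = binomial_two_sided_p(60, 100)
print(f"P value = {p:.4f}")
```

Here P is about 0.057. If the coin really were fair, results at least this lopsided would show up in roughly 6% of 100-flip experiments. That is not the same as a 6% chance that the coin is fair.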

The Best European Football League: What the CTQ’s and Minitab Can Tell Us

by Laerte de Araujo Lima, guest blogger

In a previous post (How Data Analysis Can Help Us Predict This Year's Champions League), I shared how I used Minitab Statistical Software to predict the 2013-2014 season of the UEFA Champions League. This involved a regression analysis of the main critical-to-quality (CTQ) factors, which I identified using the “voice of the customer” suggestions of some friends.

Since that post was published, my friends have stopped discussing the UEFA Champions League; they were convinced by the results I shared.

But now they’ve challenged me to use Six Sigma tools to...

Introducing the Bubble Plot

When you're evaluating a dataset, graphical analysis can be very important. While an analysis like a regression or ANOVA can be backed up by numbers, being able to visualize how your dataset is behaving can be even more convincing than a group of p-values—especially to those who aren’t trained in statistics.

For example, let’s look at a few variables we think may be correlated. In this specific example, we will take the Unemployment Rate and the Crime Rate for each state in the U.S. We have 3 columns of data in Minitab: C1, which contains the State Name; C2, which contains the Crime Rate; and...
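What distinguishes a bubble plot from a scatterplot is that a third variable sets each marker's size. Here is a minimal sketch in Python of one common sizing rule. It scales each bubble's area, rather than its radius, linearly with the third variable. The eye judges bubbles by area, and scaling the radius would exaggerate differences. The area limits and the `bubble_areas` helper are hypothetical. In matplotlib, the `s` argument of `scatter` is already an area in points², so values like these could be passed to it directly.

```python
def bubble_areas(values, min_area=20.0, max_area=400.0):
    """Map each value linearly onto a marker area between min_area and max_area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: give every bubble the same middle size.
        return [(min_area + max_area) / 2 for _ in values]
    span = max_area - min_area
    return [min_area + (v - lo) / (hi - lo) * span for v in values]


# Hypothetical third-variable values for three states.
print(bubble_areas([0.0, 5.0, 10.0], min_area=10.0, max_area=100.0))  # [10.0, 55.0, 100.0]
```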

Predicting the 2014 NCAA Tournament

Once again it’s time for the madness of March to begin! Which teams have the best shot of going to the Final Four? Is there a team that might become this year’s Florida Gulf Coast? And do any of the 16 seeds have a realistic shot of beating a 1 seed? Well, sit back, because we’re going to answer all of that and more! Somebody tell Cinderella to get her glass slippers; it’s time to go dancing!

Which Ranking System to Use

Before we get to the bracket, we need to decide which ranking system to use. Because we want to use these rankings for predicting future outcomes, we want a system that uses...
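Once a ranking system gives each team a rating, you still need a rule that turns two ratings into a probability for a single game. One widely used rule is Bill James's log5 formula. It takes each team's expected winning percentage against an average opponent. This is a minimal sketch of that formula only. It is an illustrative assumption, not necessarily the method this post goes on to use, and the ratings below are hypothetical.

```python
def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, where p_a and p_b are each team's
    winning probability against an average opponent (Bill James's log5).

    Undefined when both inputs are 0 or both are 1 (the denominator is zero).
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical ratings: a strong 1 seed (0.95) against a weaker 16 seed (0.55).
print(f"{log5(0.95, 0.55):.3f}")
```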

Using Statistics to Show Your Boss Process Improvements

Ughhh... your process is producing some parts that don't meet your customer's specifications! Fortunately, after a little hard work, you find a way to improve the process.

However, you want to perform the appropriate statistical analysis to back up your findings and make it easier to explain the process improvements to your boss. And it's important to remember that your boss is much like the boss in Eston's posts: he's not too familiar with statistics, so you'll have to take it slow and show lots of "visual aids" in your explanation. How should you begin?

Enter before-and-after process...
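One summary a boss can grasp quickly is a capability index computed before and after the change. Here is a minimal sketch using Python's standard library to compute the overall capability index Ppk = min(USL − mean, mean − LSL) / (3s), where s is the overall sample standard deviation. The measurements and specification limits below are hypothetical. Ppk is only one of several before-and-after comparisons you might use.

```python
import statistics


def ppk(data, lsl, usl):
    """Overall capability: distance from the mean to the nearer spec limit,
    measured in units of 3 overall standard deviations."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return min(usl - m, m - lsl) / (3 * s)


# Hypothetical measurements with specification limits 7 to 13.
# (Three points per sample is far too few for a real study.)
before = [9.0, 10.0, 11.0]
after = [9.5, 10.0, 10.5]
print(f"Ppk before: {ppk(before, 7, 13):.2f}, after: {ppk(after, 7, 13):.2f}")  # Ppk before: 1.00, after: 2.00
```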

Re-analyzing Wine Tastes with Minitab 17

In April 2012, I wrote a short paper on binary logistic regression to analyze wine tasting data. At that time, François Hollande was about to be elected French president and, in the U.S., Mitt Romney was winning the Republican primaries. That seems like a long time ago…

Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012, would I have conducted my analysis in a different way? Would the results still look similar? I decided to re-analyze my April 2012 data with Minitab 17 and assess the differences, if there are any.

There were no...

The Stability Report for Control Charts in Minitab 17 includes Example Patterns

Minitab’s Assistant got a lot of splashy upgrades for Minitab 17. The additions of DOE and multiple regression to the Assistant are large feature improvements with obvious advantages. But there are also many subtler, still fantastic additions that shouldn't be overlooked.

One of those additions is the example patterns added to the Stability Report for control charts.
 
The Stability Report was excellent in Minitab 16, clearly showing you the out-of-control points in the process.

But the truth is that it’s sometimes hard to move from detecting the out-of-control points to an understanding of what’s...
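Out-of-control points on a control chart come from tests applied to each point. Here is a minimal sketch of two classic tests for an individuals (I) chart. It assumes the usual sigma estimate from the average moving range (MR-bar / 1.128). Test 1 flags a point more than 3 sigma from the center line. Test 2 flags a run of 9 points in a row on the same side of the center line. The data and helper names are hypothetical, and this is not Minitab's implementation.

```python
def i_chart_limits(data):
    """Return (LCL, center line, UCL) for an individuals chart, with sigma = MR-bar / 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128
    center = sum(data) / len(data)
    return center - 3 * sigma, center, center + 3 * sigma


def points_beyond_limits(data, lcl, ucl):
    """Test 1: indices of points outside the control limits."""
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]


def runs_on_one_side(data, center, run=9):
    """Test 2: indices where `run` or more consecutive points sit on one side of the center line."""
    flagged, streak, side = [], 0, 0
    for i, x in enumerate(data):
        s = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            streak += 1
        else:
            streak, side = (1 if s != 0 else 0), s
        if streak >= run:
            flagged.append(i)
    return flagged


# Hypothetical stable process with one spike at the end.
data = [10, 11, 10, 9] * 4 + [10, 11, 10, 25]
lcl, center, ucl = i_chart_limits(data)
print(points_beyond_limits(data, lcl, ucl))
```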

Why Is There No R-Squared for Nonlinear Regression?

Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do.

So, what’s going on?

Minitab doesn't calculate R-squared for nonlinear models because the research literature shows that it is an invalid goodness-of-fit statistic for this type of model. There are bad consequences if you use it in this context.

Why Is It Impossible to Calculate a...
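One way to see the problem concerns the identity SS Total = SS Regression + SS Residual, which lets R-squared be read as the proportion of variance explained. This identity is guaranteed for linear models with an intercept fitted by least squares, but not for nonlinear models. Here is a minimal sketch using Python's standard library, with hypothetical data and a one-parameter nonlinear model y = exp(b·x) fitted by least squares. Golden-section search is just a convenient stand-in for a real nonlinear solver. In the output, the two textbook formulas for R-squared, 1 − SSres/SStot and SSreg/SStot, no longer agree, and the second one exceeds 1.

```python
import math

# Hypothetical data and a nonlinear model y = exp(b * x) with one parameter.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 9.0, 16.0]


def sse(b):
    """Residual sum of squares for the model y = exp(b * x)."""
    return sum((yi - math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c  # minimum lies in [a, d]
            c = b - ratio * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + ratio * (b - a)
    return (a + b) / 2


b_hat = golden_section_min(sse, 0.0, 2.0)
fitted = [math.exp(b_hat * xi) for xi in x]
y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sse(b_hat)
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)

print(f"SS total {ss_total:.1f} vs SS regression + SS residual {ss_regression + ss_resid:.1f}")
print(f"1 - SSres/SStot = {1 - ss_resid / ss_total:.3f}")
print(f"SSreg/SStot     = {ss_regression / ss_total:.3f}")
```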

Opening Ceremonies for Bubble Plots and Poisson Regression

By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot.

This exploratory tool is great for visualizing the relationships among three variables on a single plot.

To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there is an association between the number of medals a country won and its maximum elevation. For that, I could use a simple scatterplot, right?

But say I want to throw a third variable into the mix, such as...
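Medal totals are counts, which is why Poisson regression pairs naturally with this example. Here is a minimal sketch of fitting a Poisson regression with a log link by iteratively reweighted least squares (IRLS) in NumPy. The predictor and counts are hypothetical stand-ins, not the 2014 medal table. In practice you would rely on Minitab or a tested library rather than this hand-rolled loop.

```python
import numpy as np


def poisson_regression(X, y, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + X @ b by IRLS; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # For the log link, the working response is z and the weights are mu.
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Hypothetical example: counts generated from log(mean) = 0.5 + 1.2 * x
rng = np.random.default_rng(3)
x = rng.uniform(0, 2, 500)
counts = rng.poisson(np.exp(0.5 + 1.2 * x))
print(poisson_regression(x, counts))  # estimates should be near [0.5, 1.2]
```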