Data Analysis

Blog posts and articles with tips for analyzing data for quality improvement methodologies, including Six Sigma and Lean.

Statisticians say the darndest things. At least, that's how it can seem if you're not well-versed in statistics.  When I began studying statistics, I approached it as a language. I quickly noticed that compared to other disciplines, statistics has some unique problems with terminology, problems that don't affect most scientific and academic specialties.  For example, dairy science has a highly... Continue Reading
Just 100 years ago, very few statistical tools were available and the field was largely unknown. Since then, there has been an explosion of tools available, as well as ever-increasing awareness and use of statistics.   While most readers of the Minitab Blog are looking to pick up new tools or improve their use of commonly-applied ones, I thought it would be worth stepping back and talking about one... Continue Reading
When you run a regression in Minitab, you receive a huge batch of output, and often it can be hard to know where to start. A lot of times, we get overwhelmed and just go straight to p-values, ignoring a lot of valuable information in the process. This post will give you an introduction to one of the other statistics Minitab displays for you, the VIF, or Variance Inflation Factor.  To start, let's... Continue Reading
If you've read the first two parts of this tale, you know it started when I published a post that involved transforming data for capability analysis. When an astute reader asked why Minitab didn't seem to transform the data outside of the capability analysis, it revealed an oversight that invalidated the original analysis.  I removed the errant post. But to my surprise, the reader who helped me... Continue Reading
Last time, I told you how I had double-checked the analysis in a post that involved running the Johnson transformation on a set of data before doing normal capability analysis on it. A reader asked why the transformation didn't work on the data when you applied it outside of the capability analysis.  I hadn't tried transforming the data that way, but if the transformation worked when performed as... Continue Reading
I don't like the taste of crow. That's a shame, because I'm about to eat a huge helping of it.  I'm going to tell you how I messed up an analysis. But in the process, I learned some new lessons and was reminded of some older ones I should remember to apply more carefully.  This Failure Starts in a Victory: My mistake originated in the 2015 Triple Crown victory of American Pharoah. I'm no... Continue Reading
Every now and then I’ll test my Internet speed at home using such sites as http://speedtest.comcast.net  or http://www.att.com/speedtest/.  My need to perform these tests could stem from the cool-looking interfaces they employ on their site, as they display the results using analog speedometers and RPM meters. They could also stem from the validation that I need in "getting what I am paying for,"... Continue Reading
It’s been almost 5 years since I used a quotation from Ghostbusters to introduce one of my early blog posts. But as we’re getting a few bits of entertainment news about the next installment in the Ghostbusters franchise, I thought it might be a good time to talk about busting ghosts in Minitab. In the Minitab sense, ghosts are spaces that are in your data that you can’t see. The busting action is... Continue Reading
By Matthew Barsalou, guest blogger.   Many statistical tests assume the data being tested came from a normal distribution. Violating the assumption of normality can result in incorrect conclusions. For example, a Z test may indicate a new process is more efficient than an older process when this is not true. This could result in a capital investment for equipment that actually results in higher... Continue Reading
Last month the ESPN series Outside the Lines reported on major league pitchers suffering serious injuries from being struck in the head by line drives, and efforts MLB is making towards having protective gear developed for pitchers. You can view the report here if you'd like: A couple of things jump out at me from the clip: The overwhelming majority of pitchers are not interested in wearing... Continue Reading
When data are collected in subgroups, it’s easy to understand how the variation can be calculated within each of the subgroups based on the subgroup range or the subgroup standard deviation. When data are not collected in subgroups (so the subgroup size is 1), it may be a little less intuitive to understand how within-subgroup standard deviation is calculated.  How does Minitab Statistical Software calcu... Continue Reading
When someone gives you data to analyze, you can gauge how your life is going by what you've received. Get a Minitab file, or even comma-separated values, and everything feels fine. Get a PDF file, and you start to think maybe you’re cursed because of your no-good-dirty-rotten-pig-stealing-great-great-grandfather and wish that you were someone else. For those of you who might be in such dire... Continue Reading
I recently fielded an interesting question about the probability and survival plots in Minitab Statistical Software's Reliability/Survival menus: Is there a one-to-one match between the confidence interval points on a probability plot and the confidence interval points on survival plot at a specific percentile? Now, this may seem like an easy question, given that the probabilities on a survival plot... Continue Reading
The line plot is an incredibly agile but frequently overlooked tool in the quest to better understand your processes. In any process, whether it's baking a cake or processing loan forms, many factors have the potential to affect the outcome. Changing the source of raw materials could affect the strength of plywood a factory produces. Similarly, one method of gluing this plywood might be better... Continue Reading
By Matthew Barsalou, guest blogger.   Minitab Statistical Software can assist us in our analysis of data, but we must make judgments when selecting the data for an analysis. A good operational definition can be invaluable for ensuring the data we collect can be effectively analyzed using software. Dr. W. Edwards Deming explains in Out of the Crisis (1989), “An operational definition of safe, round,... Continue Reading
In my previous post, I showed you that the coefficients are different when choosing (-1,0,1) vs (1,0) coding schemes for General Linear Model (or Regression).  We used the two different equations to calculate the same fitted values. Here I will focus on showing what the different coefficients represent.  Let's use the data and models from the last blog post: We can display the means for each level... Continue Reading
Since Minitab 17 Statistical Software launched in February 2014, we've gotten great feedback from many people who have been using the General Linear Model and Regression tools. But in speaking with people as part of Minitab's Technical Support team, I've found many are noticing that there are two coding schemes available with each. We frequently get calls from people asking how the coding scheme you... Continue Reading
Earlier, I wrote about the different types of data statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data.  As a reminder, when we assign something to a group or give it a name, we have created attribute or categorical data.  If we count something, like... Continue Reading
Over the past few years, the average length of an MLB game has been steadily increasing. We can create a quick time series plot in Minitab Statistical Software to display this: As games have been lasting longer, there's been a feeling shared by many that this was a negative. Games seemed to drag on, with a lot of unnecessary stoppages and breaks. To combat this trend, and to try to speed up games to... Continue Reading
In my previous post, I wrote about the hypothesis testing ban in the Journal of Basic and Applied Social Psychology. I showed how P values and confidence intervals provide important information that descriptive statistics alone don’t provide. In this post, I'll cover the editors’ concerns about hypothesis testing and how to avoid the problems they describe. The editors describe hypothesis testing... Continue Reading