Data Analysis

Blog posts and articles with tips for analyzing data for quality improvement methodologies, including Six Sigma and Lean.

Pareto charts are a special type of bar chart you can use to prioritize almost anything. This makes them very useful in making sound decisions. For example, if you have several possible quality improvement projects, but not enough time or people to do them all now, you can use a Pareto chart to identify which projects have the most potential for making meaningful improvement. Pareto charts look... Continue Reading
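The ranking behind a Pareto chart can be sketched in a few lines of Python. This is a minimal illustration, not the method from the post: the `pareto_table` helper and the defect counts are made up for the example. It sorts categories from largest to smallest and adds the cumulative percentage that a Pareto chart usually draws as a line.

```python
from itertools import accumulate


def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ordered)
    running = accumulate(count for _, count in ordered)
    return [
        (name, count, 100.0 * cumulative / total)
        for (name, count), cumulative in zip(ordered, running)
    ]


# Hypothetical defect counts, invented purely for illustration.
defects = {
    "Dents": 30,
    "Scratches": 45,
    "Other": 3,
    "Misalignment": 15,
    "Discoloration": 7,
}

table = pareto_table(defects)
for name, count, cum_pct in table:
    print(f"{name:<14} {count:>3} {cum_pct:6.1f}%")
```

In this made-up data, the first two categories account for 75% of all defects, which is the kind of "vital few" signal a Pareto chart is meant to surface.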
Once again, with the arrival of autumn, it's time for a flu shot. I get a flu shot every year even though I know they’re not perfect. I figure they’re a relatively easy and inexpensive way to reduce the chance of having a miserable week. I’ve heard on various news media that their effectiveness is about 60%. But what does 60% effectiveness mean, exactly? How much does this actually reduce the... Continue Reading

In Part 1 of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results.  The common data... Continue Reading
Statistical inference uses data from a sample of individuals to reach conclusions about the whole population. It’s a very powerful tool. But as the saying goes, “With great power comes great responsibility!” When attempting to make inferences from sample data, you must check your assumptions. Violating any of these assumptions can result in false positives or false negatives, thus invalidating... Continue Reading
Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables. In my previous post, we used data mining to settle on the following... Continue Reading
On the Minitab Blog, we’ve often discussed getting data into Minitab from Excel. Here's a small sampling, in case you currently have data in Excel: Minitab and Excel: Making the (Data) Connection; Linking Minitab to Excel to Get Answers Fast; 3 Tips for Importing Excel Data into Minitab. But if your data is not in Excel to begin with, taking it into Excel to prepare it for entry into Minitab isn’t... Continue Reading
The ultimate goal of most quality improvement projects is clear: reducing the number of defects, improving a response, or making a change that benefits your customers. We often want to jump right in and start gathering and analyzing data so we can solve the problems. Checking your measurement systems first, with methods like attribute agreement analysis or Gage R&R, may seem like a needless waste... Continue Reading
Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R² and predicted R², and make predictions. Yes, regression really is quite wonderful. Except when it’s not. Dark, seedy corners of the data... Continue Reading
We’ve got a plethora of case studies showing how businesses from different industries solve problems and implement solutions with data analysis. Take a look for ideas about how you can use data analysis to ensure excellence at your business! Boston Scientific, one of the world’s leading developers of medical devices, is just one organization that has shared its story. A team at their Heredia,... Continue Reading
True or false: When comparing a parameter for two sets of measurements, you should always use a hypothesis test to determine whether the difference is statistically significant. The answer? (drumroll...) True! ...and False! To understand this paradoxical answer, you need to keep in mind the difference between samples, populations, and descriptive and inferential statistics.  Descriptive Statistics and... Continue Reading
Data mining uses algorithms to explore correlations in data sets. An automated procedure sorts through large numbers of variables and includes them in the model based on statistical significance alone. No thought is given to whether the variables and the signs and magnitudes of their coefficients make theoretical sense. We tend to think of data mining in the context of big data, with its huge... Continue Reading
Today, September 16, is World Ozone Day. You don't hear much about the ozone layer any more. In fact, if you’re under 30, you might think this is just another trivial, obscure observance, along the lines of International Dot Day (yesterday) or National Apple Dumpling Day (tomorrow). But there’s a good reason that, almost 30 years ago, the United Nations designated today as a day to raise... Continue Reading
I confess: I'm not a natural-born decision-maker. Some people—my wife, for example—can assess even very complex situations, consider the options, and confidently choose a way forward. Me? I get anxious about deciding what to eat for lunch. So you can imagine what it used to be like when I needed to confront a really big decision or problem. My approach, to paraphrase the Byrds, was "Re:... Continue Reading
There may be huge potential benefits waiting in the data in your servers. These data may be used for many different purposes. Better data allows better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers. These resources are useful for building a more personal relationship with each customer. Some organizations already use... Continue Reading
In 2011 we had solar panels fitted on our property. In the last few months we have noticed a few problems with the inverter (the equipment that converts the electricity generated by the panels from DC to AC, and manages the transfer of unused electric to the power company). It was shutting down at various times throughout the day, typically when it was very sunny, resulting in no electricity being... Continue Reading
In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations. The sample data used in this post is available within Minitab by choosing Help > Sample Data, or File > Open Worksheet > Look in Minitab Sample Data folder (depending on your version of Minitab).  The dataset is called ResearcherSalary.MTW, and contains data... Continue Reading
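As a rough sketch of what these calculations look like, the Python below fits a simple least-squares line and splits the total variation into a regression part and an error part. The `sums_of_squares` helper and the five data points are invented for illustration; they are not the ResearcherSalary.MTW data and not Minitab's own routine. For a least-squares fit with an intercept, the total sum of squares equals the regression sum of squares plus the error sum of squares.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (SS total, SS regression, SS error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope and intercept of the least-squares line.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    # Total variation, variation explained by the line, and leftover variation.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_error


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_total, ss_regression, ss_error = sums_of_squares(x, y)
r_squared = ss_regression / ss_total
print(f"SS total = {ss_total:.2f}, SS regression = {ss_regression:.2f}, "
      f"SS error = {ss_error:.2f}, R-squared = {r_squared:.2f}")
```

The ratio of the regression sum of squares to the total sum of squares is R-squared, the share of variation in the response that the fitted line accounts for.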
So the data you nurtured, that you worked so hard to format and make useful, failed the normality test. Time to face the truth: despite your best efforts, that data set is never going to measure up to the assumption you may have been trained to fervently look for. Your data's lack of normality seems to make it poorly suited for analysis. Now what? Take it easy. Don't get uptight. Just let your data... Continue Reading
See if this sounds fair to you. I flip a coin. Heads: You win $1. Tails: You pay me $1. You may not like games of chance, but you have to admit it seems like a fair game. At least, assuming the coin is a normal, balanced coin, and assuming I’m not a sleight-of-hand magician who can control the coin. How about this next game? You pay me $2 to play. I flip a coin over and over until it comes up heads. Your... Continue Reading
I thought 3 posts would capture all the thoughts I had about B10 Life. That is, until this question appeared on the Minitab LinkedIn group: In case you missed it, my first post, How to Calculate B10 Life with Statistical Software, explains what B10 life is and how Minitab calculates this value. My second post, How to Calculate BX Life, Part 2, shows how to compute any BX life in Minitab. But... Continue Reading
The Centers for Medicare and Medicaid Services (CMS) updated their star ratings on July 27. Turns out, the list of hospitals provides a great way to look at how easy it is to get random samples from data within Minitab. Say, for example, that you wanted to look at the association between the government’s new star ratings and the safety rating scores provided by hospitalsafetyscore.org. The CMS score... Continue Reading