
Stats

Blog posts and articles about statistics principles and how they apply to quality improvement methods like Lean and Six Sigma.

Earlier this month, PLOS.org published an article titled "Ten Simple Rules for Effective Statistical Practice." The 10 rules are good reading for anyone who draws conclusions and makes decisions based on data, whether you're trying to extend the boundaries of scientific knowledge or make good decisions for your business. Carnegie Mellon University's Robert E. Kass and several co-authors devised...
An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistics such as the mean, which can lead to misleading results. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first. Finding...
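One common way to flag candidate outliers is Tukey's interquartile-range (IQR) fences. The sketch below is only an illustration, not necessarily the method the full post describes; the function name `iqr_outliers`, the multiplier `k=1.5`, and the data values are assumptions chosen for the example.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return observations outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles uses the 'exclusive' method by default; other
    # software may interpolate quartiles slightly differently.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

cycle_times = [10, 11, 12, 11, 10, 12, 11, 45]  # hypothetical measurements
print(iqr_outliers(cycle_times))  # [45]
```

A flagged point is a candidate for investigation, not automatically an error to delete.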

Businesses are getting more and more data from existing and potential customers: every click we make on a website, for example, can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed. In the very near future, connected objects such as cars and electrical appliances will...
The last thing you want to do when you purchase a new piece of software is spend an excessive amount of time getting up and running. You've probably been ready to use the software since, well, yesterday. Minitab has always focused on making our software easy to use, but many professional software packages do have a steep learning curve. Whatever package you're using, here are three things you...
Suppose you've collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that's important to you, and you want to see what other variables may be related to it. Now what? When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including l...
This is an era of massive data. A huge amount of data is being generated from the web and from customer relations records, not to mention from sensors used in manufacturing (in the semiconductor, pharmaceutical, and petrochemical industries, among many others). Univariate control charts: in the manufacturing industry, critical product characteristics are routinely collected to ensure...
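As background for the univariate case, here is a minimal sketch of the limits for an individuals (I) chart, assuming sigma is estimated from the average moving range divided by the constant d2 = 1.128. The function name and the data are hypothetical.

```python
import statistics

D2_N2 = 1.128  # bias-correction constant d2 for moving ranges of span 2

def individuals_chart_limits(data):
    """Lower limit, center line, and upper limit for an individuals (I) chart.

    Sigma is estimated from the average moving range (MR-bar / d2).
    """
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma_within = statistics.fmean(moving_ranges) / D2_N2
    center = statistics.fmean(data)
    return center - 3 * sigma_within, center, center + 3 * sigma_within

diameters = [5.02, 4.98, 5.01, 5.00, 4.97, 5.03, 4.99]  # hypothetical part dimensions
lcl, cl, ucl = individuals_chart_limits(diameters)
out_of_control = [x for x in diameters if not lcl <= x <= ucl]
print(lcl, cl, ucl, out_of_control)
```

Monitoring many correlated characteristics with separate charts like this is what motivates multivariate alternatives.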
Do you recall my "putting the cart before the horse" analogy in part 1 of this blog series? The comparison is simple. We all, at times, put the cart before the horse in relatively innocuous ways, such as eating dessert before dinner, or deciding what to wear before we've been invited to the party. But performing some tasks in the wrong order, such as running a statistical...
Once upon a time, when people wanted to compare the standard deviations of two samples, they had two handy tests available, the F-test and Levene's test. Statistical lore has it that the F-test is so named because it so frequently fails you.¹ Although the F-test is suitable for data that are normally distributed, its sensitivity to departures from normality limits when and where it can be used. Leve...
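To make the contrast concrete, the sketch below computes the classic F ratio of sample variances and Levene's W statistic. It is an illustration only: it uses the median-centered (Brown-Forsythe) variant by default, which may or may not match the exact variant any particular package uses, and it stops at the test statistic. Both W and the F ratio are referred to an F distribution for a p-value; `scipy.stats.levene` can serve as a cross-check. The function names and data are hypothetical.

```python
import statistics

def f_ratio(a, b):
    """Classic F statistic: ratio of sample variances (assumes normal data)."""
    return statistics.variance(a) / statistics.variance(b)

def levene_w(*groups, center=statistics.median):
    """Levene's W statistic; center=median gives the Brown-Forsythe variant.

    W is compared with an F(k - 1, N - k) distribution to get a p-value.
    """
    centers = [center(g) for g in groups]
    # Absolute deviations of each observation from its own group's center
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_group_means = [statistics.fmean(zg) for zg in z]
    z_grand_mean = statistics.fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (m - z_grand_mean) ** 2 for zg, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_group_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within

machine_a = [1, 2, 3, 4, 5]  # hypothetical measurements
machine_b = [1, 3, 5, 7, 9]
print(f_ratio(machine_a, machine_b), levene_w(machine_a, machine_b))
```

Because Levene's test works on absolute deviations rather than squared ones, it is less thrown off by non-normal data than the F ratio.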
Along with the explosion of interest in visualizing data over the past few years has come an excessive focus on how attractive the graph is at the expense of how useful it is. Don't get me wrong...I believe that a colorful, modern graph comes across better than a black-and-white, pixelated one. Unfortunately, all the talk seems to be about the attractiveness and not the value of the...
About a year ago, a reader asked if I could try to explain degrees of freedom in statistics. Since then, I've been circling around that request very cautiously, like it's some kind of wild beast that I'm not sure I can safely wrestle to the ground. Degrees of freedom aren't easy to explain. They come up in many different contexts in statistics, some advanced and complicated. In mathematics, they're...
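One familiar place degrees of freedom appear is the n - 1 denominator of the sample variance. This small sketch (the data and the helper name are hypothetical) shows the usual intuition: deviations from the sample mean must sum to zero, so only n - 1 of them can vary freely.

```python
import statistics

def sample_variance(data):
    """Sample variance with n - 1 degrees of freedom in the denominator."""
    mean = statistics.fmean(data)
    deviations = [x - mean for x in data]
    # Deviations from the sample mean always sum to zero, so once n - 1 of
    # them are known the last one is fixed: only n - 1 are free to vary.
    return sum(d * d for d in deviations) / (len(data) - 1)

data = [4.0, 7.0, 5.0, 8.0]
mean = statistics.fmean(data)
first_three = [x - mean for x in data[:3]]
last = -sum(first_three)  # determined by the other three deviations
assert last == data[3] - mean
print(sample_variance(data), statistics.variance(data))
```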
Like so many of us, I try to stay healthy by watching my weight. I thought it might be interesting to apply some statistical thinking to the idea of maintaining a healthy weight, and the central limit theorem could provide some particularly useful insights. I'll start by making some simple (maybe even simplistic) assumptions about calorie intake and expenditure, and see where those lead. And then...
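The central limit theorem says that a sum of many independent daily quantities is approximately normal, even when each day's distribution is skewed. The simulation below is a sketch under made-up assumptions (a daily calorie surplus modeled as an exponential with mean 100, shifted to mean 0, so the daily standard deviation is 100); it is not the model from the full post.

```python
import math
import random
import statistics

def simulate_sums(n_days, n_trials, rng):
    """Sum n_days draws of a skewed daily quantity, repeated n_trials times."""
    # Hypothetical daily surplus: exponential with mean 100, shifted by -100,
    # so the daily mean is 0 and the daily standard deviation is 100.
    return [sum(rng.expovariate(1 / 100) - 100 for _ in range(n_days))
            for _ in range(n_trials)]

rng = random.Random(42)
sums = simulate_sums(n_days=30, n_trials=5000, rng=rng)
# CLT: the 30-day total is roughly normal with mean 0 and
# standard deviation 100 * sqrt(30) (about 548).
print(statistics.fmean(sums), statistics.stdev(sums), 100 * math.sqrt(30))
```

The key takeaway is the square-root scaling: the spread of the total grows with the square root of the number of days, not in proportion to it.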
You have a column of categorical data. Maybe it's a column of reasons for production downtime, or customer survey responses, or all of the reasons airlines give for those irritating flight delays. Whatever type of qualitative data you may have, suppose you want to find the most common categories. Here are three different ways to do that: 1. Pareto Charts: Pareto charts easily help you separate the vital...
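The tally behind a Pareto chart is simple: count each category, sort from most to least frequent, and track the cumulative percentage. Here is a minimal sketch using `collections.Counter`; the helper `pareto_table` and the delay reasons are hypothetical.

```python
from collections import Counter

def pareto_table(categories):
    """Return (category, count, cumulative %) rows, most frequent first."""
    counts = Counter(categories).most_common()
    total = sum(c for _, c in counts)
    rows, running = [], 0
    for name, count in counts:
        running += count
        rows.append((name, count, 100 * running / total))
    return rows

delays = ["weather", "crew", "weather", "maintenance", "weather", "crew", "security"]  # hypothetical
for name, count, cum_pct in pareto_table(delays):
    print(f"{name:<12} {count:>3} {cum_pct:6.1f}%")
```

Categories with equal counts keep their first-seen order, because `most_common` sorts stably.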
If you need to assess process performance relative to some specification limit(s), then process capability is the tool to use. You collect some accurate data from a stable process, enter those measurements in Minitab, and then choose Stat > Quality Tools > Capability Analysis/Sixpack or Assistant > Capability Analysis. Now, what about sorting the data? I've been asked "why does Cpk change when I...
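One plausible mechanism is sketched below, assuming individual observations with the within-subgroup sigma estimated from the average moving range (MR-bar / 1.128). Sorting the data shrinks the moving ranges, which shrinks the within sigma and inflates Cpk, while Ppk (based on the overall standard deviation) does not depend on order. The spec limits, data, and function name are hypothetical, and this is not necessarily how the full post answers the question.

```python
import statistics

D2_N2 = 1.128  # d2 constant for moving ranges of span 2

def capability(data, lsl, usl):
    """Return (Cpk, Ppk) for individual observations."""
    mean = statistics.fmean(data)
    # Within-subgroup sigma from the average moving range: depends on data order
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma_within = statistics.fmean(moving_ranges) / D2_N2
    # Overall sigma from the sample standard deviation: order does not matter
    sigma_overall = statistics.stdev(data)
    gap = min(usl - mean, mean - lsl)
    return gap / (3 * sigma_within), gap / (3 * sigma_overall)

measurements = [10.2, 9.8, 10.5, 9.6, 10.1, 9.9, 10.4, 9.7]  # hypothetical
cpk_raw, ppk_raw = capability(measurements, lsl=9.0, usl=11.0)
cpk_sorted, ppk_sorted = capability(sorted(measurements), lsl=9.0, usl=11.0)
# Sorting shrinks the moving ranges, inflating Cpk; Ppk is unchanged.
print(cpk_raw, cpk_sorted, ppk_raw, ppk_sorted)
```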
In my time at Minitab, I've gotten a good understanding of what types of graphs users create. Everyone knows about histograms, bar charts, and time series plots. Even relatively less familiar plots like the interval plot and individual value plot are still used quite often. However, one of the most underutilized graphs we have available is the area graph. If you're not familiar with an Area...
In an earlier post, I shared an overview of acceptance sampling, a method that lets you evaluate a sample of items from a larger batch of products (for instance, electronics components you've sourced from a new supplier) and use that sample to decide whether or not you should accept or reject the entire shipment. There are two approaches to acceptance sampling. If you do it by attributes, you...
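For a single sampling plan by attributes, you inspect n items and accept the lot if at most c are defective. Assuming a large lot, so the defective count is approximately binomial, the probability of acceptance (one point on the operating characteristic curve) can be sketched as follows. The plan n = 50, c = 1 and the function name are hypothetical.

```python
from math import comb

def prob_accept(n, c, p):
    """Probability of accepting a lot under a single sampling plan by attributes.

    Inspect n items and accept if at most c are defective; the number of
    defectives is modeled as Binomial(n, p), which assumes a large lot.
    """
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))

# Hypothetical plan: sample 50 items, accept the lot if 1 or fewer are defective.
for p in (0.01, 0.02, 0.05, 0.10):
    print(f"defect rate {p:.0%}: P(accept) = {prob_accept(50, 1, p):.3f}")
```

For small lots sampled without replacement, a hypergeometric model would be more accurate than the binomial.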
If you're just getting started in the world of quality improvement, or if you find yourself in a position where you suddenly need to evaluate the quality of incoming or outgoing products from your company, you may have encountered the term "acceptance sampling." It's a statistical method for evaluating the quality of a large batch of materials from a small sample of items, which statistical softwar...
In my last post, I walked through the steps to install Minitab 17 on a Mac using Apple Boot Camp. Minitab 17 can also be installed on a Mac using desktop virtualization software. In addition to your Mac, you'll need: a Windows 7 (or later) ISO image; Minitab 17 Statistical Software. Desktop virtualization software allows you to install and use Windows on your Intel-based Mac without requiring...
While Minitab 17 is currently a Windows-only application, some people who need to use it have only a Mac available for the installation. It is possible to run Minitab 17 on a Macintosh, though the installation steps can seem a little daunting at first. In the Technical Support department, we sometimes hear reluctance in people's voices when we throw...
Not long ago, I couldn't abide statistics. I did respect it, but in much the same way a gazelle respects a lion. Most of my early experiences with statistics indicated that close encounters resulted in pain, so I avoided further contact whenever possible. So how is it that today I write about statistics? That's simple: it merely required completely reinventing the way I thought about and approached...
There are many reasons why a distribution might not be normal/Gaussian. A non-normal pattern might be caused by several distributions being mixed together, by a drift over time, by one or several outliers, by asymmetric behavior, or by out-of-control points, among other causes. I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser...
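When several distributions are mixed together, stratifying by the grouping variable often explains the non-normality. The sketch below compares each group's mean and standard deviation with the pooled data. The team names come from the post, but the scores and the helper `stratified_summary` are hypothetical and not the actual data.

```python
import statistics

def stratified_summary(groups):
    """Compare each group's mean and standard deviation with the pooled data."""
    pooled = [x for values in groups.values() for x in values]
    summary = {name: (statistics.fmean(v), statistics.stdev(v)) for name, v in groups.items()}
    summary["pooled"] = (statistics.fmean(pooled), statistics.stdev(pooled))
    return summary

# Hypothetical scores: each team is roughly symmetric, but the team means differ,
# so the pooled distribution is multimodal and will not look normal.
scores = {
    "Blue": [52, 55, 50, 53, 54],
    "Yellow": [70, 72, 69, 71, 73],
    "Pink": [88, 90, 87, 91, 89],
}
for name, (mean, sd) in stratified_summary(scores).items():
    print(f"{name:<7} mean={mean:5.1f} sd={sd:5.2f}")
```

A pooled standard deviation far larger than every within-group value is a hint that a mixture, not a single skewed process, is behind the non-normal shape.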