
Data Analysis

Blog posts and articles with tips for analyzing data for quality improvement methodologies, including Six Sigma and Lean.

There's nothing like a boxplot, aka box-and-whisker diagram, to get a quick snapshot of the distribution of your data. With a single glance, you can readily intuit its general shape, central tendency, and variability. To easily compare the distribution of data between groups, display boxplots for the groups side by side. Visually compare the central value and spread of the distribution for each...
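A boxplot is drawn from a handful of summary numbers. The sketch below computes them for each group with Python's standard library. It uses the common 1.5*IQR rule for whiskers and outliers. Statistical packages may use slightly different quartile conventions, so treat the numbers as illustrative. The group data and the `box_summary` name are hypothetical.

```python
from statistics import median, quantiles

def box_summary(data):
    """Return the numbers a standard boxplot draws for one group."""
    q1, _, q3 = quantiles(data, n=4)  # Python's default ('exclusive') quartile method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_low": min(inside),    # whiskers stop at the most extreme points inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }

# Hypothetical groups to compare side by side
groups = {
    "A": [12, 14, 15, 15, 16, 17, 18, 19, 30],
    "B": [20, 21, 22, 22, 23, 24, 25],
}
for name, values in groups.items():
    print(name, box_summary(values))
```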
In statistics, there are things you need to do so you can trust your results. For example, you should check the sample size, the assumptions of the analysis, and so on. In regression analysis, I always urge people to check their residual plots. In this blog post, I present one more thing you should do so you can trust your regression results in certain circumstances: standardize the continuous...
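One situation where standardizing matters is a model with polynomial or interaction terms built from a raw predictor. In that case X and X^2 tend to be strongly correlated, which is known as structural multicollinearity. The sketch below assumes that is the kind of circumstance meant here. It uses a hypothetical, evenly spaced predictor to show how standardizing X before squaring it removes that correlation. The helper names are illustrative.

```python
from statistics import mean, stdev

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def standardize(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

x = list(range(10, 21))  # hypothetical continuous predictor
raw_r = pearson(x, [v * v for v in x])   # X vs X^2 on the raw scale
z = standardize(x)
std_r = pearson(z, [v * v for v in z])   # Z vs Z^2 after standardizing
print(f"corr(X, X^2): raw {raw_r:.3f}, standardized {std_r:.3f}")
```

Because this predictor is symmetric about its mean, the standardized correlation drops to essentially zero. With real, skewed data the reduction is usually large but not total.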

If you want to convince someone that at least a basic understanding of statistics is an essential life skill, bring up the case of Lucia de Berk. Hers is a story that's too awful to be true, except that it is completely true. A flawed analysis irrevocably altered de Berk's life and kept her behind bars for a full decade, and the fact that this analysis targeted and harmed just one person makes it...
If you need to assess process performance relative to some specification limit(s), then process capability is the tool to use. You collect some accurate data from a stable process, enter those measurements in Minitab, and then choose Stat > Quality Tools > Capability Analysis/Sixpack or Assistant > Capability Analysis. Now, what about sorting the data? I’ve been asked “why does Cpk change when I...
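Order matters whenever Cpk is based on within-subgroup variation. The sketch below assumes individual observations (subgroup size 1), with the within-subgroup standard deviation estimated as the average moving range divided by d2 = 1.128. Ppk uses the overall sample standard deviation instead. Sorting leaves the mean and the overall standard deviation unchanged, but it shrinks every moving range, so Cpk rises while Ppk stays put. This is a simplified version; Minitab offers further options such as unbiasing constants. The measurements, specification limits, and function names are hypothetical.

```python
from statistics import mean, stdev

D2_SPAN2 = 1.128  # unbiasing constant d2 for moving ranges of span 2

def sigma_within(data):
    """Within-subgroup sigma from the average moving range (order-dependent)."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    return mean(moving_ranges) / D2_SPAN2

def cpk(data, lsl, usl):
    m, s = mean(data), sigma_within(data)
    return min(usl - m, m - lsl) / (3 * s)

def ppk(data, lsl, usl):
    m, s = mean(data), stdev(data)  # overall SD ignores the order of the data
    return min(usl - m, m - lsl) / (3 * s)

measurements = [10.2, 9.8, 10.5, 9.6, 10.1, 10.4, 9.7, 10.0, 10.3, 9.9]  # hypothetical
LSL, USL = 9.0, 11.0  # hypothetical specification limits

for label, series in [("original order", measurements), ("sorted", sorted(measurements))]:
    print(f"{label}: Cpk={cpk(series, LSL, USL):.2f}  Ppk={ppk(series, LSL, USL):.2f}")
```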
In my time at Minitab, I’ve gotten a good understanding of what types of graphs users create. Everyone knows about histograms, bar charts, and time series plots. Even relatively less familiar plots like the interval plot and individual value plot are still used quite often. However, one of the most underutilized graphs we have available is the area graph. If you’re not familiar with an Area...
Any time you see a process changing, it's important to determine why. Is it indicative of a long-term trend, or is it a fad that you can ignore since it will be gone shortly? For example, in the 2014 NBA Finals, the San Antonio Spurs beat the two-time defending champion Miami Heat by attempting more 3-pointers (23.6 per game) than any championship team in league history. In the 2015 regular...
Ahh, bottled water. Refreshing, convenient...and sometimes pricey. Or in my case, I should say usually pricey. Confession: I’m a sucker for water that comes in the “pretty” plastic bottles, and my experience is that the pretty-bottle brands are usually the pricier ones. Does bottled water cost increase with the fanciness of the bottle? Well, that could be an analysis for a different day … My...
If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine...
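A common way to compare two regression lines is to fit one model with an indicator variable G (0 for the original condition, 1 for the new one) and an X*G interaction: Y = b0 + b1*X + b2*G + b3*X*G. Here b2 is the difference in constants and b3 is the difference in slopes. Their p-values tell you whether those differences are statistically significant. The plain-Python sketch below shows only the point estimates. It relies on the fact that fitting each group separately gives the same b2 and b3 as the combined model. The data and the `fit_line` helper are hypothetical, and you still need a statistics package for the hypothesis tests.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data for the original and the new condition
x_old, y_old = [1, 2, 3, 4, 5], [3.1, 5.0, 6.9, 9.1, 11.0]
x_new, y_new = [1, 2, 3, 4, 5], [4.0, 6.4, 8.9, 11.6, 14.1]

b0_old, b1_old = fit_line(x_old, y_old)
b0_new, b1_new = fit_line(x_new, y_new)
# In Y = b0 + b1*X + b2*G + b3*X*G, these differences are b2 and b3.
print(f"difference in constant (b2): {b0_new - b0_old:.3f}")
print(f"difference in slope    (b3): {b1_new - b1_old:.3f}")
```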
At the start of a new year, I like to look for data that’s labeled 2016. While it’s not necessarily new for 2016, one of the first data sets I found was healthcare.gov’s data about qualified health and stand-alone dental plans offered through their site. Now, there’s lots of fun stuff to poke around in a data set this size: there are over 90,000 records on more than 140 variables. But to start out I...
In an earlier post, I shared an overview of acceptance sampling, a method that lets you evaluate a sample of items from a larger batch of products (for instance, electronics components you've sourced from a new supplier) and use that sample to decide whether to accept or reject the entire shipment. There are two approaches to acceptance sampling. If you do it by attributes, you...
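For a single sampling plan by attributes, you inspect n items and accept the lot if at most c of them are defective. The sketch below assumes the lot is large relative to the sample, so a binomial model applies; small lots call for the hypergeometric distribution instead. Under that model, the probability of accepting a lot with defect rate p is the sum over d = 0..c of C(n, d) p^d (1 - p)^(n - d). The code traces a few points of that operating characteristic (OC) curve for a hypothetical plan with n = 50 and c = 1.

```python
from math import comb

def prob_accept(n, c, p):
    """P(accept lot): inspect n items, accept if at most c are defective (binomial model)."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))

# Hypothetical plan: sample 50 items, accept if 1 or fewer are defective
for p in (0.005, 0.01, 0.02, 0.05, 0.10):
    print(f"defect rate {p:.3f}: P(accept) = {prob_accept(50, 1, p):.3f}")
```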
Now that we've seen how easy it is to create plans for acceptance sampling by variables, and to compare different sampling plans, it's time to see how to actually analyze the data you collect when you follow the sampling plan. If you'd like to follow along and you're not already using Minitab, please download the free 30-day trial. Collecting the Data for Acceptance Sampling by Variable: If you'll...
In my last post, I showed how to use Minitab Statistical Software to create an acceptance sampling plan by variables, using the scenario of an electronics company that receives monthly shipments of LEDs that must have soldering leads that are at least 2 cm long. This time, we'll compare that plan with some other possible options. The variables sampling plan we came up with to verify the...
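With a single lower specification limit and an unknown sigma, a variables plan typically reduces to a simple decision rule. You compute Z_LSL = (sample mean - LSL) / s from the sample and accept the lot if Z_LSL is at least the plan's critical distance k. The sketch below applies that rule to the LED scenario's 2 cm minimum lead length. The value of k, the sample measurements, and the function name are hypothetical placeholders, not the plan from the post.

```python
from statistics import mean, stdev

def accept_lot_by_variables(measurements, lsl, k):
    """k-method for a lower spec limit: accept if (mean - LSL) / s >= k."""
    z_lsl = (mean(measurements) - lsl) / stdev(measurements)
    return z_lsl >= k, z_lsl

LSL_CM = 2.0  # minimum lead length from the LED scenario
K = 1.5       # hypothetical critical distance; a real plan supplies this value
leads = [2.31, 2.28, 2.45, 2.19, 2.37, 2.40, 2.26, 2.33]  # hypothetical sample, in cm

accepted, z = accept_lot_by_variables(leads, LSL_CM, K)
print(f"Z_LSL = {z:.2f} -> {'accept' if accepted else 'reject'} the shipment")
```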
If you're just getting started in the world of quality improvement, or if you find yourself in a position where you suddenly need to evaluate the quality of incoming or outgoing products from your company, you may have encountered the term "acceptance sampling." It's a statistical method for evaluating the quality of a large batch of materials from a small sample of items, which statistical softwar...
This is an era of massive data. A huge amount of data is being generated from the web and from customer relations records, but also from sensors used in manufacturing (semiconductor, pharmaceutical, petrochemical, and many other industries). Univariate control charts: In the manufacturing industry, critical product characteristics get routinely collected to ensure that all products...
Many of us have data stored in a database or file that we need to analyze on a regular basis. If you're in that situation and you're using Minitab Statistical Software, here's how you can save some time and effort by automating the process. When you're finished, instead of using File > Query Database (ODBC) each time you want to perform analysis on the most up-to-date set of data, you can add a...
There are many reasons why a distribution might not be normal (Gaussian). A non-normal pattern might be caused by several distributions mixed together, by a drift over time, by one or more outliers, by asymmetric behavior, or by out-of-control points. I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser...
Having delivered training courses on capability analysis with Minitab several times, I have noticed that one question is certain to come up during the course: What is the difference between the Cpk and the Ppk indices? Ppk vs. Cpk indices: The terms Cpk and Ppk are often confused, so that when quality or process engineers refer to the Cpk index, they often actually intend to...
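Written out for a two-sided specification, the two indices have the same form and differ only in the standard deviation in the denominator. Cpk uses the within-subgroup (short-term) estimate. Ppk uses the overall (long-term) standard deviation, which also captures shifts and drifts between subgroups. The formulas below show the standard textbook forms; software may add refinements such as unbiasing constants.

```latex
C_{pk} = \min\left(\frac{USL - \bar{x}}{3\,\sigma_{\text{within}}},\ \frac{\bar{x} - LSL}{3\,\sigma_{\text{within}}}\right)
\qquad
P_{pk} = \min\left(\frac{USL - \bar{x}}{3\,\sigma_{\text{overall}}},\ \frac{\bar{x} - LSL}{3\,\sigma_{\text{overall}}}\right)
```

For a stable process the two estimates are similar and Cpk is close to Ppk. A Ppk well below Cpk suggests variation between subgroups that the within-subgroup estimate does not see.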
Back when I was an undergrad in statistics, I unfortunately spent an entire semester of my life taking a class, diligently crunching numbers with my TI-82, before realizing 1) that I was actually in an Analysis of Variance (ANOVA) class, 2) why I would want to use such a tool in the first place, and 3) that ANOVA doesn’t necessarily tell you a thing about variances. Fortunately, I've had a lot more...
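The name is confusing because one-way ANOVA tests whether group means differ, and it gets there by comparing two variance estimates. The F statistic divides the variance between the group means by the variance within the groups. The sketch below computes F and its degrees of freedom with the standard library. It stops short of the p-value, which comes from the F distribution and which the standard library does not provide. The machine data and the function name are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements from three machines
f_stat, df1, df2 = one_way_anova_f(
    [20.1, 19.8, 20.4, 20.0], [20.9, 21.2, 20.7, 21.0], [19.9, 20.2, 20.0, 20.3]
)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```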
Control charts are a fantastic tool. These charts plot your process data to identify common cause and special cause variation. By identifying the different causes of variation, you can take action on your process without over-controlling it. Assessing the stability of a process can help you determine whether there is a problem and identify the source of the problem. Is the mean too high, too low,...
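As a concrete example, consider an individuals (I) chart for data collected one value at a time. Its center line is the process mean, and its control limits sit three standard deviations away, with sigma commonly estimated as the average moving range divided by 1.128. The sketch below computes those limits and flags points beyond them. It covers only that basic test, not the additional run rules that control chart software can apply. The data and function names are hypothetical.

```python
from statistics import mean

def individuals_chart_limits(data):
    """(LCL, center line, UCL) for an I chart, with sigma = average moving range / 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma = mean(moving_ranges) / 1.128
    center = mean(data)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(data):
    """Indexes of points outside the 3-sigma limits (special-cause candidates)."""
    lcl, _, ucl = individuals_chart_limits(data)
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]

readings = [10, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 14.0, 10.0, 9.9]  # hypothetical
print("limits:", individuals_chart_limits(readings))
print("out-of-control points at indexes:", out_of_control(readings))
```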
Last time I touched on the subject of the greatest Super Bowl quarterback, I promised a multivariate analysis considering several different statistics. Let’s get right to a factor analysis. Getting Ready for Factor Analysis: One purpose of factor analysis is to identify underlying factors that you can’t measure directly. These factors explain the variation of many different variables in fewer...