# Control Charts: Subgroup Size Matters

Variation is everywhere. It’s in your daily commute, in the amount of caffeine you drink each day, and in the number of e-mails that arrive in your inbox. Whether you’re monitoring something as ordinary as caffeine consumption or something more important like a multi-million dollar manufacturing process, you can use one simple tool to monitor variation and determine whether what you’re seeing is natural random fluctuation or a sign that your process is out of control due to some special cause.

The tool I’m referring to is the all-powerful control chart.

Now here's where it starts to get interesting. Have you ever seen a control chart that looks like this?

At first glance, this process looks great—it’s in control. But looks can be deceiving. Do you see how the points, especially on the upper Xbar chart, are hugging the center line? Rather than having a beautiful stable process, what we instead have here are control limits that are too wide for the data, which means they will rarely signal an out-of-control situation.

What could be causing this to happen? The answer is subgroup size. After some investigation, I discovered that this control chart had been created using a subgroup size of 5 when in fact the data were collected one measurement at a time. The limits on an Xbar chart are calculated from the variation within each subgroup, so lumping unrelated consecutive measurements into artificial subgroups can inflate that within-subgroup variation and widen the limits. An I-MR chart should have been used instead of an Xbar-R chart, since I-MR charts are for subgroup sizes of 1 and Xbar-R charts are for subgroup sizes greater than 1 (and typically less than 9 or 10).
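To make the role of subgroup size concrete, here is a minimal Python sketch of how Xbar-R limits are commonly calculated for subgroups of 5. It is an illustration only, not Minitab's implementation: the function name `xbar_r_limits` and the sample values are made up, and A2 = 0.577, D3 = 0, and D4 = 2.114 are the standard tabulated control chart constants for a subgroup size of 5.

```python
from statistics import mean

# Standard tabulated control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114
SUBGROUP_SIZE = 5


def xbar_r_limits(values):
    """Group consecutive values into subgroups of 5 and return Xbar-R limits."""
    # Ignore any leftover values that do not fill a complete subgroup.
    usable = len(values) - len(values) % SUBGROUP_SIZE
    subgroups = [values[i:i + SUBGROUP_SIZE]
                 for i in range(0, usable, SUBGROUP_SIZE)]

    xbars = [mean(s) for s in subgroups]            # points on the Xbar chart
    ranges = [max(s) - min(s) for s in subgroups]   # points on the R chart

    grand_mean = mean(xbars)
    r_bar = mean(ranges)   # average within-subgroup range drives both charts

    return {
        "xbars": xbars,
        "center": grand_mean,
        "xbar_lcl": grand_mean - A2 * r_bar,
        "xbar_ucl": grand_mean + A2 * r_bar,
        "r_bar": r_bar,
        "r_lcl": D3 * r_bar,
        "r_ucl": D4 * r_bar,
    }


# Hypothetical measurements, for illustration only.
sample = [46, 49, 50, 43, 47, 48, 52, 45, 44, 51]
limits = xbar_r_limits(sample)
print(limits["xbars"], limits["xbar_lcl"], limits["xbar_ucl"])
```

Note that the width of the Xbar limits depends entirely on the average range inside each subgroup. If the subgroups are artificial, any shift or drift that happens to fall inside one makes that range larger and the limits wider.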

If we instead use the actual subgroup size of 1, which reflects how the data were collected, here is what we see:

This process is in fact out of control, a fact that we would have missed using a subgroup size of 5.
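For comparison, here is a minimal Python sketch of the usual I-MR calculation for data collected one measurement at a time. Again, this is an illustration under stated assumptions rather than Minitab's implementation: the function names and data are hypothetical, and 2.66 and 3.267 are the standard constants for moving ranges of length 2. It checks only for points beyond the control limits, not the other tests for special causes.

```python
from statistics import mean


def imr_limits(values):
    """Return individuals and moving range chart limits for time-ordered data."""
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": center,
        "i_lcl": center - 2.66 * mr_bar,
        "i_ucl": center + 2.66 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
    }


def points_beyond_limits(values):
    """Return the positions (0-based) of individual values outside the limits."""
    limits = imr_limits(values)
    return [i for i, v in enumerate(values)
            if v < limits["i_lcl"] or v > limits["i_ucl"]]


# Hypothetical measurements in the order they were collected.
measurements = [10, 11, 10, 11, 10, 11, 10, 20]
print(points_beyond_limits(measurements))  # [7]
```

With these made-up values, only the final measurement falls outside the individuals limits. Because each point is plotted on its own, a single unusual value is not averaged together with its neighbors.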

The moral of the story? The subgroup size you specify in Minitab Statistical Software must reflect how the data were collected. Any old number will not do. Statistically speaking, this helps ensure that your subgroups represent only common cause variation, that is, the variation that naturally occurs in the process.

Got Minitab? If so, use the Assistant to easily catch bungles such as this one.

Name: Dave Sampson • Wednesday, March 19, 2014

What about subgroup size when using a histogram with only one column of data? Thanks for your help.

Name: Michelle Paret • Thursday, March 20, 2014

Good question, Dave. If you're looking at a single column of data (i.e., a single sample), then you typically don't need to be concerned with subgroup size when creating a histogram. Whether or not your data were collected in subgroups, histograms still provide a view of the overall distribution of the data. Time is not a factor when using histograms (as opposed to control charts, where time order is critical).

Hope this helps,

Michelle
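As a small illustration of the point about time order, the Python sketch below uses two made-up lists that contain the same values in a different order. The tally of values (what a histogram shows) is identical, while the average moving range (what I-MR control limits are built from) changes. The helper name `mean_moving_range` is hypothetical.

```python
from collections import Counter
from statistics import mean


def mean_moving_range(values):
    """Average absolute difference between consecutive measurements."""
    return mean([abs(b - a) for a, b in zip(values, values[1:])])


# Same eight hypothetical values, in two different orders.
in_time_order = [43, 44, 45, 46, 47, 48, 49, 50]   # steady upward drift
reordered = [43, 50, 44, 49, 45, 48, 46, 47]       # same values, shuffled

# A histogram only tallies values, so the order makes no difference.
print(Counter(in_time_order) == Counter(reordered))   # True

# The moving range depends on which values sit next to each other.
print(mean_moving_range(in_time_order))   # 1
print(mean_moving_range(reordered))       # 4
```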

Name: Mishelle Bernard • Thursday, May 22, 2014

We have data that we collect daily, but how much data we collect depends on production. How could I determine the subgroup size?

Ex. We produced 5 printing jobs today, each with a varying number of rolls. We tested the seals on rolls 1, 4, and 7, then randomly for the rest of each job. These jobs were run at different times on 3 different presses. They do have the same testing parameters, since they all have the same structure. Would the subgroup size be equal to the number of data points in this case?

Name: Michelle Paret • Friday, May 23, 2014

Mishelle, are you taking measurements on these rolls, or are the seals you're testing pass/fail? Also, when you test a seal, is it an individual measurement? More specifically, if you look at the (chronological) data in your worksheet, is each row of data independent from other rows of data? Or are some of the data somehow related?

Name: Michelle Paret • Thursday, June 5, 2014

Thank you for your question. The data consisted of measurements (e.g., 46 49 50 43 etc.). The data were in fact NOT related. The observations were independent. And herein lies the issue with the Xbar-R chart, where the data were grouped into subgroups of size 5.

Because these data were instead independent, the Individuals and Moving Range Chart was the better choice.