Back when I used to work in Minitab Tech Support, customers often asked me, “What’s the difference between Cpk and Ppk?” It’s a good question, especially since many practitioners default to using Cpk while overlooking Ppk altogether. It’s like the '80s pop duo Wham!, where Cpk is George Michael and Ppk is that other guy.
Poofy hairdos styled with mousse, shoulder pads, and leg warmers aside, let’s start by defining rational subgroups and then explore the difference between Cpk and Ppk.
Rational Subgroups
A rational subgroup is a group of measurements produced under the same set of conditions. Subgroups are meant to represent a snapshot of your process. Therefore, the measurements that make up a subgroup should be taken from a similar point in time. For example, if you sample 5 items every hour, your subgroup size would be 5.
Formulas, Definitions, Etc.
The goal of capability analysis is to ensure that a process is capable of meeting customer specifications, and we use capability statistics such as Cpk and Ppk to make that assessment. If we look at the formulas for Cpk and Ppk for normal (distribution) process capability, we can see they are nearly identical:
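Written out in conventional textbook notation (the symbols are an assumption about what the original figure showed), with $\bar{x}$ the process mean, LSL and USL the lower and upper spec limits, and $\hat{\sigma}$ the estimated standard deviation:

```latex
\begin{aligned}
C_{pk} &= \min(CPU,\ CPL), & CPU &= \frac{USL - \bar{x}}{3\,\hat{\sigma}_{within}}, & CPL &= \frac{\bar{x} - LSL}{3\,\hat{\sigma}_{within}} \\
P_{pk} &= \min(PPU,\ PPL), & PPU &= \frac{USL - \bar{x}}{3\,\hat{\sigma}_{overall}}, & PPL &= \frac{\bar{x} - LSL}{3\,\hat{\sigma}_{overall}}
\end{aligned}
```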
The only difference lies in the denominator for the Upper and Lower statistics: Cpk is calculated using the WITHIN standard deviation, while Ppk uses the OVERALL standard deviation. Without boring you with the details surrounding the formulas for the standard deviations, think of the within standard deviation as the average of the subgroup standard deviations, while the overall standard deviation represents the variation of all the data. This means that:
Cpk:
- Only accounts for the variation WITHIN the subgroups
- Does not account for the shift and drift between subgroups
- Is sometimes referred to as the potential capability because it represents the potential your process has at producing parts within spec, presuming there is no variation between subgroups (i.e. over time)
Ppk:
- Accounts for the OVERALL variation of all measurements taken
- Theoretically includes both the variation within subgroups and also the shift and drift between them
- Is where you are at the end of the proverbial day
Examples of the Difference Between Cpk and Ppk
For illustration, let's consider a data set where 5 measurements were taken every day for 10 days.
Example 1 - Similar Cpk and Ppk
As the graph on the left side shows, there is not a lot of shift and drift between subgroups compared to the variation within the subgroups themselves. Therefore, the within and overall standard deviations are similar, which means Cpk and Ppk are similar, too (at 1.13 and 1.07, respectively).
Example 2 - Different Cpk and Ppk
In this example, I used the same data and subgroup size, but I shifted the data around, moving it into different subgroups. (Of course we would never want to move data into different subgroups in practice – I’ve just done it here to illustrate a point.)
Since we used the same data, the overall standard deviation and Ppk did not change. But that’s where the similarities end.
Look at the Cpk statistic. It’s 3.69, which is much better than the 1.13 we got before. Looking at the subgroups plot, can you tell why Cpk increased? The graph shows that the points within each subgroup are much closer together than before. Earlier I mentioned that we can think of the within standard deviation as the average of the subgroup standard deviations. So less variability within each subgroup equals a smaller within standard deviation. And that gives us a higher Cpk.
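The effect can be reproduced with a short Python sketch using the data set posted in the comments below, read five values at a time as subgroups. Two things here are assumptions rather than anything stated in the post: the spec limits of 595 and 605, and the use of a plain pooled standard deviation for the within estimate with no unbiasing constants. The printed values may therefore differ slightly from the Minitab output quoted above.

```python
from math import sqrt
from statistics import mean, stdev

# Data as posted in the comments, read left to right, 5 values per subgroup.
EXAMPLE_1 = [
    601.6, 600.4, 598.4, 600.0, 596.8,
    600.8, 600.8, 600.6, 600.2, 602.4,
    598.4, 599.6, 603.4, 600.6, 598.4,
    598.2, 602.0, 599.4, 599.4, 600.8,
    600.8, 598.6, 600.0, 600.4, 600.8,
    600.8, 597.2, 600.4, 599.8, 596.4,
    600.4, 598.2, 598.6, 599.6, 599.0,
    598.2, 599.4, 599.4, 600.2, 599.0,
    599.4, 598.0, 597.6, 598.0, 597.6,
    601.2, 599.0, 600.4, 600.6, 599.0,
]
EXAMPLE_2 = [
    596.4, 596.8, 597.2, 597.6, 598.0,
    600.2, 600.4, 600.4, 600.4, 600.4,
    597.6, 598.0, 598.2, 598.2, 598.4,
    600.8, 601.6, 602.0, 602.4, 603.4,
    598.2, 598.4, 598.4, 598.6, 599.0,
    600.4, 600.6, 600.6, 600.6, 600.8,
    598.6, 599.0, 599.0, 599.0, 599.4,
    600.8, 600.8, 600.8, 600.8, 601.2,
    599.4, 599.4, 599.4, 599.4, 599.6,
    599.6, 599.8, 600.0, 600.0, 600.2,
]

# ASSUMPTION: the post does not state the spec limits; these are illustrative.
LSL, USL = 595.0, 605.0
SUBGROUP_SIZE = 5


def split_into_subgroups(data, size):
    """Cut the data into consecutive subgroups of the given size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pooled_stdev(groups):
    """Pooled within-subgroup standard deviation (no unbiasing constant)."""
    sum_sq = 0.0
    deg_freedom = 0
    for group in groups:
        group_mean = mean(group)
        sum_sq += sum((x - group_mean) ** 2 for x in group)
        deg_freedom += len(group) - 1
    return sqrt(sum_sq / deg_freedom)


def capability_index(data, sigma):
    """min(upper, lower) index for the given standard deviation estimate."""
    center = mean(data)
    return min((USL - center) / (3 * sigma), (center - LSL) / (3 * sigma))


def summarize(data):
    within = pooled_stdev(split_into_subgroups(data, SUBGROUP_SIZE))
    overall = stdev(data)  # ordinary sample standard deviation of all values
    return {
        "within": within,
        "overall": overall,
        "cpk": capability_index(data, within),
        "ppk": capability_index(data, overall),
    }


result_1 = summarize(EXAMPLE_1)
result_2 = summarize(EXAMPLE_2)
for label, result in (("Example 1", result_1), ("Example 2", result_2)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Both lists hold the same 50 values, so the overall standard deviation and Ppk come out identical, while the pooled within-subgroup estimate shrinks in the second arrangement and Cpk rises.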
To Ppk or Not to Ppk
And here is where the danger lies in only reporting Cpk and forgetting about Ppk like it’s George Michael’s lesser-known bandmate (no offense to whoever he may be). We can see from the examples above that Cpk only tells us part of the story, so the next time you examine process capability, consider both your Cpk and your Ppk. And if the process is stable with little variation over time, the two statistics should be about the same anyway.
(Note: It is possible, and okay, to get a Ppk that is larger than Cpk, especially with a subgroup size of 1, but I’ll leave that explanation for another day.)
Time: Tuesday, June 26, 2012
Michelle, thanks for this post. Long term vs Short term capability, subrational subgroups, are extremely important concepts.
Looking forward to your "Cpk-larger-than-Ppk-when-subgroup-size-of-1" article.
If possible, consider covering confidence intervals for Cpk and/or Ppk in a future post.
Time: Wednesday, June 27, 2012
Nice, clear thoughts. Liked it.
Keep it up buddy!
Time: Friday, July 20, 2012
Great explanation. I second the comment by Omar on the "Cpk-larger-than-Ppk-when-subgroup-size-of-1" topic. This is a very common question. I'll be looking for it.
Time: Monday, October 15, 2012
Really liked the article.
My question is how Minitab calculates different values for Cpk and Ppk when there are no subgroups (subgroup size = 1)?
Time: Monday, October 15, 2012
Chuck, I'm glad you liked the article.
Good question about Cpk vs. Ppk when subgroup size =1. In this case, Minitab uses the average moving range to calculate the within stdev (and Cpk), not the typical stdev formula which is used to calculate the overall stdev (and Ppk).
Time: Monday, November 26, 2012
Great article, thank you. Am I correct in thinking that if I run a test and vary process variables, I should use the Ppk? Since the subgroups are not the same, the Cpk is not a true reflection of the variability, as I am introducing variability by changing the process. Thank you.
Time: Thursday, November 29, 2012
Very nice post. I googled "CPK and PPK" and found this. Much better than Wikipedia's explanation. So here I am, a SAS programmer who is going to start following a Minitab blog!
Time: Wednesday, December 5, 2012
Mike, if you are varying process variables then it's likely your process will not be stable, which is one of the important assumptions for capability analysis. In addition, if you are introducing variability, then the overall stdev (used to calculate Ppk) will not be representative of the variation your process exhibits at any given time. I would suggest getting your process to a stable state and then collecting data to evaluate the process capability of the current, stable process.
Quentin, I'm happy to hear the explanation provided was helpful. Thank you for following our blog.
Time: Monday, December 17, 2012
Great article. I'm not sure if the "...subgroup size of 1" article is available yet.
My question:
If we are collecting data in no particular order and using a subgroup size of one, can we hope to get a Cpk that has any connection to reality? Slightly alter the order of the data and we get a different Cpk...
Time: Tuesday, December 18, 2012
Kerry, I'm glad the article was helpful. Great question about what to do when the data was recorded in no particular order. When the subgroup size is 1, within stdev is calculated using the average moving range. In other words, Minitab looks at the range between row1 and row2, then row2 and row3, etc. Minitab assumes the data are in chronological order. That is why changing the order of the data affects the average moving range and thus Cpk.
If you do not know in what order the data were collected, I highly recommend using Assistant > Capability Analysis > Capability Analysis > Snapshot. Minitab will then provide you with only the statistics (e.g. Ppk) that are applicable.
(And I haven't gotten around to writing the "Ppk may be larger than Cpk when n=1" post yet. Hopefully I will have time one of these days...)
Time: Wednesday, January 16, 2013
Is this formula right?
δ^2 overall = δ^2 within + δ^2 between
Time: Wednesday, January 16, 2013
Vahid, for process capability for the normal distribution, the overall stdev is calculated using the typical stdev formula (e.g. use Stat > Basic Statistics > Display Descriptive Statistics). Depending on what options you have selected, the formula might also divide by c4 (i.e. stdev overall = stdev/c4) where c4 is an unbiasing constant.
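For readers who want to see the c4 adjustment numerically, here is a minimal Python sketch. It uses the standard closed form for c4 and assumes the constant is evaluated at the total number of observations; the function names and the five sample values are illustrative only, not Minitab's API.

```python
from math import exp, lgamma, sqrt
from statistics import stdev


def c4(n):
    """Unbiasing constant c4 for a sample of size n (n >= 2).

    c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)
    """
    if n < 2:
        raise ValueError("c4 is only defined for n >= 2")
    # Work with log-gamma so that large n does not overflow.
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2.0) - lgamma((n - 1) / 2.0))


def overall_stdev(data, use_unbiasing_constant=False):
    """Ordinary sample stdev, optionally divided by c4(N)."""
    s = stdev(data)
    return s / c4(len(data)) if use_unbiasing_constant else s


# Illustrative values only.
sample = [601.6, 600.4, 598.4, 600.0, 596.8]
print(round(overall_stdev(sample), 4))
print(round(overall_stdev(sample, use_unbiasing_constant=True), 4))
```

Because c4 is always slightly below 1 and approaches 1 as the sample grows, the adjusted estimate is a little larger than the plain one, and the gap matters less with more data.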
Time: Tuesday, February 12, 2013
Thanks for the article. For normal data the formula is easy to understand. Would you please elaborate on the difference between Cpk and Ppk for non-normal data?
Time: Tuesday, February 12, 2013
Ramesh, this is a good question and one that comes up often. When you choose a nonnormal distribution to model your data, Minitab cannot calculate within-subgroup capability metrics such as Cpk. For a detailed explanation of why nonnormal distributions preclude a within-subgroup analysis, please see http://www.minitab.com/support/documentation/Answers/NoWithinSubgroupCapability.pdf.
Time: Tuesday, February 12, 2013
Most places I work (have worked) have copious amounts of data and are not doing logical sampling.
They also tend to set the subgroup size to 1.
In that case I advise that the “overall” or Ppk is the real number. The Cpk is the “right of the process”.
Great stuff. Write more please
Time: Friday, February 22, 2013
Hi, great article!
My query is: when calculating either Cpk or Ppk for a particular parameter, do I have to report both values to assure my customer that future production will be of good quality? Currently I am quoting the Ppk. Please suggest.
Time: Monday, February 25, 2013
I would leave it up to your customer as to whether or not you report just Ppk or both Cpk and Ppk. It's possible your customer is most interested in Ppk since it reflects the current state of your overall process.
Time: Monday, July 15, 2013
Good article.
Looking forward to your "Cpk-larger-than-Ppk-when-subgroup-size-of-1" article.
Time: Thursday, August 15, 2013
If I have a sample of, for example, 32 parts and I measure a feature and run an analysis, would this be a Ppk? If I am taking periodic measurements and collecting data over time, and the process was in control, can I set control limits for a Cpk?
Time: Monday, August 19, 2013
For your process capability, on say 32 parts, you can calculate both Ppk and Cpk (presuming you're using Minitab's capability analysis for a normal distribution). Both Ppk and Cpk are statistics that can be used for measurements collected over time. And both statistics should be applied only when the process is in control.
I am not sure how you want to use capability analysis to "set control limits" since control limits are calculated using the process data itself, so if you could please provide more detail, I'd be happy to address that part of your question.
Time: Friday, August 23, 2013
Nice post! Brief & to the point.
Time: Wednesday, September 4, 2013
Great Post.
In the case of sample size = 1, how do you calculate std. dev. (within) using the average moving range?
Time: Tuesday, September 10, 2013
Siva, I'm glad you liked the post. When sample size = 1, stdev(within) = average moving range / unbiasing constant d2
If you use a moving range of length 2 (the Minitab default), then d2=1.128.
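A minimal Python sketch of that moving-range estimate follows. The eight measurements are made up for illustration; the only constant taken from the discussion is d2 = 1.128 for a moving range of length 2.

```python
from statistics import mean, stdev

D2_MOVING_RANGE_2 = 1.128  # d2 for a moving range of length 2


def within_stdev_from_moving_range(data):
    """Average moving range of consecutive rows, divided by d2."""
    moving_ranges = [abs(later - earlier) for earlier, later in zip(data, data[1:])]
    return mean(moving_ranges) / D2_MOVING_RANGE_2


# Hypothetical measurements, assumed to be in the order they were collected.
in_collection_order = [10.2, 9.7, 10.5, 9.9, 10.1, 9.6, 10.4, 10.0]
# The same values in a different row order.
reordered = sorted(in_collection_order)

for label, rows in (("collection order", in_collection_order), ("reordered", reordered)):
    print(
        label,
        "within:", round(within_stdev_from_moving_range(rows), 4),
        "overall:", round(stdev(rows), 4),
    )
```

The overall standard deviation is the same for both orderings, but the within estimate changes because it depends on which rows are neighbors. That is why Cpk moves when the row order changes and Ppk does not.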
Time: Thursday, September 19, 2013
Michelle, great post. I have a question on whether Cpk or Ppk best fits my data. Let's say I have 30 parts that I have to take an electrical measurement on, but each measurement is taken at 3 different temperatures (cold, room, and hot temp). In addition, at each temperature, measurements are taken using 3 different voltages. In all there will be a total of 270 data points. Would Cpk or Ppk best represent the variation caused by the 2 variables (temperature and voltage)? Or do I have to analyze the data separately according to each variable?
Thanks,
Bob
Time: Thursday, September 19, 2013
Bob, this is an interesting question. I don't have a lot of experience with analysis for electrical measurements, but based on your description I would start by analyzing the data separately for each variable. It could be, for example, that your process is capable at cold and room temperature, but not at hot temps. If you did your analysis on all of the data together rather than separately, you would not be able to detect this. And I would think that this behavior is something you'd want to detect.
Time: Friday, September 20, 2013
Hi Michelle - great article! Really helped clarify a lot of the confusion I have had around this.
I have one question around data sampled in order with only 1 subgroup (e.g. from one production run). Let's assume the data is normal and the process is in control. Based on what I have read above, the Cpk would tell us how much the moving average varies (i.e. between row1 and row2, and then between row2 and row3, treating each incremental row difference as a new subgroup). The Ppk would tell us the true variation in the sampled process population. What does each of these tell us about the process's ability to meet the specification, and which is better to use?
Many thanks,
Simon.
Time: Friday, September 20, 2013
Simon, I'm glad the article was helpful. To answer your question about which statistic is better to report, it depends upon your goal. If you want to represent the current state of the process, then I would lean towards Ppk. However, if you want to report the potential of your process, then Cpk is theoretically a better representation. Or, you could always use both to get a complete picture of your process.
Time: Thursday, September 26, 2013
Hi. Is the Cpk calculation from the ANSI standard, or is it from another standard? If so, which one?
Time: Monday, September 30, 2013
Alan, the Cpk calculation that Minitab uses can be found in a variety of texts including the Automotive Industry Action Group (AIAG) manual on Statistical Process Control.
Time: Wednesday, October 2, 2013
Great article!
One question related to the sample size = 1 case.
If we have a batch process, where we measure only a single sample per batch, and batches might not be produced consecutively (imagine a multi-product production line that switches between different products), can the Cpk calculation using subgroup = 1 be used at all?
Thanks!
Time: Tuesday, October 8, 2013
Matej, great question. Since Cpk is calculated using the average moving range (for |row2-row1|, |row3-row2|, etc.), then the data needs to be in chronological order in order for that statistic to be calculated correctly. If it is not possible to enter the data in this manner, then I would solely use Ppk to assess the process capability. Please let me know if this does not sufficiently answer your question.
Time: Thursday, October 24, 2013
This post is just awesome!
Time: Wednesday, November 6, 2013
Hi Michelle,
Would it be possible to obtain the data set you used for your examples?
Time: Friday, November 8, 2013
Hi. Great and fun explanations. What if my Ppk value is 0.84 and my Cpk value is 2.55? Should I use the Cpk value then to show that my process is capable?
Time: Tuesday, November 12, 2013
Liyana, although the high Cpk indicates that the process has potential to perform within spec, the low Ppk indicates that overall, the process is not performing as well as it ideally should. I therefore would take a closer look at the shift and drift between your subgroups over time. Also, I would double-check process stability using a control chart to make sure the process is in-control.
Time: Tuesday, November 12, 2013
Orlando -- you can download the data set used in this blog post here: http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/f48e0364d0307b1f9e301d5c855e8ebc/cpk_vs_ppk_data.MTW
Time: Friday, November 22, 2013
Just wanted to say that I have referred many people to this explanation. And my favorite part is that I can google it via "Cpk Ppk Wham". Thanks again for this post!
Time: Wednesday, December 4, 2013
Great article Michelle!
To summarize my understanding of requirements for Cpk and Ppk:
Cpk requires a stable process and data must be taken in chronological order (cannot be randomly selected at the end of the day from a large batch).
Ppk does not require a stable process since it's a snapshot in time.
Regarding stability, does a minimum of 100 samples need to be recorded to meet this prerequisite or can one get away with 30 samples?
Time: Wednesday, December 18, 2013
Edgar, great questions. Even for Ppk, the process should be stable. If the process isn't stable, then we can't be sure that the capability of the process today will reflect the capability of the process tomorrow. It's also good practice to record your data in chronological order. If your data are not in chronological order, Assistant > Capability Analysis includes a 'Snapshot' option. Regarding sample size, the Assistant guidelines recommend that you collect 100 total data points. I hope this information is helpful.
Time: Sunday, December 22, 2013
I was reading "Statistical Quality Control" by Douglas Montgomery, 6th Edn. In chapter 8 it states that Ppk is calculated when the process is not in control, and it refers to the AIAG & ANSI standards. ASTM E2281-08a also refers to 'Process Capability Index' & 'Process Performance Index'. Your post explains it a little differently. I am a bit confused.
Also, how is "sigma within" calculated? Is it calculated from R-Bar/d2, and is the overall standard deviation for Ppk calculated using SS/df?
Time: Thursday, January 2, 2014
Pradeep, I have the 5th edition which states on p349 that "if the process is NOT in control, the indices Pp and Ppk have no meaningful interpretation relative to process capability...Unless the process is stable (in control), no index is going to carry useful predictive information about process capability or convey any information about future performance." Perhaps this content can be found in the 6th edition as well?
The default estimation method for "sigma within" is the pooled stdev. "Sigma overall" is equal to the typical standard deviation formula. For more details, see p7 of http://it.minitab.com/support/documentation/answers/CapaNormalFormulasCapaStats.pdf
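For reference, the textbook pooled standard deviation and the ordinary sample standard deviation can be written as follows. The notation is assumed here ($x_{ij}$ is observation $j$ in subgroup $i$, $\bar{x}_i$ the subgroup mean, $\bar{x}$ the grand mean, $n_i$ the subgroup size, $N$ the total count), and the optional unbiasing constants are left out, so the linked formulas document remains the authority on exactly what Minitab computes.

```latex
s_{p} = \sqrt{\frac{\sum_{i}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2}{\sum_{i} \left(n_i - 1\right)}},
\qquad
s_{overall} = \sqrt{\frac{\sum_{i}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}\right)^2}{N - 1}}
```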
Time: Friday, January 10, 2014
Hi Michelle,
I would like to understand the impact of using one data subgroup versus using more. Which is best to use?
Time: Tuesday, January 14, 2014
Can I use a non-normal distribution with a better Ppk even when the normality test passed, but your Cpk is not meeting the requirements of 1.25 or 1.33?
Time: Wednesday, January 15, 2014
Is the data set available in Excel? I can't open the MTW file with Minitab 15.
Time: Monday, January 20, 2014
Rachel, per the guidelines in the Minitab Assistant "collect data in rational subgroups when possible". This allows you to estimate the natural or inherent variation of the process. The good news is that when this is not possible and your subgroup size is 1, you can still assess the capability of the process.
Fernando, does the non-normal distribution provide a good fit for the data? Or is the data truly normal? I would use whatever distribution fits your data BEST as this will provide you with the BEST estimate of process capability.
Time: Tuesday, January 21, 2014
Greg, here is the data. I hope this format works for you:
Example 1
601.6 600.4 598.4 600.0 596.8
600.8 600.8 600.6 600.2 602.4
598.4 599.6 603.4 600.6 598.4
598.2 602.0 599.4 599.4 600.8
600.8 598.6 600.0 600.4 600.8
600.8 597.2 600.4 599.8 596.4
600.4 598.2 598.6 599.6 599.0
598.2 599.4 599.4 600.2 599.0
599.4 598.0 597.6 598.0 597.6
601.2 599.0 600.4 600.6 599.0
Example 2
596.4 596.8 597.2 597.6 598.0
600.2 600.4 600.4 600.4 600.4
597.6 598.0 598.2 598.2 598.4
600.8 601.6 602.0 602.4 603.4
598.2 598.4 598.4 598.6 599.0
600.4 600.6 600.6 600.6 600.8
598.6 599.0 599.0 599.0 599.4
600.8 600.8 600.8 600.8 601.2
599.4 599.4 599.4 599.4 599.6
599.6 599.8 600.0 600.0 600.2
Time: Friday, February 21, 2014
Excellent explanation of the difference between Cpk and Ppk. Can you calculate PPM from Ppk? Thanks
Time: Friday, February 28, 2014
I think I understand the difference between the two a little better now. Let's see if I got it right!
Currently our molded parts are sampled every 6 hrs. There is no clear statistical rationale for this frequency but by doing so, we sample at least once every shift and we can also defend it by other downstream controls we have in place. We want to reduce the sampling frequency from every 6 hrs to 12 hrs.
Can I defend this by comparing Cpk between two subgroups: (1) data collected at 6hr frequency intervals and (2) data collected at 12hr frequency intervals?
Time: Tuesday, March 4, 2014
Terry, thank you for your feedback. Although you can calculate PPM directly from Z.Bench (Calc > Probability Distributions > Normal), I don't know of a way to calculate PPM from Ppk. In general, a PPM of ~1350 beyond the nearer spec limit equates to Ppk = 1, and a PPM of ~3.4 equates to Ppk = 1.5.
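Those two reference points are the normal tail area beyond the nearer spec limit, which sits 3 × Ppk standard deviations from the mean. Here is a minimal Python sketch of that relationship, assuming normally distributed data. It covers only the nearer tail, so it gives a lower bound on total PPM rather than a full conversion, and the function name is illustrative.

```python
from math import erfc, sqrt


def nearer_tail_ppm(ppk):
    """Expected PPM beyond the nearer spec limit for a normal process.

    The nearer limit is 3 * ppk standard deviations from the mean, so the
    tail area is Phi(-3 * ppk), computed here with the complementary
    error function.
    """
    z = 3.0 * ppk
    tail_probability = 0.5 * erfc(z / sqrt(2.0))
    return 1e6 * tail_probability


for ppk in (1.0, 1.33, 1.5):
    print(ppk, round(nearer_tail_ppm(ppk), 2))
```

The tail beyond the other spec limit depends on where the mean sits between the limits, which is why Ppk alone does not determine total PPM.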
Tushar, that seems like a reasonable approach. You could also use a control chart to show that the process was stable during the transition from the 6hr-to-12hr frequency intervals.
Time: Thursday, April 3, 2014
This is a very good article on Cpk and Ppk.
I have a question on the Cpk value. Is it possible to have a very large Cpk value? I ran my data in Minitab and am getting an extremely high Cpk of 237.44. My target is 0, USL = 10, LSL = none, sample size = 33; 32 have a reading of 0 and 1 has a reading of .08. Std dev = 0.0140354. My thought is that the Cpk is so high because the USL is too loose and almost all of my samples fall on the target, which is 0.
Appreciate any further info on this.. Thanks..Lorna
Time: Tuesday, April 8, 2014
Lorna, I'm glad you found the article helpful. If your USL is 10, then it looks like your process is quite capable. However, there may be concern about the distribution being used to compute that Cpk value. With nearly all of your measurements at 0, how did you go about choosing the distribution for your capability analysis?
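As a quick arithmetic check, the figures Lorna lists do reproduce her result under the normal-distribution formula. The sketch below assumes that, with only a USL specified, the reported Cpk is the upper one-sided index (USL - mean) / (3 × stdev), and it takes her reported standard deviation at face value.

```python
usl = 10.0
readings = [0.0] * 32 + [0.08]   # 32 readings of 0 and one reading of 0.08
reported_stdev = 0.0140354       # value quoted in the comment

process_mean = sum(readings) / len(readings)

# Upper one-sided capability index: distance from the mean to the USL,
# measured in units of three standard deviations.
upper_index = (usl - process_mean) / (3 * reported_stdev)
print(round(upper_index, 2))
```

The arithmetic is consistent, so the open question is the one raised above: whether a normal distribution is a sensible model for data where nearly every reading is 0.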
Time: Thursday, May 8, 2014
Great explanation!!!!
Time: Tuesday, July 22, 2014
Hi Michelle,
Thanks for a great article!!!
Regarding your statement: "If you do not know in what order the data were collected, I highly recommend using Assistant > Capability Analysis > Capability Analysis > Snapshot. Minitab will then provide you with only the statistics (e.g. Ppk) that are applicable."
I have played around with Minitab and I cannot see the Ppk value changing when I change the order within the data set. So my question is: can't I trust the Ppk value given by a probability plot, i.e. through Stat > Quality Tools > Capability Analysis > Normal, if I don't know what order the data were collected in?
In my trials, the Ppk value became the same in the probability plot as in the capability snapshot.
I would really appreciate if you clarified this.
Thanks!
Time: Wednesday, July 23, 2014
Great question! Ppk will be same regardless of the order of your data because the overall stdev used to calculate it does not account for subgroups.
Ppk is valid whether or not your data are in chronological order.
However, one of the important assumptions for process capability is that the process is stable. And you can only assess process stability with a control chart created using data that is in chronological order. The order of your data directly impacts what a control chart will look like.
I hope this is helpful.
Time: Thursday, July 24, 2014
Hi again Michelle,
Thank you for your answer about Ppk!
I assume you mean we need to know the chronological order in which our products exit the production process, prior to testing, in order to get a reliable control chart?
The reason I asked my former question is that I am testing a product in order to determine whether the functionality fulfils the predetermined specification limits (set by our customer).
The way we do this is to first calculate the P-value, and in case the data are NOT normally distributed (p less than 0.05), we calculate Ppk. If Ppk is 1-2 we check for outliers. If Grubbs test gives us no outliers, then we want to assess if the data is in control, or if we can expect out of spec values in the future.
I can add that we do not know the chronological order in which our products were produced; we only get batches with products in "random" order (but we expect the products in the same batch to be similar).
Can you recommend a way to assess if the data is in control? I.e. when we determine the data as pass/fail according to our acceptance criteria.
(I would appreciate if you could give details of what chart/graph to use).
Best regards,
A person in need of your expertise