
Residual Plots Revelations

We recently had a technical support request about using the Response Optimizer after analysing a factorial design in Minitab 16. Whilst we were able to illustrate its use and answer the customer's query, something else about the analysis caught our eye: the residual plots showed some unusual patterns.

Residuals are used in regression and ANOVA analyses to indicate how well your model fits the data. Examining residual plots helps you determine if the ordinary least squares assumptions are being met. If these assumptions are satisfied, then ordinary least squares regression will produce unbiased coefficient estimates with the minimum variance.

It is therefore strongly recommended that you always verify the following assumptions about the errors before accepting the results of a DOE:

  • The errors are independent (and thus random)
  • The errors do not deviate substantially from a normal distribution
  • The errors have constant variance across all factor levels

Thankfully, Minitab 16 provides a tool for checking these assumptions: the Four in One residual plots (Stat > DOE > Factorial > Analyze Factorial Design > Graphs).

Analyze Factorial Design - Graphs
  
As mentioned in my previous post, probability plots can reveal a lot of interesting things about the data. This is especially true when looking at the normal probability plot of the residuals. 

Here are the plots, with the points of interest highlighted in green by brushing. Brushing a point on one plot highlights it on the Residuals Versus the Fitted Values and Residuals Versus the Order of the Data charts as well.

Residual plots for Response

The first thing to notice is not just that the brushed points are a long way from the rest of the data, but that they form vertical lines. A vertical line of dots on the normal probability plot indicates observations with the same value.
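
To see why tied values line up vertically, here is a minimal Python sketch that computes the coordinates of a normal probability plot with the residual on the horizontal axis and the cumulative percentage on the vertical axis. The median-rank plotting position (i - 0.3)/(n + 0.4) and the residual values are assumptions chosen for illustration; they are not taken from the customer's analysis.

```python
from statistics import NormalDist

def probability_plot_points(residuals):
    """Return (residual, percent, z) triples for a normal probability plot.

    Assumes the median-rank plotting position (i - 0.3) / (n + 0.4);
    other plotting-position formulas would give the same picture.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    points = []
    for rank, residual in enumerate(ordered, start=1):
        p = (rank - 0.3) / (n + 0.4)
        z = NormalDist().inv_cdf(p)  # vertical position on a normal scale
        points.append((residual, 100 * p, z))
    return points

# Hypothetical residuals: the last four are identical, as pasted values would be.
residuals = [-1.4, -0.6, -0.2, 0.3, 0.9, 2.5, 2.5, 2.5, 2.5]
points = probability_plot_points(residuals)

# The tied residuals share one x value but each gets its own percentage,
# so they stack into a vertical line.
tied = [pt for pt in points if pt[0] == 2.5]
for residual, percent, z in tied:
    print(f"residual={residual:5.2f}  percent={percent:5.1f}  z={z:5.2f}")
```

Each tied residual receives a different rank and therefore a different percentage, which is what stacks the points on top of one another.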

Now look at the Residuals Versus the Order of the Data graph in the lower right. There, the green points show a row of four at the same value. Notice that this occurs in two other places on the chart. On the Residuals Versus the Fitted Values chart, those eight green dots look like two, again indicating that you have duplicated values. Interesting... there has to be a story behind this data!

Here is the revelation:

This analysis was from a custom design that was imported into Minitab using Stat > DOE > Factorial > Define Custom Factorial Design. Here is a snippet of the design worksheet (it is in coded values and the names are masked).

Minitab Worksheet

Columns C5 to C10 are the factors, and the data are displayed in the run order of the worksheet. The centre points in each group were run one after the other, but what stands out is that the centre points that were run together have exactly the same value.

What happened? It turns out that the operators running the design had copied the first result in each block of four centre points and pasted it over the other three. The design initially had 16 centre points, in four groups of four, but because of the copying and pasting there are only four unique centre points in the design.
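
A quick way to screen for this kind of problem outside Minitab is to scan the response column, in run order, for stretches of identical consecutive values. The following is a minimal Python sketch; the function name `repeated_runs` and the response values are hypothetical, made up for illustration.

```python
from itertools import groupby

def repeated_runs(responses, min_length=2):
    """Find stretches of identical consecutive values.

    Returns a list of (start_index, length, value) tuples, one for each
    stretch of at least `min_length` identical values in a row.
    """
    runs = []
    index = 0
    for value, group in groupby(responses):
        length = len(list(group))
        if length >= min_length:
            runs.append((index, length, value))
        index += length
    return runs

# Hypothetical responses in run order; two blocks of four look copy-pasted.
response = [71.2, 68.9, 70.4, 70.4, 70.4, 70.4,
            66.1, 73.0, 69.8, 69.8, 69.8, 69.8]

flagged = repeated_runs(response)
print(flagged)  # [(2, 4, 70.4), (8, 4, 69.8)]
```

Identical readings are not proof of copying, particularly when the measurement resolution is coarse, so a flagged stretch is a prompt to ask how the data were recorded.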

What does this do to the analysis? With duplicated values for these centre points, the output reflects replicated points with zero variation between them. This causes the estimate of the error variation to be lower than it really is. In turn, the factors appear more significant than they really are, because their effects are compared against an artificially low Mean Square Error.
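
The size of the problem can be illustrated with a small simulation. This Python sketch compares the pooled within-group variance (the pure-error mean square) of four blocks of four genuine centre-point runs with the same blocks after the first value has been pasted over the rest. The mean of 50, the standard deviation of 2 and the random seed are assumptions chosen purely for illustration.

```python
import random
from statistics import fmean

def pooled_variance(groups):
    """Pooled within-group variance, i.e. the pure-error mean square."""
    sum_sq = 0.0
    deg_free = 0
    for group in groups:
        centre = fmean(group)
        sum_sq += sum((x - centre) ** 2 for x in group)
        deg_free += len(group) - 1
    return sum_sq / deg_free

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical: four blocks of four genuine centre-point runs.
honest = [[rng.gauss(50.0, 2.0) for _ in range(4)] for _ in range(4)]

# The same blocks after the first result is pasted over the other three.
pasted = [[block[0]] * 4 for block in honest]

honest_ms = pooled_variance(honest)
pasted_ms = pooled_variance(pasted)

print(f"pure-error mean square, genuine runs: {honest_ms:.3f}")
print(f"pure-error mean square, pasted runs:  {pasted_ms:.3f}")
```

Both versions claim 12 degrees of freedom for pure error, but the pasted data contribute a sum of squares of exactly zero. Because an F statistic divides each effect's mean square by the error mean square, a shrunken denominator inflates every F value.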

The solution? Simply remove the extra duplicated points and re-run the analysis. This is another great example of the interesting patterns that can be found in residual plots, and of how revealing they can be!
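
If the worksheet has been exported, the clean-up step might look like the following minimal Python sketch, which keeps only the first row of each stretch of consecutive rows that match in both factor settings and response. The function name `drop_pasted_runs`, the two-factor layout and the values are hypothetical, used only for illustration.

```python
from itertools import groupby

def drop_pasted_runs(rows):
    """Keep the first row of each stretch of consecutive identical rows."""
    return [row for row, _ in groupby(rows)]

# Hypothetical worksheet rows in run order: (coded factor settings, response).
rows = [
    ((-1, 1), 68.9),
    ((0, 0), 70.4),
    ((0, 0), 70.4),
    ((0, 0), 70.4),
    ((0, 0), 70.4),
    ((1, -1), 73.0),
]

cleaned = drop_pasted_runs(rows)
for settings, response in cleaned:
    print(settings, response)
```

Only consecutive matches are collapsed, so a genuine replicate run elsewhere in the design that happens to give the same reading is left alone.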

 

Have you ever found something surprising by looking at residual plots?


Comments

Name: Pia • Wednesday, November 7, 2012

Hi,
I hope you can answer a question for me regarding a residual plot with residuals (y-axis) vs. fitted values (x-axis). My problem is that there are several vertical lines in the plot. What does that mean?

