Mind the gap. It's an important concept to bear in mind whilst traveling on the Tube in London, the T in Boston, the Metro in Washington, D.C., etc. But how many of us remember to mind the gap when we create an interval plot in Minitab Statistical Software? Not too many of us, I'd wager. And it's a shame, too.
When you travel on the subway, minding the gap means giving thoughtful consideration to the space between the platform and the train. On the subway, minding the gap can make the difference between these two very different views of the subway station:
When you make an interval plot in Minitab, minding the gap means giving thoughtful consideration to the space between groups on the x-axis. For interval plots, minding the gap can make the difference between these two very different views of your data:
Allow me to demonstrate with an example. If you like, you can download the data file, PercentMoisture.MTW, from our data set library and follow along. (You can get the free 30-day trial of Minitab here if you don't already have the software.) Technicians at a food company collected these data to try to figure out the best combination of time and temperature to bake cereal grains to minimize their moisture content.
Interval plots are useful because they summarize your data and allow you to simultaneously compare the means (represented by the points or symbols) and the variability (represented by the interval bars) for each sample or group. (To see more interval plots in action, check out these other blog posts: Seven Alternatives to Pie Charts and When Even Cupid Isn't Accurate Enough.)
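If you want to see what an interval plot is summarizing, here is a minimal Python sketch that computes a mean and an interval for each group. The readings are made up for illustration and are not the values in PercentMoisture.MTW. The function name interval_summary is hypothetical, and the sketch uses a normal critical value for brevity, so Minitab's own bars may be computed differently (for example, with a t critical value, which gives wider intervals for small samples).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Made-up moisture readings for illustration only; these are NOT the
# values in PercentMoisture.MTW. Keys are (oven time, oven temperature).
samples = {
    ("30 min", 125): [14.2, 13.8, 14.5, 14.0],
    ("30 min", 130): [12.9, 13.1, 12.6, 13.0],
    ("60 min", 125): [12.0, 11.7, 12.3, 11.9],
}


def interval_summary(values, confidence=0.95):
    """Return (mean, lower, upper) for one group.

    Assumption: a normal critical value is used to keep the sketch
    dependency-free; a t-based interval would be wider for small samples.
    """
    m = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m, m - z * standard_error, m + z * standard_error


# One (mean, lower, upper) triple per group: the symbol and the bar ends.
summaries = {group: interval_summary(vals) for group, vals in samples.items()}

for group, (m, lower, upper) in summaries.items():
    print(group, round(m, 2), round(lower, 2), round(upper, 2))
```

Each triple corresponds to one symbol (the mean) and one interval bar (lower to upper) on the plot.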
Creating a basic interval plot in Minitab is simple. Just select Graph > Interval Plot. Then choose the One Y, With Groups option, enter the data as follows, and click OK. (For the sake of space in this article, I renamed the columns "Time" and "Temp".)
The nice thing about interval plots is that multiple levels of multiple factors can be represented by different positions on the x-axis. But the unfortunate thing about interval plots is that multiple levels of multiple factors are represented by different positions on the x-axis.
All the information is there, but it's hard to see how one group relates to the next. For example, to compare the results for the 130-degree oven temperature across the different oven times, you need to compare the 2nd interval bar to the 5th interval bar and the 8th interval bar. You end up going from one similar-looking bar to another and another, and that seldom ends well.
To make the different oven temperatures stand out more, you can add a little color. Just double-click one of the symbols to open the Edit Mean Symbols dialog box. Click the Groups tab, enter the temperature variable, and click OK.
To help make the grouping even clearer, you can connect the dots. Right-click the graph and choose Add > Data Display, then select Mean connect line and click OK.
Now it's much easier to identify and compare the results for the different oven temperatures. But here is where we really start to mind that gap. By which I mean that we start to give thoughtful consideration to the space between the oven-time groups on the x-axis. And by which I also mean that we mind these gaps because they are annoying and we want them to go away. But we need not worry, because that's one gap we can shrink easily.
Double-click the x-axis to open the Edit Scale dialog box. Notice the Gap within clusters setting. A setting of –1 means that the intervals for all levels of oven temperature at each level of oven time will be at the same location on the x-axis. Change the setting to –1 and the gap is closed.
And while we're at it, let's make the tick labels for temperature go away as well because they are redundant with the legend, and because the legend conveys the same information. And because if we don't, those labels would appear on top of each other, which looks pretty weird.
Awesome! The plot looks much better without the big gaps. Although, perhaps a little gap would make it easier to see the individual intervals more clearly. If we change that gap to –0.85, then everything is groovy.
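To build some intuition for what the gap setting does, here is a small Python sketch of one possible layout model. This is an assumption made for illustration, not Minitab's documented algorithm: the function cluster_positions, the width parameter, and the spacing formula are all hypothetical. The only behavior taken from the steps above is that a gap of –1 puts every interval in a cluster at the same x location, and a gap slightly above –1 separates them a little.

```python
def cluster_positions(n_clusters, n_within, gap_within, width=0.25):
    """Hypothetical layout model, not Minitab's documented algorithm.

    Each cluster is centered on an integer x position. Items inside a
    cluster are spaced (1 + gap_within) * width apart, so a gap of -1
    collapses them onto the cluster center.
    """
    step = (1.0 + gap_within) * width
    positions = []
    for cluster in range(n_clusters):
        center = float(cluster)
        # Offsets are symmetric around the cluster center.
        row = [center + (j - (n_within - 1) / 2) * step for j in range(n_within)]
        positions.append(row)
    return positions


# Three oven times (clusters), three oven temperatures within each.
closed = cluster_positions(3, 3, -1.0)    # intervals stacked at one x location
slight = cluster_positions(3, 3, -0.85)   # intervals nudged slightly apart

for label, layout in (("gap -1", closed), ("gap -0.85", slight)):
    print(label, [[round(x, 4) for x in row] for row in layout])
```

Under this model, moving the gap from –1 toward 0 spreads the temperatures within each oven time farther apart, which matches the visual effect described above.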
Now that's a gap I don't mind at all! Now it's really easy to compare the results for different oven temperatures within and across the different oven times. The interval plot suggests that to minimize moisture content, we want to use the 90-minute oven time, but we don't want to use the 125-degree oven temperature.
As you can see, the interval plot is an easy and fast way to get a good idea of which differences could be important. But remember, the interval plot can’t tell us which effects or differences are statistically significant. For that, we need to conduct an analysis of variance (ANOVA).
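For readers curious about what a two-factor ANOVA computes, here is a minimal Python sketch of the F statistics for a balanced design (the same number of replicates in every cell). The data are made up and are not the PercentMoisture.MTW values, and the function name two_way_anova_f is hypothetical. The sketch stops at the F statistics: turning them into p-values needs the F distribution, and Tukey comparisons need further work, neither of which is shown here.

```python
from statistics import mean

# Made-up balanced data for illustration only (NOT PercentMoisture.MTW):
# data[time][temp] -> list of replicate moisture readings.
data = {
    30: {125: [14.1, 14.3], 130: [13.0, 12.8], 135: [12.9, 13.1]},
    60: {125: [12.2, 12.0], 130: [11.1, 11.3], 135: [11.0, 11.2]},
    90: {125: [10.1, 10.3], 130: [9.0, 9.2], 135: [9.1, 8.9]},
}


def two_way_anova_f(data):
    """Return F statistics for a balanced two-factor design with replicates."""
    a_levels = list(data)
    b_levels = list(data[a_levels[0]])
    a, b = len(a_levels), len(b_levels)
    n = len(data[a_levels[0]][b_levels[0]])  # replicates per cell

    cells = [(i, j) for i in a_levels for j in b_levels]
    grand = mean(y for i, j in cells for y in data[i][j])
    cell = {(i, j): mean(data[i][j]) for i, j in cells}
    a_mean = {i: mean(cell[i, j] for j in b_levels) for i in a_levels}
    b_mean = {j: mean(cell[i, j] for i in a_levels) for j in b_levels}

    # Sums of squares for each main effect, the interaction, and error.
    ss_a = b * n * sum((a_mean[i] - grand) ** 2 for i in a_levels)
    ss_b = a * n * sum((b_mean[j] - grand) ** 2 for j in b_levels)
    ss_ab = n * sum(
        (cell[i, j] - a_mean[i] - b_mean[j] + grand) ** 2 for i, j in cells
    )
    ss_error = sum((y - cell[i, j]) ** 2 for i, j in cells for y in data[i][j])

    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    return {
        "time": (ss_a / df_a) / ms_error,
        "temp": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / df_ab) / ms_error,
    }


f_stats = two_way_anova_f(data)
for effect, f_value in f_stats.items():
    print(effect, round(f_value, 2))
```

A large F statistic means the variation between the levels of that effect is large relative to the variation within cells. Whether it counts as significant depends on the corresponding p-value.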
Spoiler alert: I already ran an ANOVA on these data and it confirms what we gleaned from the interval plot. The main effects for both time and temperature are significant. (The interaction effect is not quite significant at the 0.05 level.) Tukey comparisons show that 90 minutes in the oven reduces moisture significantly better than either 30 minutes or 60 minutes in the oven. Tukey comparisons also show that a 125-degree oven is significantly worse at reducing moisture than either a 130-degree oven or a 135-degree oven. The effects of the 135-degree oven are not significantly different from those of the 130-degree oven, so we can probably save some energy and just use 130 degrees to desiccate our wild oats.