I’ve been hiding empty beer bottles in my office lately. I hope no one finds them. My boss is in the next office and he’s already asked about the occasional “clinking sounds.” I told him I’m practicing castanets for the Minitab talent show.
But it's not what you think. Inspired by the relationship between beer and statistics, I'm conducting an attribute agreement analysis of beer appraisals. Sometimes, an ice-cold, concrete example is the best way to experience the ins and outs of a statistical analysis.
Suppose that Minitab is a brewery and we're releasing a premium beer. The brewery employs appraisers to taste random samples of beer from production and identify any samples that are “defective”—maybe they come from a bad batch, or maybe there’s an error in the bottling or storage process.
This inspection process is an example of a go/no-go gage, where appraisers inspect each item and give it a “pass” or “fail” rating. To evaluate this measuring system, follow a few basic steps to set up your study.
In a real study, you’d randomly select appraisers to make sure they accurately represent your entire measuring system. Alas, there’s not a pool of professional beer tasters here at Minitab. Luckily, four of my colleagues from the Information Development department generously volunteered their time and taste buds to sip beer samples and serve as our appraisers.
Their photos and names have been slightly altered to protect their privacy on this online statistical reality show.
For this study, the appraisals need to be based solely on the quality of the beer (defective or not) rather than a personal preference for a certain type of beer.
All the appraisers thus agreed on a premium light lager (Stella Artois) as the standard for “good.” Each sample was about 1 oz of Stella poured into a small plastic Dixie cup.
For the standard for “bad,” I deviously concocted two types of defective samples using the same beer.
All else being equal, using more items and trials generally produces more conclusive results. To evaluate the self-consistency of each appraiser, each appraiser should rate each item more than once.
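To see why repeat trials matter, here is a minimal Python sketch of one common way to summarize self-consistency: the share of items that an appraiser rated the same way on every trial. The function name and the ratings below are made-up placeholders for illustration only, not data or results from this study, and Minitab's own output may report more than this single percentage.

```python
def within_appraiser_agreement(ratings_by_item):
    """Return the fraction of items rated identically across all trials.

    ratings_by_item maps an item label to the list of ratings one appraiser
    gave that item, one rating per trial.
    """
    if not ratings_by_item:
        raise ValueError("need at least one item")
    for item, ratings in ratings_by_item.items():
        # With a single trial there is nothing to compare against.
        if len(ratings) < 2:
            raise ValueError(f"{item} needs at least two trials")
    matched = sum(
        1 for ratings in ratings_by_item.values() if len(set(ratings)) == 1
    )
    return matched / len(ratings_by_item)


# Placeholder ratings for one hypothetical appraiser over two trials.
example_ratings = {
    "Sample 1": ["pass", "pass"],
    "Sample 2": ["pass", "fail"],  # the appraiser changed their mind here
    "Sample 3": ["fail", "fail"],
    "Sample 4": ["fail", "fail"],
}

agreement = within_appraiser_agreement(example_ratings)
print(f"Within-appraiser agreement: {agreement:.0%}")
```

With only one rating per item, the function has nothing to compare and raises an error, which is the point of rating each item more than once.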
My study has some serious limitations:
So we could perform only two trials using four 1-oz samples (2 “good” and 2 “bad”) in each trial. Will the small size of this study come back to haunt us?
Appraisers should rate items in random order to minimize bias. The easiest way to set up a data collection plan with randomized runs for a go/no-go gage study is to use the Assistant. In Minitab, choose Assistant > Measurement Systems Analysis > Attribute Agreement Worksheet.
Enter the appraisers, number of trials, and the "good" and "bad" items you’ll use in your study. The Assistant will automatically set up a worksheet with randomized runs.
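If you're curious what a randomized plan like this looks like outside Minitab, here is a rough Python sketch. It is not the Assistant's algorithm. The appraiser labels and sample names are hypothetical, and it assumes the items are shuffled separately for each appraiser within each trial.

```python
import random

# Hypothetical appraiser labels; the post does not give the real names.
APPRAISERS = ["Appraiser A", "Appraiser B", "Appraiser C", "Appraiser D"]

# Four samples per trial, 2 "good" and 2 "bad", as in the study described here.
ITEMS = [
    ("Sample 1", "good"),
    ("Sample 2", "good"),
    ("Sample 3", "bad"),
    ("Sample 4", "bad"),
]


def build_worksheet(appraisers, items, n_trials, seed=None):
    """Return a list of run records with item order randomized
    independently for each appraiser within each trial (an assumption)."""
    rng = random.Random(seed)  # a seed makes the plan reproducible
    rows = []
    for trial in range(1, n_trials + 1):
        for appraiser in appraisers:
            order = list(items)  # copy so the master list stays untouched
            rng.shuffle(order)
            for item, standard in order:
                rows.append(
                    {
                        "run": len(rows) + 1,
                        "trial": trial,
                        "appraiser": appraiser,
                        "item": item,
                        "standard": standard,
                    }
                )
    return rows


worksheet = build_worksheet(APPRAISERS, ITEMS, n_trials=2, seed=1)

# Show the first appraiser's runs for trial 1 and the start of the next block.
for row in worksheet[:8]:
    print(row["run"], row["trial"], row["appraiser"], row["item"])
```

Four appraisers, four samples, and two trials give 32 runs in total, or 8 tastings per appraiser. In a real session you would keep the "standard" column hidden from the appraisers.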
Now the study is ready to go. Each appraiser will taste 8 samples of beer in random order and give their thumbs-up or thumbs-down to each sample. What will happen?
Tune in next time for the results...