In Part I and Part II we learned about the experiment and the survey, respectively. Now we turn our attention to the results...

Our first two participants, Danielle and Sheryl, enter the conference room and are given blindfolds while we explain how the experiment will proceed. As we administer the tasting, the color of each wine is obvious to us, but we don't know the true types, which have been masked as "A," "B," "C," and "D."

As Danielle and Sheryl proceed through the tasting, it is easy to see that both start off correctly identifying the color of each wine. It is also obvious that tasting methods differ greatly from person to person: Danielle tends to take one or two sips, whereas Sheryl drinks the entire sample.

As Sheryl tastes her fourth sample, we realize that her fifth sample will be the exact same wine. She guesses "Pinot Noir, Red" for the fourth sample and is handed the fifth, which she sips almost immediately. But the quick answer we expect doesn't come. She takes a second sip. Still no guess. Then more. As she finishes the sample, she finally reports that it is "Cabernet Sauvignon, Red." Any fear that the experiment might be too easy quickly dissipates: we have just watched someone give different answers for the same wine served back-to-back. Sheryl's self-reported knowledge level is a 6.

Before jumping into how often tasters got color and type correct, it's worth checking how often each taster mismatched the two. For example, a guess of "Red, Riesling" is a mismatch, because Riesling is a white wine:

For the most part, mismatches weren't an issue for our tasters. Viviana, however, struggled in particular: her results show that she believed Pinot Noir to be a white wine and Sauvignon Blanc to be a red. At one point during the experiment, Viviana, a fluent Spanish speaker, even wonders aloud whether the "Blanc" in "Sauvignon Blanc" might be related to "blanco" (Spanish for "white"), but decides against it.
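A mismatch check like this is easy to automate. The sketch below is a hypothetical Python helper, not part of our original analysis. It assumes each guess is recorded as a (color, varietal) pair and that the answer choices were the four varietals mentioned in this post, whose true colors are listed in the made-up lookup table `VARIETAL_COLOR`.

```python
# Hypothetical sketch: flag guesses whose stated color contradicts the
# varietal's actual color (e.g., "Red, Riesling").
# Assumption: the answer choices are the four varietals named in this post.
VARIETAL_COLOR = {
    "Cabernet Sauvignon": "Red",
    "Pinot Noir": "Red",
    "Riesling": "White",
    "Sauvignon Blanc": "White",
}


def is_mismatch(color: str, varietal: str) -> bool:
    """Return True if the guessed color disagrees with the varietal's color."""
    return VARIETAL_COLOR[varietal] != color


def count_mismatches(guesses: list[tuple[str, str]]) -> int:
    """Count (color, varietal) guesses that are internally inconsistent."""
    return sum(is_mismatch(color, varietal) for color, varietal in guesses)


# Made-up guesses for one taster, not data from the experiment.
guesses = [
    ("Red", "Pinot Noir"),
    ("White", "Pinot Noir"),       # mismatch: Pinot Noir is red
    ("Red", "Sauvignon Blanc"),    # mismatch: Sauvignon Blanc is white
    ("White", "Riesling"),
]
print(count_mismatches(guesses))  # 2
```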

Our first look at how well participants did will use Minitab's Attribute Agreement Analysis, first for color only:

One thing to note about Attribute Agreement Analysis is that, unless a very large number of unique parts is tested, it is very difficult to distinguish between operators in any statistically significant way. So most of the conclusions we draw are about the system itself, with all operators grouped together into one big system. Here we see that our participants generally did well both within themselves (how consistently they answered, whether right or wrong) and against the known standard (how consistently they answered correctly). Because answering correctly on every replicate of a wine also means answering consistently, agreement within an appraiser can never be lower than agreement with the standard. In only two cases, Jeremiah and Santos (who had the lowest wine knowledge scores), was a participant more consistent within than against the standard; for everyone else, the two percentages were identical.

Next we'll look at the same graph, but by type instead of color:

Here we see much less consistency "within," meaning our tasters had a harder time giving the same type when they were served the same wine a second time. Agreement with the standard was lower still: nearly half of the participants failed to correctly identify both samples of the same wine for even one of the four wines!

One extreme example is Bobby, who answered consistently 100% of the time but got none of the wines correct. From a practical standpoint, an appraiser who performs like Bobby is usually fairly easy to correct, because he already has the ability to distinguish parts and is simply putting them in the wrong "buckets."
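To make the two percentages concrete, here is a minimal Python sketch of how "within" and "vs. standard" agreement can be computed for a single appraiser. It is not Minitab's implementation; the function name `agreement`, the answer key for wines A through D, and the guesses are all hypothetical, with the guesses chosen to mimic Bobby's consistent-but-wrong pattern rather than reproduce his actual answers.

```python
# Hypothetical sketch of two per-appraiser percentages from an attribute
# agreement analysis:
#   within       = % of wines given the same answer on every replicate
#   vs. standard = % of wines given the correct answer on every replicate


def agreement(trials: dict[str, list[str]], standard: dict[str, str]) -> tuple[float, float]:
    """trials maps each wine to the appraiser's answers on its replicates;
    standard maps each wine to its true type."""
    n_wines = len(trials)
    within = sum(len(set(answers)) == 1 for answers in trials.values())
    vs_standard = sum(
        all(answer == standard[wine] for answer in answers)
        for wine, answers in trials.items()
    )
    return 100 * within / n_wines, 100 * vs_standard / n_wines


# Made-up answer key, NOT the experiment's actual assignment of types to A-D.
key = {"A": "Riesling", "B": "Pinot Noir", "C": "Sauvignon Blanc", "D": "Cabernet Sauvignon"}

# Consistent on every wine, but every wine lands in the wrong "bucket".
bobby_like = {
    "A": ["Pinot Noir", "Pinot Noir"],
    "B": ["Riesling", "Riesling"],
    "C": ["Cabernet Sauvignon", "Cabernet Sauvignon"],
    "D": ["Sauvignon Blanc", "Sauvignon Blanc"],
}
print(agreement(bobby_like, key))  # (100.0, 0.0)
```

An appraiser with this profile scores 100% within and 0% against the standard: the ability to tell the wines apart is there, and only the labels are wrong.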

One drawback of traditional Attribute Agreement Analysis is that appraising a part incorrectly on even one replicate counts as though the entire part was appraised incorrectly. For example, because each wine was tasted twice, a taster who got 7 of the 8 tastings correct was right 87.5% of the time but is scored as right only 75% of the time (3 of 4 wines correct). So another way to look at our results is to see how many of the eight samples each taster identified correctly by both color and type:

Here we more clearly see that Brutus and Katherine turned in the best performances, with 6 out of 8 tastings correctly identified. Bobby, as discussed above, missed all of them but showed great consistency (as did Danielle).
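The difference between the two scoring rules can be sketched in a few lines of Python. The helper `tasting_and_wine_scores` and its input format (correct/incorrect flags for each replicate of each wine) are hypothetical; the example reproduces the 7-of-8 case described above.

```python
# Hypothetical sketch: score the same results two ways.
#   tasting-level: each of the 8 tastings counts on its own
#   wine-level:    a wine counts only if every replicate of it is correct


def tasting_and_wine_scores(results: dict[str, list[bool]]) -> tuple[float, float]:
    """results maps each wine to correct/incorrect flags, one per replicate."""
    flags = [flag for replicates in results.values() for flag in replicates]
    tasting_pct = 100 * sum(flags) / len(flags)
    wine_pct = 100 * sum(all(replicates) for replicates in results.values()) / len(results)
    return tasting_pct, wine_pct


# 7 of 8 tastings correct, with the single miss on wine "C".
example = {"A": [True, True], "B": [True, True], "C": [True, False], "D": [True, True]}
print(tasting_and_wine_scores(example))  # (87.5, 75.0)
```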

Aside from looking at how each appraiser did, we can evaluate our results in a couple of other ways. One is to look at whether certain wine types were identified correctly more or less often than others:

There is little, if any, evidence that certain types were easier or harder to identify: a Chi-Square test gives a non-significant p-value of 0.911.
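For readers who want to see what a test like this involves, here is a self-contained Python sketch of a Pearson chi-square test of independence on a table of correct/incorrect counts by wine. This is one plausible setup, not necessarily the exact table behind the result above, and the counts in the usage example are made up. The p-value is computed from the chi-square upper tail via a series expansion so that no third-party libraries are needed; `scipy.stats.chi2_contingency` would be the usual off-the-shelf choice.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution, using the series
    expansion of the regularized lower incomplete gamma function P(df/2, stat/2)."""
    s, x = df / 2, stat / 2
    if x <= 0:
        return 1.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-15:
        term *= x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi_square_independence(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test on an r x c table of observed counts.
    Returns (statistic, degrees of freedom, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: one row per wine, columns = [correct, incorrect].
table = [
    [9, 15],   # wine A (made up)
    [8, 16],   # wine B (made up)
    [10, 14],  # wine C (made up)
    [7, 17],   # wine D (made up)
]
stat, df, p = chi_square_independence(table)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```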

Similarly, it seems reasonable to expect that as participants tasted more samples, they would become less able to discern differences and their performance would degrade. So we also looked at whether the number of correct identifications decreased as participants progressed through the eight samples:

Again there is no visual or statistical evidence of such an effect. It could be that with more tastings the effect would have shown up, or it could be that the effect is countered by participants' memory of what they had answered previously and what to still expect in the remaining samples.
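One common way to test for this kind of fatigue effect is a Cochran-Armitage test for a linear trend in the proportion correct across ordered groups, here tasting positions 1 through 8, where a clearly negative z statistic would indicate declining performance. The sketch below illustrates that test under this assumption; it is not necessarily the test behind the result above, and the function name and per-position counts are hypothetical.

```python
import math


def cochran_armitage(successes, trials, scores=None):
    """Cochran-Armitage test for a linear trend in proportions across ordered
    groups. Returns (z statistic, two-sided p-value from the normal approximation).
    By default the groups are scored 1, 2, 3, ... (tasting position)."""
    if scores is None:
        scores = list(range(1, len(trials) + 1))
    n_total = sum(trials)
    p_bar = sum(successes) / n_total  # pooled proportion correct
    # Trend statistic: score-weighted deviations from the pooled proportion.
    t_stat = sum(t * (x - n * p_bar) for t, x, n in zip(scores, successes, trials))
    var = p_bar * (1 - p_bar) * (
        sum(t * t * n for t, n in zip(scores, trials))
        - sum(t * n for t, n in zip(scores, trials)) ** 2 / n_total
    )
    if var == 0:
        raise ValueError("all tastings correct or all incorrect; no trend to test")
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: tasters correct at each of the 8 positions, out of 10 each.
correct_by_position = [6, 5, 6, 4, 5, 6, 5, 5]
tasters_per_position = [10] * 8
z, p = cochran_armitage(correct_by_position, tasters_per_position)
print(f"z = {z:.3f}, p = {p:.3f}")
```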

To summarize:

• The regular wine drinkers who participated showed only limited ability to identify which type of wine they were drinking when blindfolded.
• Color was considerably easier to identify than type.
• Multiple participants showed a much greater ability to distinguish wines than to correctly identify them.

Keep in mind that our participants represented a diverse group of wine drinkers, so stay tuned for Part IV, when we evaluate whether our survey results from Part II correlate in any meaningful way with these experimental results...