How Olympic Judging Stacks Up, Part II

Minitab Blog Editor 14 August, 2012

As a follow-up to my recent article on how judges evaluated performances at two previous Olympic events, I recently wrote a blog post comparing judged events from the 2008 Olympics to see whether each demonstrated judging bias and, if so, how much.

But regardless of whether bias exists, the real purpose of judging these events is to determine the best performances and, specifically, to award gold, silver, and bronze medals. So now I want to take a look at those same events to see whether each adequately determined the medal winners…

If you read the original article, you know that a General Linear Model (GLM) was used for each analysis. When using GLM, you can get what are known as multiple comparisons for categorical factors, and even "grouping information" on which levels of those factors can be differentiated from one another using the data. So by getting multiple comparisons on the athletes, we can tell whether the gold medalist was adequately differentiated from the silver medalist, the silver from the bronze, the bronze from fourth place, and so on.
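To give a flavor of how these pairwise comparisons work — this is a sketch of the general technique, not the original Minitab GLM analysis, and the scores below are invented for illustration — here is Tukey's method applied to hypothetical judge scores in Python with SciPy:

```python
# Sketch of Tukey pairwise comparisons on judged scores.
# The scores below are INVENTED for illustration; they are not the 2008 data.
from scipy.stats import tukey_hsd

# Seven hypothetical judges' scores for three hypothetical finishers
first  = [9.6, 9.4, 9.7, 9.5, 9.6, 9.5, 9.7]
second = [9.5, 9.6, 9.4, 9.6, 9.5, 9.7, 9.4]
fourth = [9.0, 8.9, 9.1, 8.8, 9.0, 9.1, 8.9]

res = tukey_hsd(first, second, fourth)

# p-value for 1st vs 2nd is large: the two cannot be separated statistically
print(res.pvalue[0, 1])
# p-value for 1st vs 4th is tiny: these two clearly differ
print(res.pvalue[0, 2])
```

Finishers whose pairwise p-values exceed the significance level land in the same "group" — exactly the grouping information the bars in the graph below summarize.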

Here are the results, explained below and grouped by medal color to indicate which finishers could not be distinguished:

There is a large amount of information to provide in a single graph, so I'll explain how to read it using the last event listed, Equestrian Individual Dressage GP Special, as an example. The gold bar extends from the 1st place finisher through the 3rd place finisher, meaning that while in real life the 1st place finisher was awarded a gold medal, statistically you cannot differentiate that person's score from those of the riders who finished 2nd and 3rd. The same goes for silver: you cannot differentiate the 1st and 3rd place riders from the 2nd place rider. Finally, the 3rd place rider cannot be differentiated from the 4th through 9th place riders.

This presents a common conundrum in interpreting multiple comparison results, where the logic tends to go something like "If I can't tell 1 from 3, and I can't tell 3 from 9, then I can't tell 1 from 9." But remember, these comparisons are statistical tests, and saying we can't tell first and third apart does not mean we are saying they are the same – we are simply saying we don't have evidence that they are different.
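This non-transitivity is easy to demonstrate with a toy example (again using SciPy's Tukey HSD as a stand-in for the Minitab analysis, with made-up scores): adjacent finishers overlap enough to be indistinguishable, yet the extremes differ clearly.

```python
# Toy illustration of the "can't tell 1 from 3, can't tell 3 from 9" logic.
# Invented scores: adjacent finishers are statistically indistinguishable,
# yet the top and bottom finishers clearly differ.
from scipy.stats import tukey_hsd

top    = [10.0, 9.8, 10.2, 9.85, 10.15]  # mean 10.0
middle = [9.8, 9.6, 10.0, 9.65, 9.95]    # mean 9.8
bottom = [9.6, 9.4, 9.8, 9.45, 9.75]     # mean 9.6

res = tukey_hsd(top, middle, bottom)

print(res.pvalue[0, 1] > 0.05)  # top vs middle: not separable
print(res.pvalue[1, 2] > 0.05)  # middle vs bottom: not separable
print(res.pvalue[0, 2] < 0.05)  # top vs bottom: clearly different
```

No evidence of a difference between adjacent pairs is not evidence that all three are the same — the cumulative gap from top to bottom is large enough to detect.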

Back to the results. I’ll save you some reading and give you what I see as highlights:

  • There is not a single event that adequately separated 1st from 2nd, 2nd from 3rd, and 3rd from 4th, so no event demonstrated that the judges chose the correct medal winners.
  • Only 4 of the 21 events (three of them diving) statistically distinguished the gold medal finisher.
  • In 15 of the 21 events, at least one athlete did not receive a medal but was statistically indistinguishable from at least one of the medalists. In one equestrian event, finishers all of the way through eleventh place were not distinguished from the third-place rider.
  • Synchronized Swimming and Diving do the best job of distinguishing medalists.
  • Gymnastics and Trampoline do the worst job of distinguishing medalists.

So when you're watching the Olympics and your favorite athlete wins silver, it's likely fair to say, "He won silver, but statistically he probably wasn't any worse than the gold medalist!"