You may have read my recent article applying statistical analysis to how well judges scored performances in two previous Olympic events. If so, perhaps you found yourself wondering how other events stack up…
* Data on the individual judging scores could not be found for the synchronized diving events, Men’s and Women’s Vault, Men’s and Women’s All-Around Gymnastics, or any Rhythmic Gymnastics events. I would be very interested if anyone has this data.
Before we dive in, remember that R-squared is the percentage of the Total Sum of Squares that is accounted for by the terms in the model. Or, put more loosely, it's the percentage of variation in the data explained by the model. I'll use the same idea to state what percentage of the variation in scores for an event is accounted for by judging bias: take the Sum of Squares for any significant judging terms and divide by the Total Sum of Squares. I’ll call that number “Judge Contribution”.
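To make that calculation concrete, here is a minimal Python sketch of the judging-contribution figure. The function name, the term labels, the 0.05 significance cutoff, and the numbers in the usage example are all assumptions made for illustration; they are not taken from the Olympic data or from the software used for the analysis.

```python
def judging_contribution(ss_by_term, p_by_term, ss_total,
                         judging_terms=("Judge", "Judge*Athlete"),
                         alpha=0.05):
    """Return the share of the Total Sum of Squares carried by
    judging terms whose p-value falls below alpha.

    ss_by_term: dict mapping model term -> Sum of Squares
    p_by_term:  dict mapping model term -> p-value
    ss_total:   Total Sum of Squares for the model
    alpha:      assumed significance cutoff (hypothetical default)
    """
    significant_ss = sum(
        ss_by_term[term]
        for term in judging_terms
        # Terms without a p-value are treated as not significant.
        if term in ss_by_term and p_by_term.get(term, 1.0) < alpha
    )
    return significant_ss / ss_total


# Invented numbers, purely to show the arithmetic.
example_ss = {"Athlete": 90.0, "Judge": 4.0, "Judge*Athlete": 1.0}
example_p = {"Athlete": 0.000, "Judge": 0.010, "Judge*Athlete": 0.400}

share = judging_contribution(example_ss, example_p, ss_total=100.0)
print(f"{share:.1%}")  # only the Judge term is significant: 4.0 / 100.0
```

In this made-up case only the Judge term clears the cutoff, so the interaction term's Sum of Squares is left out of the numerator.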
Below is a table of all events analyzed, with the following information:
| Event | Judge Contribution | Judge p-value | Judge*Athlete p-value | R-sq(adj) |
|---|---|---|---|---|
| Diving - Men's 10M Platform | 0.0% | * | * | 89.63% |
| Diving - Women's 10M Platform | 0.0% | * | * | 94.36% |
| Diving - Men's 3M Springboard | 0.9% | 0.000 | * | 86.03% |
| Diving - Women's 3M Springboard | 2.4% | 0.002 | 0.003 | 92.27% |
| Gym - Men's Horizontal Bar | 0.0% | * | NA | 76.33% |
| Gym - Women's Beam | 0.0% | * | NA | 91.68% |
| Gym - Men's Parallel Bars | 0.0% | * | NA | 87.28% |
| Gym - Women's Uneven Bars | 4.1% | 0.040 | NA | 85.38% |
| Gym - Men's Rings | 13.4% | 0.002 | NA | 74.01% |
| Gym - Women's Floor | 2.2% | 0.050 | NA | 91.56% |
| Gym - Men's Pommel | 0.0% | * | NA | 87.38% |
| Gym - Men's Floor | 0.0% | * | NA | 92.79% |
| Synch Swimming - Team TM | 0.0% | * | NA | 95.91% |
| Synch Swimming - Team AI | 0.0% | * | NA | 97.66% |
| Synch Swimming - Duet TM | 0.0% | * | NA | 94.60% |
| Synch Swimming - Duet AI | 0.0% | * | NA | 93.02% |
| Trampoline - Men's | 0.0% | * | NA | 80.64% |
| Trampoline - Women's | 0.0% | * | NA | 98.39% |
| Equestrian - Individual Dressage Free | 0.0% | * | NA | 79.68% |
| Equestrian - Individual Dressage GP | 1.5% | 0.000 | NA | 92.15% |
| Equestrian - Individual Dressage GP Special | 0.0% | * | NA | 85.59% |
There’s plenty of information there to be consumed, but I’ll provide some high-level overviews of what I learned from these results…
So there you have it… 6 out of the 21 events examined showed judging bias in the 2008 Olympics, and that bias accounted for as little as 0.9% or as much as 13.4% of the total variation in scores. Although plenty of events appear well-judged, there is definitely room for improvement in the 2012 London Games!