Tour de France: Statistics Reveal the Drama

57 seconds. After more than 2,000 miles and nearly three weeks of grueling cycling, Cadel Evans needed to make up 57 seconds to catch the leader. And he had only 26.4 miles, riding alone, in which to do it.

He gained two and a half minutes.

When you watch the Tour de France, you realize that amid the extreme physical challenges of the race comes a high level of personal drama. And while most people picture a large pack (the peloton) of virtually every rider in the tour riding together for most of the race, you might be surprised at just how much separation happens during each stage of the race.

Stages are categorized as follows:

• En Ligne - This is a French term that in English we would call "flat".
• High Mountains - Long, steep climbs and extremely fast descents (riders often top 60 mph).
• Individual time-trial - Riders start separated by a couple of minutes and must ride the stage alone.
• Medium Mountains - Not as big as the high mountains, but still mountains that 99% of people would never think of riding a bike up or down.
• Team time-trial - Teams start separately and ride separately, and all members of the team are given the same time.

Rather than total times, results are presented as "gaps": the difference between a rider's time and the stage winner's time. So the winner of a stage has a gap of 0:00, the second-place rider has a gap of 0:07 if he crossed the finish seven seconds later, and so on. If riders finish in a pack, they are all given the same time. Based on the image of nearly everyone riding together, you would expect a huge group to have the same time on a stage, with only a few finishers faster or slower, right?
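To make the gap definition concrete, here is a minimal Python sketch. The rider names and finishing times are made up for illustration (they are not from the 2011 results), and the helper names `stage_gaps` and `format_gap` are my own.

```python
def stage_gaps(finish_seconds):
    """Return each rider's gap (in seconds) to the fastest finishing time."""
    winning_time = min(finish_seconds.values())
    return {rider: t - winning_time for rider, t in finish_seconds.items()}


def format_gap(seconds):
    """Format a gap in whole seconds as M:SS, e.g. 7 -> '0:07'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical finishing times for one stage, in seconds.
# rider_b and rider_c finished in the same pack, so they share a time.
finish = {
    "rider_a": 16200,
    "rider_b": 16207,
    "rider_c": 16207,
    "rider_d": 16350,
}

gaps = stage_gaps(finish)
for rider, gap in gaps.items():
    print(rider, format_gap(gap))
```

The winner always ends up with a gap of 0:00, and riders who are given the same time share the same gap.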

Here is a look at the distribution of gaps for each stage in the 2011 Tour de France, colored based on the stage type:

The graph reveals that riders' finishes are much more spread out than it would seem, and it appears that mountain stages present the largest time gaps.  Another factor that could affect the time gaps would be the length of the stage, so I plotted the average time gap versus the stage length and kept the grouping:

From that graph it appears that the stage type is much more important than the distance in determining the average time gap. In fact, the following ANOVA table confirms that stage distance is not significant while stage type is highly significant:

Removing the distance term shows that about 82% of the variation in average time gaps is explained by the stage type.
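For readers curious where a "percent of variation explained" figure comes from: in a model with a single categorical factor, it is the between-group sum of squares divided by the total sum of squares (the R-squared). The sketch below shows that calculation in plain Python. It is not the analysis behind the table above; the average gaps are invented, and the function name `r_squared_by_group` is my own.

```python
from collections import defaultdict


def r_squared_by_group(values, groups):
    """Share of total variation in `values` explained by the group labels."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each group label.
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Between-group sum of squares: group size times the squared
    # distance of the group mean from the grand mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical average gaps (minutes) for six stages; not real Tour data.
avg_gap = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
stage_type = ["flat", "flat", "time trial", "time trial", "mountain", "mountain"]

r2 = r_squared_by_group(avg_gap, stage_type)
print(f"Stage type explains {r2:.0%} of the variation")
```

A value near 1 means stages of the same type have similar average gaps, while a value near 0 means the stage type tells you little.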

Another aspect of the race is that many riders are unable to finish, whether due to injury, fatigue, or doping violations (which are less common now than a few years ago, but still happen). Here is the number of riders who dropped out of the race at each stage, with stage type still indicated:

The outlier is from a day when a single crash injured four riders seriously enough that they had to quit (in the past, riders have been known to complete the race even after breaking bones!).

Back to our friend Cadel Evans, who beat the former leader in that final time trial by two and a half minutes. Based on the graphs above, you now have some context for what a great ride that was. In the individual time trial, riders finished on average five and a half minutes behind the stage winner. So beating another rider by two and a half minutes may not seem incredible, but remember that Cadel didn't need to beat just an average rider; he needed to beat the race favorite and current leader with only that stage remaining!

Here is a profile of the top 10 finishers in the race, and their overall time gaps (meaning how far back they were cumulatively across all stages to that point, not just for that stage) from the leader among that group:

That's Cadel in blue, and you can see him drop into first place at Stage 20, the individual time trial, after his performance. (The last stage of the race is largely symbolic and, other than a sprinting competition, does not change times, so Cadel had effectively won the race after the time trial.)
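The cumulative gaps in that last graph can be computed by summing each rider's stage times and comparing the running totals with the best total after each stage. Here is a minimal sketch under that assumption. The two riders and their times are hypothetical, chosen only so that the chaser starts 57 seconds back and gains two and a half minutes (150 seconds) on the final stage, as in the story above.

```python
from itertools import accumulate


def cumulative_gaps(stage_times):
    """Map each rider to their gap behind the overall leader after each stage.

    stage_times: dict of rider -> list of per-stage times in seconds
    (every rider must have the same number of stages).
    """
    totals = {rider: list(accumulate(times)) for rider, times in stage_times.items()}
    n_stages = len(next(iter(totals.values())))
    # Best (lowest) cumulative time among the group after each stage.
    best = [min(t[i] for t in totals.values()) for i in range(n_stages)]
    return {
        rider: [t[i] - best[i] for i in range(n_stages)]
        for rider, t in totals.items()
    }


# Hypothetical times in seconds: "everything before" lumped into one entry,
# followed by the final time trial.
times = {
    "leader": [1000, 3400],
    "chaser": [1057, 3250],  # 57 s behind, then 150 s faster in the time trial
}

overall = cumulative_gaps(times)
print(overall["leader"])  # [0, 93]
print(overall["chaser"])  # [57, 0]
```

With those illustrative numbers, a 57-second deficit followed by a 150-second gain turns into a 93-second lead.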

Name: alexei • Saturday, July 7, 2012

Ok, those are nice statistics that present the entire picture of the race well.

After reading the article, I still do not understand what the drama is.
Can somebody explain?

"And while most people picture a large pack (the peloton) of virtually every rider in the tour riding together for most of the race"???

A pack does exist on a flat road, but as soon as there are some difficulties, like an uphill finish or some points to grab, individual riders accelerate; those who are able to follow do, and the others are left behind. The difficulties spread the gaps.

Who has ever seen a pack crossing the finish line after a high mountain stage? I personally never have! One just needs to watch a finish after a high mountain stage to see that the finish line is crossed by individual riders rather than an actual pack.

“The graph reveals that riders' finishes are much more spread out than it would seem, and it appears that mountain stages present the largest time gaps”.

If you have already ridden a bike, it is unnecessary to read my explanation. But anyone who hasn't should just think for a second about the physical differences between riders. As long as a pack exists (flat road), it's fine for the weakest riders to hide in the "windschatten" (slipstream). But when the road climbs, the average speed drops; hiding from air resistance is still possible, but you also need to fight your own weight, so in that case the race becomes more individual and gaps appear.

Name: Joel • Monday, July 9, 2012

Alexei-

Thank you for the extensive comments! To start, the information was presented for those with only a superficial or basic knowledge of cycling and the Tour de France in particular. As an experienced cyclist and avid fan of the race, I am very familiar with the field breaking apart in the mountains, as we saw on Saturday with just a single mountain at the end of the stage.

The drama comes from Evans having to overcome his time gap on the final competitive stage, which was an individual time trial, a stage that generally did not show large variation in completion times. Considering he needed to beat another elite cyclist by nearly a minute and did so by two and a half, it was a dramatic performance.

As you know, all networks must use the video feed provided by the race organizers. On flat stages we are essentially presented with X riders on an extended breakaway, with the peloton usually catching them just before the end of the stage and then sprinters leading out to finish just ahead of the peloton. From this coverage, the casual fan would expect the time gaps on flat stages to show a few riders right around zero, and then almost every other rider in a tight cluster just behind that (maybe 10-15 seconds). But as can be seen in the first graph, the finishers are spread out quite a bit more than that even on flat stages. The second graph even shows that the average gap on flat stages is roughly two minutes, which would surprise most TV viewers unless they really read through all of the finishing times for each stage.

Thanks again for your comments - I'm off to go hide in the middle of the peloton until I get dropped in the mountains!

Name: tamoghna • Wednesday, July 11, 2012

Hi Joel, I thoroughly enjoyed this article!! It would be really great to follow along. Would you mind sharing the data? And, if possible, also the data for another great article, "Olympic judging".

Name: carl • Wednesday, July 18, 2012

Wow! That's a lot of geek speak to say Schleck stinks as a time trialist LOL.

A lot of the gap is explained by team tactics as well as rider specialization.

Seriously though I like the real world examples.