Tom Brady and the Danger of Selective Endpoints

Last Friday an interesting tweet came across my Twitter feed. It pointed out that the Patriots had gone just 3-10 against the spread in their last 13 playoff games.

And that was before the Patriots failed to cover in their first playoff game of 2015, against the Ravens. When you include that game, the record becomes 3-11, good for a winning percentage of only 21%! With the Patriots set to play another playoff game against the Colts, it seems like the smart thing to do is to bet on the Colts to cover. But wait, 14 games is a pretty small sample. We should do a hypothesis test to determine whether this percentage is significantly less than 50%.

A one-sided test of 3 covers in 14 games in Minitab Statistical Software returns a p-value of 0.029. That is less than the alpha value of 0.05, so at the 5% significance level we can conclude that the true percentage of playoff games in which the Patriots cover is less than 50%. Great! Now it’s time to get my ATM card and bet a mortgage payment on the Colts. Thank you, statistics!
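For readers without Minitab, the same kind of test can be sketched in a few lines of Python. This is a minimal sketch that assumes an exact one-sided binomial test of 3 covers in 14 games against a null cover rate of 50%; the function name `binom_lower_tail` is made up for illustration and is not part of any library.

```python
from math import comb  # requires Python 3.8+


def binom_lower_tail(successes, trials, p=0.5):
    """Return P(X <= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes + 1)
    )


# 3 covers in 14 playoff games; null hypothesis: the true cover rate is 50%.
p_value = binom_lower_tail(3, 14)
print(f"p-value = {p_value:.3f}")  # p-value = 0.029
```

The lower tail is used because the question is whether the cover rate is less than 50%, not merely different from it.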

But wait, there is one more question I should probably ask pertaining to that tweet.

Why only the last 13 games?

| Date | Patriots Opponent | Spread | Score | Cover the Spread? |
|---|---|---|---|---|
| 1/19/2014 | @ Denver | +5 | L 16-26 | L |
| 1/11/2014 | Indianapolis | -7.5 | W 43-22 | W |
| 1/20/2013 | Baltimore | -8 | L 13-28 | L |
| 1/13/2013 | Houston | -9.5 | W 41-28 | W |
| 2/5/2012 | New York Giants | -3 | L 17-21 | L |
| 1/22/2012 | Baltimore | -7 | W 23-20 | L |
| 1/14/2012 | Denver | -14 | W 45-10 | W |
| 1/16/2011 | New York Jets | -9.5 | L 21-28 | L |
| 1/10/2010 | Baltimore | -3.5 | L 14-33 | L |
| 2/3/2008 | New York Giants | -12.5 | L 14-17 | L |
| 1/20/2008 | San Diego | -14 | W 21-12 | L |
| 1/12/2008 | Jacksonville | -13.5 | W 31-20 | L |
| 1/21/2007 | Indianapolis | +3.5 | L 34-38 | L |
| 1/14/2007 | San Diego | +5 | W 24-21 | W |
| 1/7/2007 | New York Jets | -9.5 | W 37-16 | W |

Here are the last fifteen playoff games the Patriots played prior to the tweet (so, not including the most recent Baltimore game), listed from most recent to oldest. Look at the 13th, 14th, and 15th games, the three from January 2007. All three were played one week apart, but the 13th game was included in the tweet, while the 14th and 15th games were conveniently left off.

Why? Because 3-10 against the spread sounds more impressive than 5-10.

This is using selective endpoints to manipulate statistics to help prove your point. It’s this kind of thing that leads people to say “There are three kinds of lies: lies, damned lies, and statistics.” The conclusions you can draw from a statistical analysis are only as good as the data behind it. That’s why you should always make sure you collect a random, unbiased sample. And before you believe the conclusions made by others, ensure they collected their data correctly too!
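To see how much the choice of endpoint matters, here is a small sketch that recomputes the against-the-spread record for several different cutoffs. The `covers` list is transcribed from the table of the Patriots' last fifteen playoff games before the tweet, most recent first; the helper name `record_last_n` is invented for this illustration.

```python
# Cover results for the 15 playoff games before the tweet,
# most recent first (True = covered the spread).
covers = [
    False, True, False, True, False,   # 2014, 2014, 2013, 2013, 2012
    False, True, False, False, False,  # 2012, 2012, 2011, 2010, 2008
    False, False, False, True, True,   # 2008, 2008, 2007, 2007, 2007
]


def record_last_n(results, n):
    """Return (covers, non-covers) over the n most recent games."""
    wins = sum(results[:n])
    return wins, n - wins


# Try every cutoff from the last 10 games to the last 15.
for n in range(10, 16):
    wins, losses = record_last_n(covers, n)
    print(f"last {n} games: {wins}-{losses} ({wins / n:.1%})")
```

Among these cutoffs, stopping at exactly 13 games gives the lowest cover percentage, which is the endpoint the tweet happened to use.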

In our Patriots situation, we could go back and look at every playoff game the Patriots have ever played. But I don’t think their games in 1963 have any effect on their games this season. Instead, the best thing to do is to associate this Patriots team with Tom Brady and sample all the playoff games he has played in. That includes the 16 games from 2007 onward (in which he went 5-11 against the spread) and the 11 games he played before 2007 (in which he went 6-4-1). This gives us a final record of 11-15-1. Setting aside the one push, that is 11 covers in 26 games, a winning percentage of 42%.

Once we obtain a legitimate sample of data, we see that Tom Brady and the Patriots’ record against the spread in the playoffs isn’t nearly as bad as we were originally led to believe. While 42% is still less than 50%, it is no longer significantly different from 50%.
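The same exact test can be rerun on the larger sample. This is a sketch under two assumptions: the one push is dropped, leaving 11 covers in 26 decided games, and the test is again a one-sided exact binomial test against 50%. The helper `binom_lower_tail` is a hypothetical name, not a library function.

```python
from math import comb  # requires Python 3.8+


def binom_lower_tail(successes, trials, p=0.5):
    """Return P(X <= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes + 1)
    )


# Brady's full playoff record against the spread: 11-15-1.
covers, non_covers, pushes = 11, 15, 1
decided = covers + non_covers  # the push is excluded (an assumption)

full_sample_p = binom_lower_tail(covers, decided)
print(f"cover rate = {covers / decided:.0%}")  # cover rate = 42%
print(f"p-value = {full_sample_p:.3f}")        # p-value = 0.279
```

With a p-value well above 0.05, this sample gives no real evidence that the true cover rate is below 50%.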

So could the Patriots still fail to cover against the Colts this weekend? Of course. But I'm not going to go bet a mortgage payment on it.

Photo of Tom Brady by Keith Allison, used under Creative Commons 2.0.