Have you ever worked on a quality project where a team member—or maybe you—had a hunch that seemed to be true, one that felt so right that it seemed silly to even verify? Here's a good example of why you should always look at the data before you reach a conclusion.
I recently analyzed data showing that NFL home teams were winning more often on Thursday nights than the average NFL home team, and by a larger margin, too. However, neither of these differences was statistically significant.
The problem might have been that our sample size was too small. So I’ve collected data on Thursday games going back to 1970 to see whether we can find a statistically significant difference.
Home teams win 57% of the time in the NFL. We previously saw that since 2006, home teams that played on Thursday night won 63% of the time. What happens to that percentage when we include every Thursday game played since 1970?
The proportion of games won by the home team drops to 58.9%, which is much closer to the average of all NFL home teams. The p-value is 0.364, which is not low enough to conclude that this is significantly higher than 57%. When we increase the sample size, we see that Thursday home teams are winning about as often as everybody else. But what about their margin of victory?
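For readers who want to try this kind of check themselves, here is a minimal sketch of a one-sided, one-proportion z-test in Python. It is an assumption on my part that a normal approximation is acceptable here; the original analysis may have used a different (for example, exact) method. The function name and the win/game counts are hypothetical placeholders, since the article reports only the percentage and the p-value.

```python
import math

def one_proportion_z_test(wins, games, p0):
    """One-sided z-test of H0: p = p0 against H1: p > p0.

    Uses the normal approximation to the binomial distribution.
    Returns (observed proportion, z statistic, upper-tail p-value).
    """
    p_hat = wins / games
    # Standard error is computed under the null hypothesis.
    se = math.sqrt(p0 * (1 - p0) / games)
    z = (p_hat - p0) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Hypothetical counts for illustration only; the article does not report them.
wins, games = 59, 100
p_hat, z, p_value = one_proportion_z_test(wins, games, p0=0.57)
print(f"share={p_hat:.3f} z={z:.2f} p={p_value:.3f}")
```

A p-value above 0.05 from a test like this means the observed home winning percentage is not significantly higher than the 57% baseline.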
Margin of Victory
Home teams win by an average of 2.5 points in the NFL. The previous statistics showed the Thursday home teams were winning by 4.24 points. Will that margin stay the same when we increase the sample size?
The average margin of victory dropped to 3 points. The p-value is greater than 0.05, so we cannot conclude that the average of 3 points is significantly greater than 2.5.
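The same idea applies to the margin of victory. The sketch below tests whether a sample mean is greater than a reference value (2.5 points here). It computes the usual one-sample t statistic but, to stay within the standard library, approximates the p-value with the normal distribution, which is only reasonable for fairly large samples. The function name and the margins listed are made up for illustration; they are not the data behind this post.

```python
import math
import statistics

def one_sample_mean_test(margins, mu0):
    """One-sided test of H0: mean = mu0 against H1: mean > mu0.

    Returns (sample mean, t statistic, approximate upper-tail p-value).
    The p-value uses a normal approximation to the t distribution.
    """
    n = len(margins)
    mean = statistics.fmean(margins)
    sd = statistics.stdev(margins)  # sample standard deviation (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    p_value = 0.5 * math.erfc(t / math.sqrt(2))
    return mean, t, p_value

# Hypothetical home-team margins (home points minus away points).
margins = [3, -7, 10, 14, -3, 7, -10, 21, 3, -4, 6, -1] * 5
mean, t, p_value = one_sample_mean_test(margins, mu0=2.5)
print(f"mean={mean:.2f} t={t:.2f} p={p_value:.3f}")
```

With a small sample, a proper t distribution should replace the normal approximation.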
Does Era Matter?
There is one last thing we can look at. I increased the sample size by taking games dating back to the 70s and 80s. But the game was a lot slower and less physical back then. Perhaps in the 70s, players weren’t as beat up after a game, and thus having to travel on a short week and play Thursday didn’t put the away team at as much of a disadvantage as it seems to today. I broke the data down into 5-year groups and made time series plots to see whether there is any trend across the eras.
Neither plot shows any trends in the winning percentage or margin of victory of home teams playing on Thursday. In fact, the largest values of each occurred from 1980 to 1984.
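The five-year breakdown described above can be sketched as follows. This is an illustration under assumptions, not the code used for the plots: the function name, the (season, home points, away points) tuple layout, and the sample games are all hypothetical.

```python
from collections import defaultdict

def summarize_by_era(games, bin_years=5, start_year=1970):
    """Group games into fixed-width eras and summarize home-team results.

    games: iterable of (season, home_points, away_points) tuples.
    Returns a dict keyed by the first season of each era.
    """
    buckets = defaultdict(list)
    for season, home_pts, away_pts in games:
        # First season of the era this game falls into, e.g. 1980 for 1983.
        era_start = start_year + ((season - start_year) // bin_years) * bin_years
        buckets[era_start].append(home_pts - away_pts)

    summary = {}
    for era_start in sorted(buckets):
        margins = buckets[era_start]
        # Ties (margin of zero) are not counted as home wins.
        wins = sum(1 for m in margins if m > 0)
        summary[era_start] = {
            "games": len(margins),
            "home_win_pct": 100.0 * wins / len(margins),
            "mean_margin": sum(margins) / len(margins),
        }
    return summary

# Made-up games for illustration only.
sample = [(1970, 20, 10), (1974, 10, 20), (1975, 17, 14)]
for era, stats in summarize_by_era(sample).items():
    print(era, stats)
```

Plotting `home_win_pct` and `mean_margin` against the era start year gives the kind of time series view used to look for a trend.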
So increasing the sample size made the statistics for home teams playing on Thursday look just like the statistics for Sunday. It appears that home teams aren’t getting any greater home field advantage on Thursdays. Perhaps when the NFL goes to a 32-game schedule and plays a game every day of the week (I’m only half joking), we’ll be able to find a situation where the home team has a greater home field advantage. But until then, the home field advantage appears to be the same no matter what day of the week it is.
This provides a good example of why practitioners should use statistics to see if their hunches, based on two means in this case, are anything to write home about. Whenever you think you “see” something that appears to go against the norm, make sure you have some data analysis to back up your claim!