"I could even predict that John Madden will be the next president and make the Turducken our new national bird!" Yes, but what do the data say?
Anybody can predict the future. It’s true. Whether it’s a fan calling heads or tails on a coin flip or a quality improvement expert forecasting ways to save money, anybody can do it. I could even predict that John Madden will be the next president and make the Turducken our new national bird!
But there is a catch: you have to be able to do it accurately. A prediction doesn’t mean very much unless you know how accurate it is likely to be. And that’s where Minitab’s regression analysis comes in. Using regression analysis, I’m going to see how accurately I can predict an NFL player’s fantasy football score for the 2011 season.
How can we tell how accurate a model is? The regression analysis gives us a statistic called the R-squared value (R-Sq), a percentage between 0 and 100 that tells us how much of the variation in the response the model explains. The closer the value is to 100%, the better the model fits the data. Last year, I created a model that predicted players’ fantasy scores at the end of the season based on their scores from the first 5 weeks. That model had an R-Sq value of 81%, which is pretty good. Let’s see what a similar model looks like if I use data from past seasons to try to predict how a player will perform this year.
I collected data from 110 quarterbacks, running backs, and wide receivers. To limit the variation caused by injuries and team changes, I included only players who had been on the same team for 3 years and had played a majority of the games in each season. I then used regression analysis to see if I could predict how a player would do in his current season based on his fantasy averages from the previous two seasons.
The R-Sq value is 61%, so our model explains only 61% of the variation in the number of fantasy points the players finish with in the current year. That means there is a sizeable amount of variation (39%) the model can’t explain. For example, in 2009 and 2008 Darren McFadden averaged 4.9 and 7.7 fantasy points per game respectively. Then in 2010 he blew up and averaged 16.5 fantasy points per game. There are just some things that the stats can’t explain.
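If you want to see what "explains 61% of the variation" means in arithmetic terms, here is a minimal Python sketch of the R-squared calculation. It is not how Minitab produces its output, and the function name `r_squared` and the points-per-game numbers are invented purely for illustration; they are not the data from this analysis.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that `predicted` accounts for."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_actual = sum(actual) / len(actual)
    # Total variation: squared distance of each actual value from the mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Unexplained variation: squared distance of each actual value from its prediction.
    ss_resid = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_total == 0:
        raise ValueError("actual values have no variation to explain")
    return 1.0 - ss_resid / ss_total


# Hypothetical fantasy points per game (made up, not the article's data).
actual = [14.0, 12.0, 8.4, 10.1, 5.2]
predicted = [12.5, 11.5, 9.0, 10.0, 6.0]

r_sq = r_squared(actual, predicted)
print(f"R-Sq = {100 * r_sq:.1f}%")
```

Whatever share is left over (39% in the model above) is the variation the predictions miss. For a least-squares fit with an intercept the value lands between 0 and 1, which is why it can be reported as a percentage.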
Some players, like Arian Foster, have only one year of good data, so what happens if we use only one season? Things get worse. Leaving the variable “Yr2” out of the model drops the R-Sq (adj) value to 53% (we have to use R-Sq (adj) to compare models with different numbers of predictors, because plain R-Sq never goes down when a predictor is added). Our model doesn’t look very accurate, but is there anything we can conclude?
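For the curious, the adjustment behind R-Sq (adj) is a standard formula that penalizes each extra predictor. The sketch below applies it to the two-season model, assuming all 110 players count as one observation each (the post does not state the exact sample size used in the regression, so treat the result as approximate). The function name `adjusted_r_squared` is just an illustrative choice.

```python
def adjusted_r_squared(r_sq, n_obs, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_sq) * (n_obs - 1) / dof


# R-Sq of 61% with two predictors (the two previous seasons).
# n_obs = 110 is an assumption: one row per player collected.
adj_two_seasons = adjusted_r_squared(0.61, n_obs=110, n_predictors=2)
print(f"R-Sq (adj), two seasons: {100 * adj_two_seasons:.1f}%")
```

With a sample this large relative to the number of predictors, the penalty is small, so the drop to 53% for the one-season model reflects a real loss of information rather than an artifact of the adjustment.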
Yes, there is. We know from last year’s model that with only a few weeks of current-season data, we can be much more accurate in our predictions (an R-Sq of 81%, compared with 61% here). So early in the season, when everybody else still has preconceived notions based on pre-season rankings, you should be dealing the busts and picking up the sleepers. For example, last year a Minitab employee traded Wes Welker for Peyton Hillis in week 6 based on predictions from the model. From week 6 to week 16, Hillis outscored Welker by 71 points.
But maybe we can still salvage something from this data. Next, I’ll split the data by position and see if I can get a model that can describe any of the individual positions a little better.
Photograph by Phil Roman licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic License.