Using Regression Analysis to Predict Fantasy Football Scores

Part turkey, part duck, part chicken. The Turducken!!!!!!

"I could even predict that John Madden will be the next president and make the Turducken our new national bird!" Yes, but what do the data say? 

Anybody can predict the future. It’s true. Whether it’s calling a coin toss heads or tails or a quality improvement expert forecasting ways to save money, anybody can do it. I could even predict that John Madden will be the next president and make the Turducken our new national bird!

But there is a catch: you have to be able to do it accurately. If you don’t know how accurate a prediction is, it doesn’t mean very much. And that’s where Minitab’s regression analysis comes in. Using regression analysis, I’m going to see how accurately I can predict an NFL player’s fantasy football score for the 2011 season.

How can we tell how accurate a model is? The regression analysis gives us a statistic called the R-squared value (R-Sq), which is a percentage between 0 and 100. It is the percentage of the variation in the response that the model explains, so the closer the value is to 100%, the better the model fits the data. Last year, I created a model that predicted players' fantasy scores at the end of the season based on their scores from the first 5 weeks. My model last year had an R-Sq value of 81%, which is pretty good. Let’s see what a similar model would look like if I use data from past seasons to try to predict how a player will perform this year.
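To make the R-Sq idea concrete, here is a minimal Python sketch of how the statistic can be computed by hand. The function name r_squared and all of the numbers are made up for illustration; they are not the fantasy data from this post, and Minitab reports the value for you.

```python
def r_squared(actual, predicted):
    """Return R-squared as a percentage: 100 * (1 - SS_residual / SS_total)."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the observed values around their mean.
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    # Variation left over after the model has made its predictions.
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 100.0 * (1.0 - ss_residual / ss_total)


# Hypothetical points-per-game values, for illustration only.
actual = [10.0, 12.0, 8.0, 15.0, 5.0]
predicted = [9.0, 13.0, 8.5, 13.5, 6.0]

print(f"R-Sq = {r_squared(actual, predicted):.1f}%")
```

The smaller the leftover variation is relative to the total variation, the closer the result gets to 100%.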

I collected data from 110 quarterbacks, running backs, and wide receivers. To reduce the variation caused by injuries and team changes, each player had to have been on the same team for 3 years and played a majority of the games in each season. I then used regression analysis to see if I could predict how a player would do in his current season based on his fantasy averages from the previous two seasons.
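For readers without Minitab, a model of this shape can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the variable names yr1 and yr2 (a player's average one and two seasons back), and every number are hypothetical, and the fit uses ordinary least squares through the normal equations rather than anything specific to Minitab.

```python
def solve_linear_system(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_two_predictor_model(yr1, yr2, current):
    """Least-squares fit of current = b0 + b1*yr1 + b2*yr2."""
    rows = [[1.0, a, b] for a, b in zip(yr1, yr2)]
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, current)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, yr1_avg, yr2_avg):
    """Predicted current-season average for one player."""
    b0, b1, b2 = coefficients
    return b0 + b1 * yr1_avg + b2 * yr2_avg


# Hypothetical per-game averages, NOT the 110 players used in this post.
yr1 = [14.2, 9.8, 17.5, 6.3, 11.0, 12.4]
yr2 = [13.1, 10.5, 15.9, 8.8, 7.2, 12.0]
current = [15.0, 9.1, 16.2, 7.5, 10.4, 11.8]

coefficients = fit_two_predictor_model(yr1, yr2, current)
print(predict(coefficients, 12.0, 11.0))
```

With real data, you would replace the three lists with one entry per player and check the R-Sq of the fit before trusting any prediction.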

[Image: Minitab regression analysis output]

The R-Sq value is 61%, so our model explains only 61% of the variation in the number of fantasy points the players finish with in the current year. That means there is a sizeable amount of variation (39%) the model can’t explain. For example, in 2009 and 2008 Darren McFadden averaged 4.9 and 7.7 fantasy points per game respectively. Then in 2010 he blew up and averaged 16.5 fantasy points per game. There are just some things that the stats can’t explain.

Things get worse if you only include one season of data. Leaving the variable “Yr2” out of the model drops the R-Sq (adj) value to 53% (we have to use R-Sq (adj) to compare models with different numbers of predictors). Some players, like Arian Foster, only have one year of good data. Our model doesn't look very accurate, but is there anything we can conclude?
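The reason R-Sq (adj) is the fair comparison is that it penalizes a model for each extra predictor. A minimal sketch of the standard adjustment follows. The function name is my own, and plugging in n = 110 assumes all 110 players went into the two-season fit, which is an assumption rather than something the Minitab output above confirms.

```python
def adjusted_r_squared(r_sq_percent, n_observations, n_predictors):
    """Adjusted R-squared (as a percentage) from plain R-squared.

    Uses the standard formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    r_sq = r_sq_percent / 100.0
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 100.0 * (1.0 - (1.0 - r_sq) * penalty)


# Two-season model: R-Sq of 61% with 2 predictors.
# n = 110 is an assumption (all collected players used in the fit).
print(f"{adjusted_r_squared(61.0, 110, 2):.1f}%")  # about 60.3 under that assumption
```

With this many players and so few predictors the adjustment is small, so the gap between the two-season and one-season models comes almost entirely from the extra season of data.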

Yes, there is. Last year's model showed that with only five weeks of current-season data, we can be much more accurate in our predictions (an R-Sq of 81% versus 61%). So early in the season, when everybody else still has preconceived notions based on pre-season rankings, you should be dealing the busts and picking up the sleepers. For example, last year a Minitab employee traded Wes Welker for Peyton Hillis in week 6 based on predictions from the model. From week 6 to week 16, Hillis outscored Welker by 71 points.

But maybe we can still salvage something from this data. Next, I’ll split the data by position and see if I can get a model that can describe any of the individual positions a little better.

Photograph by Phil Roman licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic License.


Name: Matt Vonada • Wednesday, August 24, 2011

I had the same sort of idea, but was starting to go about it using expert systems (fuzzy logic, specifically). There are a lot of variables it would have been tough for your regression analysis to utilize (e.g. did a WR (Fitzgerald) get a new QB (Kolb)? have a serious injury that may set him back? get to an age that running backs tend to decline?). The system I was working on last year led me away from some seemingly high-value targets because of simple things like that, but generally it is a hard problem because of the incredible number of variables.

Name: Kevin Rudy • Friday, August 26, 2011


I agree that there are a lot of other variables that could play a part. I definitely think that a WR depends on the QB who throws him the ball (Fitzgerald last year, Randy Moss when he went from Oakland to New England). The problem is that it's hard to predict. I know Kolb will affect Fitzgerald's numbers this year, but in what way? Will he be better, the same, or worse than the Arizona QBs last year? Because Kolb has hardly played, I have no idea! (Although I doubt he can be worse.)

But I was just trying to create a simple model here because the simple one I did last year worked quite well. I did look at a few other variables when I broke the data up by position (number of carries the year before for RB, number of targets the year before for WR), but nothing was significant except fantasy scores. I think there are definitely individual cases where you can explain scores with other variables, but it's just hard to apply them to every player.
