Fantasy Studs and Regression to the Mean
Kevin Rudy has recently written two great posts (here and here) about how fantasy football studs perform the following year. For any fantasy team manager, the results demonstrate how difficult it can be to predict player performance...pity the person with the first pick in a draft, who seems almost certain to not pick the best performer that year!
But why is this the case?
One cause is special circumstances such as injury...if you placed among the top fantasy performers, you were almost certainly injury-free or close to it for the entire season. So an injury the following year almost guarantees that you will underperform relative to your standout year.
But the bigger reason is what is known as "regression to the mean."
The Meaning of "Regression to the Mean"
When some people hear that phrase, they misinterpret it to mean that an individual will regress to the mean of all data...but that is certainly not the case. What it does mean, however, is that data will tend to regress back toward some expected value. Extreme data points, like the top-scoring RB in fantasy football in a given year, involve some amount of skill and some amount of luck. In the subsequent year, this year's top players are unlikely to experience the same luck, so their results will tend to fall back toward what their skill alone would predict.
Let me give some examples. Suppose we have an X that predicts a Y really well, with small error. A plot of that data in Minitab Statistical Software would look like this:
In this case, with very little error around the predicted values, the highest X value predicts the highest Y value and, as it turns out, corresponds to the actual highest Y value. The 2nd-highest X value corresponds to the 2nd-highest Y value. While this won't always match exactly, the ranks won't tend to be very far off. In fantasy football terms, think of X as the player's true ability, the predicted Y as the expected fantasy points, and the actual Y value as the expected fantasy points plus or minus some amount of luck.
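To make the small-error picture concrete, here is a minimal Python sketch (not the Minitab analysis behind the plot) that assumes each player's observed points equal a hypothetical true ability plus Gaussian "luck." The helper names `simulate_season` and `true_rank_by_observed_rank`, and the 20 evenly spaced abilities, are illustrative assumptions. With luck this small relative to the gaps between players, the observed ranking should reproduce the true-ability ranking.

```python
import random

def simulate_season(abilities, luck_sd, rng):
    """Observed points = true ability (expected points) + Gaussian luck."""
    return [a + rng.gauss(0.0, luck_sd) for a in abilities]

def true_rank_by_observed_rank(abilities, observed):
    """For each observed rank (1 = best), return that player's true-ability rank."""
    n = len(abilities)
    # Rank players by true ability, highest first (rank 1 = best).
    by_ability = sorted(range(n), key=lambda i: abilities[i], reverse=True)
    ability_rank = {player: r + 1 for r, player in enumerate(by_ability)}
    # Order players by observed points, highest first, and look up their true rank.
    by_observed = sorted(range(n), key=lambda i: observed[i], reverse=True)
    return [ability_rank[player] for player in by_observed]

# Hypothetical league: 20 players whose true abilities are 5 points apart.
abilities = [100 + 5 * i for i in range(20)]
rng = random.Random(1)
observed = simulate_season(abilities, luck_sd=0.5, rng=rng)  # small error
print(true_rank_by_observed_rank(abilities, observed)[:5])
```

Raising `luck_sd` toward the spacing between players is what starts to scramble the ranks.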
Now consider a more realistic scenario, where there is more moderate error ("luck"):
Now consider the point at the highest X value, which has the highest expected Y value...the actual Y value is only the 5th highest. The 2nd-highest X value corresponds to the highest observed Y value. The 2nd-highest observed Y value? That corresponds to the 16th-highest X value. So, back to fantasy football: if the observed Y values are fantasy points for a given season, then the top 3 performers had the 2nd, 15th, and 16th highest true abilities. Ignoring the myriad other factors that would predict the next season (aging, a change to a new team, a new coach, different players around them, etc.), we would expect only one of these three to be in the top 3 in the subsequent season.
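The claim about next season can be checked with a small Monte Carlo sketch. It assumes, hypothetically, that true abilities stay fixed between seasons and that each season's luck is independent Gaussian noise, and, as in the text, it ignores aging, team changes, and every other factor. `top_k` and `mean_repeaters` are illustrative names, not anything from the original analysis.

```python
import random

def top_k(points, k):
    """Indices of the k highest values in points."""
    return set(sorted(range(len(points)), key=lambda i: points[i], reverse=True)[:k])

def mean_repeaters(abilities, luck_sd, k=3, trials=2000, seed=0):
    """Average number of season-1 top-k finishers who are also top-k in season 2.

    Both seasons share the same true abilities; only the Gaussian luck differs.
    Aging, team changes, injuries, etc. are deliberately ignored.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        season1 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
        season2 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
        total += len(top_k(season1, k) & top_k(season2, k))
    return total / trials

abilities = [float(i) for i in range(20)]  # hypothetical: 20 evenly spaced players
for sd in (0.1, 2.0, 5.0):
    print(f"luck sd={sd}: {mean_repeaters(abilities, sd):.2f} of the top 3 repeat")
```

The printed averages depend entirely on the assumed spacing and noise, so treat them as illustrations: as `luck_sd` grows relative to the gaps between players, fewer of one season's top 3 should repeat.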
Regression to the Mean in Fantasy Football—and Real Life
To reiterate, "regression to the mean" does not mean each point is expected to return to the average Y value of the entire dataset, just that we would expect it to fall back to the predicted value indicated by the line. That 16th-best player who obtained the 2nd-highest point total would need another incredibly lucky season to repeat.
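A similar sketch, under the same hypothetical assumptions (fixed abilities, independent Gaussian luck each season), shows where the top performers land the next year. Over many simulated pairs of seasons, it averages the season-1 top 3's points in both seasons alongside their true abilities; `regression_check` is an illustrative name.

```python
import random
from statistics import mean

def regression_check(abilities, luck_sd, k=3, trials=5000, seed=0):
    """Take each simulated season's top-k players and average their season-1
    points, their season-2 points, and their true abilities."""
    rng = random.Random(seed)
    n = len(abilities)
    s1, s2, skill = [], [], []
    for _ in range(trials):
        season1 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
        season2 = [a + rng.gauss(0.0, luck_sd) for a in abilities]
        # Select players on season-1 results only (this is where luck sneaks in).
        top = sorted(range(n), key=lambda i: season1[i], reverse=True)[:k]
        s1.append(mean(season1[i] for i in top))
        s2.append(mean(season2[i] for i in top))
        skill.append(mean(abilities[i] for i in top))
    return mean(s1), mean(s2), mean(skill)

abilities = [float(i) for i in range(20)]  # hypothetical: 20 evenly spaced players
s1, s2, skill = regression_check(abilities, luck_sd=5.0)
print(f"season 1: {s1:.2f}  season 2: {s2:.2f}  true ability: {skill:.2f}  "
      f"league mean: {mean(abilities):.2f}")
```

Under these assumptions the season-2 average should sit close to the group's true ability: below its lucky season-1 total, but still above the league-wide mean, which is exactly the distinction drawn above.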
Of course, regression to the mean is all around us and not limited to fantasy football, and examples abound in news stories and especially articles about finance. So the next time you read "Home Prices Pull Back From Record Highs" or "Crime Rates Up From Three-Year Low" try to consider whether anything has really changed or whether the data are just showing regression to the mean with no underlying cause.
And good luck in your fantasy draft! Given regression to the mean, you'll need it.