I recently read an article that talked about the randomness of this year's shortened NBA season. Because of the lockout, the season will be only 66 games long, instead of 82. The article says that having a sample size that's 16 games fewer than normal means there's a lot of uncertainty about how the season will play out. But just how much more uncertainty will there be?
We want to investigate how the sample size affects the margin of error around the proportion of games a team wins. For example, think of the proportion of heads you'll get when you flip a coin a certain number of times. We know that in general the proportion is 0.5, because a head has a 50% chance of coming up.
But in a given sample, this could vary greatly. If you flip the coin only 10 times, about 90% of the time the proportion of heads will range from 0.3 to 0.7. That small a sample has a wide margin of error. But as we increase the sample size, the margin of error in our proportion would drop greatly. If we flip the coin 50 times, about 90% of the time the proportion of heads will range from 0.4 to 0.6.
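If you'd like to check those coin-flip figures yourself, here is a minimal Python sketch that adds up the exact binomial probabilities. The function name `prob_heads_proportion_between` is just an illustrative helper made up for this example, and it assumes a fair coin unless you pass a different `p`.

```python
from math import comb

def prob_heads_proportion_between(n, low, high, p=0.5):
    """Exact binomial probability that the share of heads in n flips lies in [low, high]."""
    total = 0.0
    for k in range(n + 1):
        # Compare head counts against the scaled bounds, with a tiny
        # tolerance so floating-point products like 0.7 * 10 behave.
        if low * n - 1e-9 <= k <= high * n + 1e-9:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(round(prob_heads_proportion_between(10, 0.3, 0.7), 3))  # 10 flips, 3 to 7 heads
print(round(prob_heads_proportion_between(50, 0.4, 0.6), 3))  # 50 flips, 20 to 30 heads
```

Both probabilities should come out a little under 0.90, in line with the "about 90%" figures above, even though the range for 50 flips is only half as wide.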
How much does decreasing the sample size of the NBA season increase the margin of error? First, let's select an NBA team to focus on. How about the team that everybody loves to hate, the Miami Heat? Last year the Heat had a winning percentage of .707. Let's "assume" that's the Heat's probability of winning a game. (Of course the true probability changes from game to game depending on the opponent, but for the sake of this post we're going to assume it's constant.)
So how big of a range can we expect from the two different sample sizes? We can use Minitab's Sample Size for Estimation command to compare how large the margin of error is for each sample size. Choosing this command brings up the following dialog box:
We select "Proportion" as the Parameter, because we are interested in the proportion of games the Heat will win. The Planning Value is 0.707 because we want to find the margin of error around a winning percentage of 0.707. And we enter sample sizes of 66 and 82 because those are the numbers of games in the abbreviated and regular seasons that we want to compare.
Here are the results:
We'll use the margins of error to create a range of the winning percentages for the Heat.
- With a sample size of 66 games, the Heat's winning percentage would fall in a range of 0.582 to 0.813.
- With a sample size of 82 games, the Heat's winning percentage would fall in a range of 0.596 to 0.802.
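For readers without Minitab, a rough version of this comparison can be sketched in Python using the textbook normal-approximation formula for a proportion, margin = z * sqrt(p(1 - p) / n). Two assumptions to flag: the 95% confidence level (z = 1.96) is my choice for the sketch, and this is not necessarily the method Minitab uses. The ranges above aren't symmetric around 0.707, which suggests Minitab is doing something more exact, so expect these numbers to land close to, but not exactly on, the ones listed.

```python
from math import sqrt

Z_95 = 1.96  # assumed 95% confidence level; not stated in the post

def wald_interval(p, n, z=Z_95):
    """Normal-approximation (Wald) range for a proportion p over n games."""
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin

# Compare the abbreviated and regular season lengths at the same
# assumed win probability.
for games in (66, 82):
    low, high, margin = wald_interval(0.707, games)
    print(f"{games} games: margin {margin:.3f}, range {low:.3f} to {high:.3f}")
```

Under this approximation the margin is roughly 0.11 for 66 games and roughly 0.10 for 82 games, so the two ranges differ by only about a percentage point on each side.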
Now the point of this isn't to predict the Heat's record next year. It's to compare the ranges given with each sample size. And we see they are almost the same! We really shouldn't expect any more variation in a 66-game season than we already see in an 82-game season. Sure, there will be good teams that nobody expected to be good, and bad teams that nobody expected to be bad, but it's not going to have anything to do with the smaller sample of games. It'll just be variation that we see every season in the NBA.
Now, what kind of a sample size would really make for a crazy season? Imagine if the NBA season were like the NFL, where you played only 16 games. Then what would the margin of error look like?
With a sample size of 16 games, the Heat's winning percentage would fall in a range of 0.433 to 0.902. That's a much wider range, more than twice as wide as the 82-game one. Then you'd be able to blame some variation on the sample size. For example, if the 2010-2011 NBA season lasted only 16 games, the Cleveland Cavaliers would have made the playoffs! But after 82 games, they ended up dead last in the Eastern Conference.
But other than Cleveland, the standings after 16 games last year still aren't that different from the final standings. Of the 16 teams that made the playoffs after 82 games, 14 of them still would have made the playoffs after only 16 games! If things don't change much after a mere 16 games, we shouldn't expect a drastic difference between a 66- and an 82-game season!
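One way to see why 66 games barely moves the needle while 16 games would: under the usual normal approximation (an assumption here, not necessarily what Minitab computes), the margin of error shrinks in proportion to 1 / sqrt(n). The short sketch below uses a made-up helper, `relative_margin`, to express each season length's margin as a multiple of the 82-game margin.

```python
from math import sqrt

def relative_margin(n, baseline_n=82):
    """Margin of error at n games relative to the baseline season,
    assuming the margin is proportional to 1 / sqrt(n)."""
    return sqrt(baseline_n / n)

for games in (82, 66, 16):
    print(f"{games} games: {relative_margin(games):.2f}x the 82-game margin")
```

That works out to about 1.11 times the 82-game margin for a 66-game season and about 2.26 times for a 16-game season, which lines up with the widths of the ranges reported above.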
Photograph "basketball.gif" licensed under Creative Commons Attribution ShareAlike 2.0.