It’s been a grueling election campaign for Senator William Overstate, or Will Overstate, as he is known by his constituents. Senator Overstate has been running his re-election campaign on a single issue: to make Election Day a national holiday.
That sounds good to me!
The senator argues that it's difficult for many people to get to the polls on election day because of work, especially those who work longer hours at lower-wage jobs. Therefore, he reasons, making election day a national holiday would increase voter turnout. Some states have already made election day a state holiday for that reason.
The senator also wonders whether other factors, such as gender and age, affect turnout. After collecting publicly available data on these factors, he uses Minitab’s regression analysis to evaluate their associations with voter turnout.
Unfortunately, in his exuberance to serve the public good, his own self-interest, or perhaps both, Senator Overstate tends to get carried away in his claims, even when they have a valid basis.
Let's see how his conclusions from the regression analysis hold up to the fact checkers.
Analyzing the full model
The first step is to analyze the full model, which includes all the predictor variables in the study. To arrive at the final model, predictors are removed one at a time: at each step, the predictor with the highest p-value is removed if that p-value is greater than the alpha level of 0.1. Here are the initial results for the full model:
The p-value for Median Age is 0.993, which is greater than 0.1. Obviously, this shows that age does not significantly affect voter turnout rate. You should remove the term from the model and vote for me.
Senator Overstate is correct in saying that Median Age is not a statistically significant predictor in this model. However, to then say that this shows that age does not significantly affect the voter turnout rate is incorrect. In fact, other data clearly show that the turnout rate does vary considerably for different age groups, with the turnout being highest for those over 60 and lowest for those under 30.
So why isn’t the term significant in this model? It may be that median age is not a good metric for capturing an age effect on voter turnout. For example, imagine a state where 50% of the residents are under 10 years old: its median age would be only about 10. Because the turnout rate is calculated as a percentage of eligible voters, and those children are not eligible to vote, the state could still have a very high turnout rate despite its low median age.
Another possibility is that median age is correlated with one of the other predictors in the model, so its effect is being “drained off” by that predictor. Highly correlated predictors are generally avoided in a regression model, because this condition, called multicollinearity, inflates the standard errors of their coefficients and can mask their effects. But Senator Overstate doesn’t like using big words, because they turn voters off.
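The median-age example can be made concrete with a tiny hypothetical state. The ages and voting records below are invented for illustration. Eligibility is simplified to age 18 or older, although real eligibility also depends on citizenship and other rules.

```python
from statistics import median

# Hypothetical state: more than half of the residents are children under 10.
resident_ages = [1, 3, 4, 6, 8, 30, 45, 58, 66]
# Whether each adult voted; children are not eligible and are not counted.
adult_voted = {30: True, 45: True, 58: True, 66: False}

# Simplified eligibility rule (assumption): age 18 or older.
eligible = [age for age in resident_ages if age >= 18]
turnout = sum(adult_voted[age] for age in eligible) / len(eligible)

print(median(resident_ages))  # 8
print(turnout)                # 0.75
```

A low median age and a high turnout rate can coexist because the two numbers are computed over different groups of people.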
In this model, what other predictors might be correlated with Median Age? We suspected Average weekly earnings. Using Minitab Statistical Software, we tested the correlation. Sure enough:
Out of curiosity, we also ran a regression model in Minitab with only Median Age as a predictor of voter turnout and got these results:
So Median Age is a statistically significant predictor of turnout when it is the only predictor in the model.
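With only one predictor, you can compute the regression t test for the slope by hand. This sketch uses ordinary least squares on hypothetical median-age and turnout values, not the article's data. For a single predictor, this slope t statistic is algebraically identical to the t statistic for testing the correlation between x and y.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (b0, b1, t), where t tests H0: b1 = 0 with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
    return b0, b1, b1 / se_b1


# Hypothetical data for five made-up states (not the article's data).
median_age = [35, 36, 37, 38, 39]
turnout = [0.52, 0.56, 0.58, 0.56, 0.58]
b0, b1, t = simple_regression(median_age, turnout)
print(round(b1, 4), round(t, 3))  # 0.012 2.121
```

Adding a second predictor that is correlated with the first would change both the slope and its standard error. That is how a term can be significant on its own and not significant in the full model.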
Still, Senator Overstate is correct in saying that we can remove Median Age from the full model because it is not statistically significant and has the highest p-value of all of the predictors.
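The one-at-a-time removal rule used in this analysis can be sketched in Python. This is a minimal sketch, not Minitab's implementation. `backward_eliminate` and `fake_fit` are hypothetical names, and `fake_fit` stands in for refitting the regression. Only the Median Age (0.993) and Sex Ratio (0.7439) p-values come from the article. The others are placeholders below 0.1.

```python
from typing import Callable, Dict, List

# Hypothetical term names loosely following the article's predictors.
TERMS = ["Median Age", "Sex Ratio", "Hours Worked", "Weekly Earnings", "State Holiday"]


def backward_eliminate(
    terms: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.1,
) -> List[str]:
    """Refit, drop the term with the highest p-value if it exceeds alpha, repeat."""
    current = list(terms)
    while current:
        p_values = fit_p_values(current)  # refit with the current terms
        worst = max(current, key=lambda term: p_values[term])
        if p_values[worst] <= alpha:
            break  # every remaining term is significant at alpha
        current.remove(worst)
    return current


def fake_fit(terms: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a regression fit that returns fixed p-values.
    A real refit would change the p-values as terms are removed."""
    table = {
        "Median Age": 0.993,      # from the article (full model)
        "Sex Ratio": 0.7439,      # from the article (reduced model)
        "Hours Worked": 0.01,     # placeholder
        "Weekly Earnings": 0.02,  # placeholder
        "State Holiday": 0.05,    # placeholder
    }
    return {term: table[term] for term in terms}


print(backward_eliminate(TERMS, fake_fit))
# ['Hours Worked', 'Weekly Earnings', 'State Holiday']
```

Refitting after every removal matters. Dropping one term can change the p-values of the others, especially when predictors are correlated.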
Analyzing the reduced model
The model is reanalyzed after the Median Age term is removed.
The p-value for Sex Ratio is 0.7439, which is greater than 0.1. Obviously, the sex of the voter does not significantly affect the turnout rate, so we’ll remove it from the model and happily accept your campaign contributions.
Some of the same caveats that applied to Median Age apply here as well. Other data do show that women tend to vote more often than men, but whether this difference is statistically significant is unclear. However, unlike Median Age, Sex Ratio showed no correlation with any of the other predictors in this model that we could find, so Senator Overstate is correct that we can remove the term from the model based on its p-value.
Analyzing the final model
After the Sex Ratio term is removed, the model with the three remaining predictors is evaluated.
As you can see, average hours worked and average weekly earnings have a statistically significant effect on voter turnout. Increased work hours and lower weekly earnings keep people from going to the polls. State holiday status also has a statistically significant effect. Trust my data, folks. Re-elect me and spend your next election day in a La-Z-Boy recliner!
Senator Overstate is correct that all three terms in the model are statistically significant at the alpha level of 0.1. The average number of hours worked has a negative coefficient (-0.0233), so longer hours are associated with lower turnout. The average weekly earnings has a positive coefficient (0.00013), so higher earnings are associated with higher turnout. Interesting as these associations are, however, they do not prove causation.
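To read the coefficients as magnitudes, multiply each by a change in its predictor. In a linear model, the change in predicted turnout is the coefficient times that change, with the other predictors held fixed. The units here are assumptions. We take hours as average weekly hours and earnings as dollars per week. The result comes out in whatever units the turnout response uses: if turnout is recorded as a proportion, -0.0233 is about 2.3 percentage points. Like the coefficients themselves, these figures describe associations, not causal effects.

```python
# Coefficients quoted in the article's final model.
COEF_HOURS = -0.0233     # per additional unit of average hours worked
COEF_EARNINGS = 0.00013  # per additional unit of average weekly earnings


def predicted_change(coef, delta_x):
    """Change in predicted turnout when one predictor changes by delta_x and
    the other predictors are held fixed (in the turnout response's units)."""
    return coef * delta_x


print(predicted_change(COEF_HOURS, 1))       # one more hour (assumed weekly hours)
print(predicted_change(COEF_EARNINGS, 100))  # 100 more dollars per week (assumed units)
```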
Senator Overstate notes that state holiday status has a statistically significant effect on turnout. What he neglects to mention is that the term has a positive coefficient (0.01725) associated with its "No" value. That means that states that do NOT have a state holiday on election day tend to have higher turnout! In fact, his campaign opponent, I. M. Fullovit, has used this result to argue that making election day a holiday will lower voter turnout. But Mr. Fullovit appears to be living up to his name, reaching a false conclusion by mistaking correlation for causation.
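The article doesn't say how the two-level holiday factor was coded, and the coding affects how large a No-versus-Yes gap the coefficient 0.01725 implies. Under indicator (0/1) coding with Yes as the reference level, the gap equals the coefficient. Under effect (-1/+1) coding, the gap is twice the coefficient. The function name and both coding options below are illustrative. In either case the gap is positive, which is the point the text makes.

```python
COEF_NO = 0.01725  # coefficient for the "No" level, as quoted in the article


def no_minus_yes_gap(coef_no, coding):
    """Predicted turnout for "No" states minus "Yes" states, other predictors fixed.

    coding="indicator": No = 1, Yes = 0 (Yes is the reference level)
    coding="effect":    No = +1, Yes = -1
    """
    codes = {"indicator": (1, 0), "effect": (1, -1)}
    if coding not in codes:
        raise ValueError(f"unknown coding: {coding!r}")
    no_code, yes_code = codes[coding]
    return coef_no * (no_code - yes_code)


for scheme in ("indicator", "effect"):
    print(scheme, no_minus_yes_gap(COEF_NO, scheme))
```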
We think it very likely that states that already had very low voter turnout enacted state holidays to increase it. To determine whether the holidays actually increased turnout, we would need to compare turnout rates in those states before and after their laws were passed.
Finally, although Senator Overstate's results have real potential, further work is needed to examine other factors that might influence turnout, such as educational level or the proximity of polling stations, some of which could be confounded with these predictors. The adjusted R-squared value suggests that his final model accounts for only 20% of the variation in voter turnout.
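A before-and-after comparison could start as simply as the sketch below, which uses invented turnout proportions for three hypothetical holiday states. A raw before/after change also picks up anything that moved turnout nationwide between the two elections. The sketch therefore adds a common refinement that the article doesn't mention, difference-in-differences, which subtracts the change over the same period in states without a holiday.

```python
from statistics import mean


def mean_change(before, after):
    """Average before-to-after change in turnout across the same set of states."""
    return mean(a - b for b, a in zip(before, after))


# Hypothetical turnout proportions (invented, not real election data).
holiday_before = [0.48, 0.52, 0.50]  # states that later adopted a holiday
holiday_after = [0.53, 0.55, 0.54]
other_before = [0.60, 0.58, 0.62]    # states that never adopted one
other_after = [0.61, 0.60, 0.62]

change_holiday = mean_change(holiday_before, holiday_after)
change_other = mean_change(other_before, other_after)
# Difference-in-differences: change in holiday states beyond the change elsewhere.
did = change_holiday - change_other
print(round(change_holiday, 4), round(did, 4))  # 0.04 0.03
```

Note that the hypothetical holiday states start with lower turnout. This is the pattern the article suspects, and it is why a single cross-sectional regression can't settle the question.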
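Adjusted R-squared penalizes R-squared for the number of predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), for n observations and p predictors. The article doesn't report the unadjusted R-squared or the exact number of observations. The inputs below (R^2 = 0.30, n = 50) are therefore hypothetical. Only p = 3 matches the final model.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical inputs; only p = 3 comes from the article's final model.
print(round(adjusted_r_squared(0.30, n=50, p=3), 3))  # 0.254
```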
Whew! Are you as exhausted as I am from sorting through all the claims and fact checks?
It's a dirty job, but it's got to be done, because the use of language in statistics, just like the use of language in politics, can easily fall prey to puffery and overstatement.
Note: The above regression analysis uses the actual turnout data from the 2008 U.S. presidential election, from the United States Elections Project website. The data for the predictor variables are from the U.S. Census Bureau (Median age by state, sex ratio by state) and the Bureau of Labor Statistics (State hours and earnings, annual averages).
You can find the list of states that observe a holiday on election day here.