Predicting the U.S. Presidential Election: Evaluating Two Models (Part One)
You may have read about statistical models that claim to predict the outcome of the upcoming Presidential election. It’s easy to imagine that these models are complicated and contain many demographic, sociological, economic, and political factors. However, I was surprised to read in an article that two simple models supposedly generate accurate predictions.
Both of these models use stock market data. One model is based on the Dow Jones and the other on the S&P 500. Statistics are best when they are a hands-on experience, so while neither study included the data, I obtained both the stock market data and election data so we can try these models ourselves using Minitab statistical software!
We’ll evaluate both models. If the models are satisfactory, we’ll use them to make predictions for the upcoming Presidential election. Today, we’ll evaluate the Dow Jones model and tomorrow the S&P model. You can get the worksheet for the Dow Jones model here.
Model 1: Three-Year Change in the Dow Jones Industrial Average
The first model comes from a recent study titled “Social Mood, Stock Market Performance and U.S. Presidential Elections” by Prechter, et al.
The researchers find a positive, significant relationship between several outcomes for presidential elections that have an incumbent and the percentage change in the Dow Jones over a three-year period. Each three-year period extends from November 1 after the previous election through October 31 of the year of the election.
Their theory states that the stock market is a proxy variable for social mood, not that the stock market directly affects voting. The stock market is a good measure of social mood because if people feel positive enough to invest more money in the stock market, they are presumably happy with the status quo, which could favor the incumbent.
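If you want to compute the predictor yourself, the arithmetic is just a percentage change between two index levels. Here is a minimal Python sketch. The function name and the index levels are made up for illustration (they are not actual DJIA closes); I picked them so the result matches the 33.8% figure used later in this post.

```python
def pct_change(start_value, end_value):
    """Return the percentage change from start_value to end_value."""
    if start_value <= 0:
        raise ValueError("start_value must be positive")
    return (end_value - start_value) / start_value * 100.0


# Hypothetical index levels, for illustration only (NOT actual DJIA closes).
start_level = 10_000.0  # level on November 1 after the previous election
end_level = 13_380.0    # level on October 31 of the election year

change = pct_change(start_level, end_level)
print(f"Three-year change: {change:.1f}%")
```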
The Dow Jones Industrial Average (DJIA) data back to 1897 was easy to obtain from the Federal Reserve Economic Data (FRED) web site. For elections prior to 1896, I use the Foundation for the Study of Cycles data set that the study used. This data set uses market data from earlier indices to create a longer DJIA.
The study looks at the percentage change in the DJIA over different lengths of time (2-4 years) and how it correlates with different election outcomes. The researchers also include the traditional “Big Three” predictors of Presidential elections: economic growth, inflation, and unemployment. The study concludes that the three-year change in the DJIA is the best predictor. Further, when the DJIA predictor is included in the model, the “Big Three” predictors become insignificant.
I’m going to test the three-year model to determine if it can predict whether the incumbent wins or loses. I also include other election outcomes in the worksheet if you want to try those out.
Assessing the Model with Binary Logistic Regression
Because the election outcome for the incumbent has only two possible values (Win or Lose), we need to use Binary Logistic Regression. We have a single predictor: the percentage change in the DJIA over three years. We get the results below.
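If you don't have Minitab handy, here is a rough sketch of what a binary logistic regression with one predictor does under the hood, written in plain Python. It fits an intercept and a slope by Newton-Raphson, which is one common way to maximize the likelihood; I'm not claiming it is the exact algorithm Minitab uses. The data are invented for illustration and are not the election worksheet, and the names fit_logistic, changes, and won are my own.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) of the log-likelihood and the
        # entries (h) of the negative Hessian.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system H * delta = g by hand.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented data, for illustration only (NOT the election worksheet).
changes = [-30, -15, -10, -5, 0, 5, 10, 20, 30, 45]  # % change in an index
won = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]                 # 1 = Win, 0 = Lose

b0, b1 = fit_logistic(changes, won)
odds_ratio = math.exp(b1)  # odds ratio per one-point increase
print(f"intercept={b0:.3f}  slope={b1:.4f}  odds ratio={odds_ratio:.3f}")
```

The sketch leaves out everything that makes real software trustworthy here, such as standard errors, p-values, and convergence checks, so treat it as a way to see the mechanics rather than a replacement for the Minitab output.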
The p-value for the Dow Jones change is significant (0.025). Further, the odds ratio is 1.10, which indicates that every one-percentage-point increase in the Dow Jones is associated with the incumbent's odds of winning being multiplied by 1.10. The goodness-of-fit tests (not shown) all have very high p-values (greater than 0.6), which suggests that the model fits.
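An odds ratio acts on odds rather than directly on probabilities, and the effect compounds across percentage points. The short Python sketch below shows the conversion using the reported odds ratio of 1.10. The starting point of even odds is an assumption I chose for illustration, and the helper names are my own.

```python
import math

odds_ratio = 1.10                   # reported in the regression output
coefficient = math.log(odds_ratio)  # implied slope on the log-odds scale


def scale_odds(odds, points):
    """Multiply odds by the odds ratio once per percentage point."""
    return odds * odds_ratio ** points


def odds_to_prob(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1.0 + odds)


# Assumed starting point for illustration: even odds, i.e. a 50% chance.
even_odds = 1.0
after_10 = scale_odds(even_odds, 10)  # a 10-point larger DJIA change
prob_after_10 = odds_to_prob(after_10)
print(f"odds: {after_10:.2f}, probability: {prob_after_10:.2f}")
```

Starting from even odds, ten extra percentage points move the odds to roughly 2.6 to 1, which is a probability of about 72%, not 50% multiplied by anything.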
We should also look at the concordant/discordant pairs in the output:
This portion of the output indicates whether the predicted event probabilities of the binary logistic regression model match the observed outcomes. To do this, Minitab compares all pairs of observations that have different response values (Win or Lose) and their predicted event probabilities.
- If the predicted probability of success is higher for the observation corresponding to a "success," the pair is considered concordant.
- If the predicted probability of success is higher for the observation corresponding to a "failure," the pair is considered discordant.
For the Dow Jones model, 86.3% of the pairs are concordant, which is excellent! There are nearly 7 times as many concordant pairs as discordant pairs. In other words, the model does a good job of assigning higher predicted probabilities to the incumbents who actually won.
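The pair-counting rule described above is easy to sketch in Python. The probabilities and outcomes below are invented for illustration and are not the election data, and the function name is my own. Minitab's exact handling of ties may differ from this simple version, which only counts a tie when two predicted probabilities are exactly equal.

```python
def pair_summary(probs, outcomes):
    """Count concordant, discordant, and tied Win/Lose pairs."""
    wins = [p for p, y in zip(probs, outcomes) if y == 1]
    losses = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    # Compare every Win observation against every Lose observation.
    for p_win in wins:
        for p_loss in losses:
            if p_win > p_loss:
                concordant += 1
            elif p_win < p_loss:
                discordant += 1
            else:
                tied += 1
    return {
        "pairs": len(wins) * len(losses),
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
    }


# Invented predicted probabilities and outcomes (1 = Win, 0 = Lose).
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2]
outcomes = [1, 1, 1, 1, 0, 0, 0]

summary = pair_summary(probs, outcomes)
pct_concordant = 100.0 * summary["concordant"] / summary["pairs"]
print(summary, f"{pct_concordant:.1f}% concordant")
```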
Predicting the Election with Minitab
The authors of the study specifically don’t use the model to predict the election's outcome because, for them, it's a study designed to determine the stronger influence in voting behavior. But why let that stop us from using their model to make a prediction? We have determined that the three-year percentage change in the DJIA is a significant predictor and that the model produces accurate event probabilities.
So, we just need to enter the percentage change in the Dow Jones from November 1, 2009 to October 31, 2012 (33.8%) in the Prediction subdialog for binary logistic regression. Enter 33.8 and we get the following prediction output.
The model predicts that President Obama has a 95% chance of being re-elected. The confidence interval (59, 99) is very wide, but it is entirely above 50 percent. Further, from the concordant pairs, we know that the probability is generally correct. This probability of re-election may seem high given the tight race in the polls. However, this prediction is based entirely on the Dow Jones, which has increased significantly while President Obama has been in office.
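To see how a logistic model turns a DJIA change into a probability, here is a rough Python sketch. The fitted intercept isn't quoted in this post, so the sketch back-solves an implied intercept from the rounded figures that are quoted: the odds ratio of 1.10 and the 95% prediction at 33.8%. That makes implied_intercept an approximation of my own, not a value from the Minitab output, and the probabilities it gives for other DJIA changes are only ballpark figures.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-z))


slope = math.log(1.10)  # from the reported (rounded) odds ratio
djia_change = 33.8      # % change entered for the prediction
reported_prob = 0.95    # reported (rounded) predicted probability

# ASSUMPTION: back-solved from rounded figures, not taken from the output.
implied_intercept = logit(reported_prob) - slope * djia_change


def win_probability(change):
    """Approximate probability of an incumbent win for a given % change."""
    return inv_logit(implied_intercept + slope * change)


for change in (-20.0, 0.0, 33.8):
    print(f"{change:6.1f}% change -> {win_probability(change):.2f}")
```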
I'll close with a table that puts this prediction in the context of all Presidential elections since 1828 with an incumbent candidate. The table is sorted by the probability that the incumbent is re-elected. You can see how the high probabilities are associated with "Won" and the low probabilities are associated with "Lost." The middle probabilities are a bit mixed up, as you'd expect. The green row indicates the prediction for the upcoming election.
I'm pretty impressed by the fact that a single predictor works so well for something as complex as a Presidential election.
In the next post, we'll look at a different model that uses the S&P 500 over a different length of time. I was surprised by those results!