Predicting the U.S. Presidential Election: Evaluating Two Models (Part One)

You may have read about statistical models that claim to predict the outcome of the upcoming Presidential election. It’s easy to imagine that these models are complicated and contain many demographic, sociological, economic, and political factors. However, I was surprised to read in an article that two simple models supposedly generate accurate predictions.

Both of these models use stock market data. One model is based on the Dow Jones and the other on the S&P 500. Statistics are best when they are a hands-on experience, so while neither study included the data, I obtained both the stock market data and election data so we can try these models ourselves using Minitab statistical software!

We’ll evaluate both models. If the models are satisfactory, we’ll use them to make predictions for the upcoming Presidential election. Today, we’ll evaluate the Dow Jones model and tomorrow the S&P model. You can get the worksheet for the Dow Jones model here.

Model 1: Three-Year Change in the Dow Jones Industrial Average

The first model comes from a recent study titled “Social Mood, Stock Market Performance and U.S. Presidential Elections” by Prechter, et al.

The researchers find a positive, significant relationship between several outcomes for presidential elections that have an incumbent and the percentage change in the Dow Jones over a three-year period. Each three-year period extends from November 1 after the previous election through October 31 of the year of the election.

Their theory states that the stock market is a proxy variable for social mood, not that the stock market directly affects voting. The stock market is a good measure of social mood because if people feel positive enough to invest more money in the stock market, they are presumably happy with the status quo, which could favor the incumbent.

The Dow Jones Industrial Average (DJIA) data back to 1897 was easy to obtain from the Federal Reserve Economic Data (FRED) web site. For elections prior to 1896, I use the Foundation for the Study of Cycles data set that the study used. This data set uses market data from earlier indices to create a longer DJIA.

The study looks at the percentage change in the DJIA over different lengths of time (2-4 years) and how it correlates to different election outcomes. The researchers also include the traditional big three predictors of Presidential elections: economic growth, inflation, and unemployment. The study concludes that the three-year change in the DJIA is the best predictor. Further, when the DJIA predictor is included in the model, the other “Big Three” predictors become insignificant.

I’m going to test the three-year model to determine if it can predict whether the incumbent wins or loses. I also include other election outcomes in the worksheet if you want to try those out.

Assessing the Model with Binary Logistic Regression

Because the election outcome for the incumbent only has two possible values (Win or Lose), we need to use Binary Logistic Regression. And, we have one predictor, the percentage DJIA change over three years. We get the results below.

Binary logistic output for the Dow Jones model

The p-value for the Dow Jones change is significant at 0.025. Further, the odds ratio is 1.10, which indicates that each one-percentage-point increase in the three-year Dow Jones change multiplies the incumbent's odds of winning by about 1.10. The goodness-of-fit tests (not shown) all have very high p-values (greater than 0.6), so there is no evidence that the model fits poorly.
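To make the odds ratio more concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration: the function names are mine, and the 50% starting probability is a made-up baseline, not a value from the Minitab output. The key point is that an odds ratio multiplies the odds, not the probability, and that it compounds over larger changes.

```python
import math

ODDS_RATIO = 1.10  # odds ratio reported for the Dow Jones model

# The logistic regression coefficient is the natural log of the odds ratio.
coefficient = math.log(ODDS_RATIO)


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor rises by delta points."""
    return odds_ratio ** delta


def shift_probability(p, odds_ratio, delta):
    """Convert p to odds, apply the odds ratio delta times, convert back."""
    odds = p / (1.0 - p) * odds_multiplier(odds_ratio, delta)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print(f"coefficient implied by the odds ratio: {coefficient:.4f}")
    print(f"odds multiplier for a 10-point rise:   {odds_multiplier(ODDS_RATIO, 10):.2f}")
    # Hypothetical baseline: an incumbent who starts at a 50% chance.
    print(f"50% baseline after a 10-point rise:    {shift_probability(0.5, ODDS_RATIO, 10):.3f}")
```

So a 10-point rise multiplies the odds by roughly 2.6, not by 1.10 ten times added together, and the effect on the probability depends on where you start.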

We should also look at the concordant/discordant pairs in the output:

Binary logistic output for the Dow Jones model

This portion of the output indicates whether the predicted event probabilities of the binary logistic regression model match the observed outcomes. To do this, Minitab compares all pairs of observations that have different response values (Win or Lose) and their predicted event probabilities.

  • If the predicted probability of success is higher for the observation corresponding to a "success," the pair is considered concordant.
  • If the predicted probability of success is higher for the observation corresponding to a "failure," the pair is considered discordant.

For the Dow Jones model, 86.3% of the pairs are concordant, which is excellent! There are nearly 7 times as many concordant pairs as discordant pairs. In other words, the predicted event probabilities separate the wins from the losses well.
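Minitab does this counting for you, but the rule is simple enough to sketch in Python. This is my own illustration, not Minitab's code, and the function name is hypothetical. It uses the 19 elections from 1900 to 2004 that a reader posted in the comments below, so the counts will not match the 86.3% above, which is based on elections back to 1828. Because the model has a single predictor, and assuming its coefficient is positive, ranking by the Dow Jones change gives the same order as ranking by predicted probability.

```python
# Three-year % change in the DJIA and incumbent result (1 = Won, 0 = Lost)
# for the 19 elections (1900-2004) listed in the reader comment.
dj_change = [20.2199, -2.5360, -8.7792, 34.5293, 41.6939, -75.9780, 97.6679,
             -0.9784, 23.2691, -0.2965, 73.4063, 24.0452, 11.8169, 1.6968,
             14.5716, 39.2884, 21.9351, 63.2823, 8.2424]
won = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]


def count_pairs(scores, outcomes):
    """Compare every (event, non-event) pair of observations by score.

    A pair is concordant when the observed event has the higher score,
    discordant when the non-event has the higher score, and tied otherwise.
    """
    events = [s for s, o in zip(scores, outcomes) if o == 1]
    non_events = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = discordant = ties = 0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1
            elif e < n:
                discordant += 1
            else:
                ties += 1
    return concordant, discordant, ties


if __name__ == "__main__":
    c, d, t = count_pairs(dj_change, won)
    total = c + d + t  # 14 wins x 5 losses = 70 pairs
    print(f"Concordant {c}, Discordant {d}, Ties {t}, Total {total}")
    print(f"Percent concordant: {100.0 * c / total:.1f}")
```

For this smaller data set the function counts 56 concordant and 14 discordant pairs out of 70. Running it on the October temperatures from the same comment gives 55 and 15, which matches the Minitab output posted there.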

Predicting the Election with Minitab

The authors of the study specifically don’t use the model to predict the election's outcome because, for them, it's a study designed to determine the stronger influence in voting behavior. But why let that stop us from using their model to make a prediction? We have determined that the three-year percentage change in the DJIA is a significant predictor and that the model produces accurate event probabilities.

So, we just need to enter the percentage change in the Dow Jones from November 1, 2009 to October 31, 2012 (33.8%) in the Prediction subdialog for binary logistic regression. Enter 33.8 and we get the following prediction output.

Prediction based on the binary logistic regression of the Dow Jones model

The model predicts that President Obama has a 95% chance of being re-elected. The confidence interval (59, 99) is very wide, but it is entirely above 50 percent. Further, from the concordant pairs, we know that the probability is generally correct. This probability of re-election may seem high given the tight race in the polls. However, this prediction is based entirely on the Dow Jones, which has increased significantly while President Obama has been in office.

I'll close with a table that puts this prediction in the context of all Presidential elections since 1828 with an incumbent candidate. The table is sorted by the probability that the incumbent is re-elected. You can see how the high probabilities are associated with "Won" and the low probabilities are associated with "Lost." The middle probabilities are a bit mixed up, as you'd expect. The green row indicates the prediction for the upcoming election.

I'm pretty impressed by the fact that a single predictor works so well for something as complex as a Presidential election.

Probabilities that the incumbent is re-elected

In the next post, we'll look at a different model that uses the S&P 500 over a different length of time.  I was surprised by those results!




Name: Josh S • Monday, November 5, 2012

I'm not sure this metric is any more valid than the ones listed here: http://www.xkcd.com/1122/
These may all be 'statistically significant' without having a real grounding in reality.

Name: Jim Frost • Monday, November 5, 2012

Hi Josh, thanks for your comment. There are a couple of key differences between the historical precedents on that web site and statistical analysis.

For one thing, the electoral precedents are not metrics (a metric is a standard of measurement for a quantitative analysis). Instead this type of argument is an informal statement along the lines of "no President has been re-elected if . . .". And, these are followed by some observation that supposedly determines the outcome of the election. In other words, if you just know this one fact, you'll know the outcome of the election.

The stock market data are different, particularly when you apply statistical analysis. We've measured changes in the stock market for nearly 200 years. We then statistically relate these changes to nearly two centuries of Presidential elections and their outcomes.

Further, the statistical model does not suggest that changes in the stock market entirely predict the outcome. In fact, good statistics quantify the uncertainty. In this case, we have both the confidence interval for the prediction and the concordant/discordant pair analysis, which assesses how well the probabilities have matched historical reality.

The last table shows how the model produces probabilities, derived from changes in the Dow Jones, that generally match the actual outcomes. However, these are probabilities, not certainties. So, you'll also notice that there are candidates who have won when the probability was low and candidates who have lost when the probability was high.

However, the general trend for the probabilities is that higher probabilities correspond to a win and low probabilities correspond to a loss.

These assessments are how we ground the analysis in reality. It's possible that some day the Dow Jones model will no longer predict election outcomes. However, we have yet to see that in the historical record.


Name: Josh • Monday, November 5, 2012

Hi Jim,
Thanks for your response. I certainly agree that the webcomic is not the same as the statistical analysis. It is kind of funny though in light of a lot of the predictions that a lot of people seem to make, and the importance many people seem to place on indicators that may not have much to do with the outcome.

At any rate, it seems that we can get similar results with other indicators as well, and they may be just as questionable as picking a semi-arbitrary timeframe and economic indicator.
For example, I looked up the average temperature for the month of October, and duplicated your analysis, with similar results. (data from here: http://www.ncdc.noaa.gov/oa/climate/research/cag3/na.html and it only goes back to 1895, so I only have 19 data points)

Year  Winner              %DJ Change  IncumbentResult  OctTemperature
1900  William McKinley       20.2199  Won                       56.52
1904  Theodore Roosevelt     -2.5360  Won                       54.05
1912  Woodrow Wilson         -8.7792  Lost                      53.26
1916  Woodrow Wilson         34.5293  Won                       52.20
1924  Calvin Coolidge        41.6939  Won                       55.04
1932  Franklin Roosevelt    -75.9780  Lost                      52.42
1936  Franklin Roosevelt     97.6679  Won                       53.51
1940  Franklin Roosevelt     -0.9784  Won                       56.16
1944  Franklin Roosevelt     23.2691  Won                       55.35
1948  Harry Truman           -0.2965  Won                       53.24
1956  Dwight Eisenhower      73.4063  Won                       56.24
1964  Lyndon Johnson         24.0452  Won                       53.79
1972  Richard Nixon          11.8169  Won                       52.36
1976  James Carter            1.6968  Lost                      50.08
1980  Ronald Reagan          14.5716  Lost                      52.67
1984  Ronald Reagan          39.2884  Won                       53.74
1992  William Clinton        21.9351  Lost                      54.49
1996  William Clinton        63.2823  Won                       53.77
2004  George W. Bush          8.2424  Won                       55.55

I don’t have data for Oct2012, but assuming that it’s the same 0.13F warmer that Sept2012 was compared to Sept2011, then I get similar results (though not quite as good of a p-value. Is that due to the fewer data points, or a slightly worse significance?)

Response Information

Variable         Value  Count
IncumbentResult  Won       14  (Event)
                 Lost       5
                 Total     19

Logistic Regression Table

                                                     Odds     95% CI
Predictor            Coef   SE Coef      Z      P   Ratio  Lower  Upper
Constant         -50.6595   29.3071  -1.73  0.084
OctTemperature   0.966191  0.550237   1.76  0.079    2.63   0.89   7.73

Log-Likelihood = -8.282
Test that all slopes are zero: G = 5.337, DF = 1, P-Value = 0.021

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs       Number  Percent  Summary Measures
Concordant      55     78.6  Somers' D              0.57
Discordant      15     21.4  Goodman-Kruskal Gamma  0.57
Ties             0      0.0  Kendall's Tau-a        0.23
Total           70    100.0

Predicted Event Probabilities for New Observations

New Obs      Prob    SE Prob  95% CI
      1  0.941616  0.0707179  (0.564475, 0.995042)

Values of Predictors for New Observations

New Obs  OctTemperature
      1           55.31

Perhaps, this means that it's another indicator that our current incumbent will be reelected, but it doesn't seem to match with the process knowledge that these would be related.
I didn’t spend much time trying to find more data, my guess for the October 2012 temperature is a stab in the dark, and I don’t really understand binary logistic regression, so obviously my comments and results are questionable at best. Even that makes me wonder what some TV analysts are doing and if any of this has any practical significance. I guess we’ll find out tomorrow!

Name: Jim Frost • Tuesday, November 6, 2012

Hi Josh,

The web comic was certainly funny! And, I think that type of analysis is easier for the media to present. It's simpler to present and the conclusions appear to be more definite. Unfortunately, things in the real world are often not simple or definite!

I ran your binary logistic model with the average temperature for October just to see what it produces. The temperature predictor is not significant, with a p-value of 0.079. While not significant, I'm impressed that it's anywhere near significance!
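For anyone following along without Minitab, here is a minimal Python sketch that refits your temperature model and recomputes the z-value, p-value, odds ratio, and the 2012 prediction. It is my own illustration: it uses a plain Newton-Raphson fit with step halving, which is not necessarily the algorithm Minitab uses, and the function names are made up for this example. The 55.31 is your guess for the October 2012 temperature.

```python
import math

# October temperatures and incumbent results (1 = Won, 0 = Lost),
# copied from the 19-row table in the comment above.
temps = [56.52, 54.05, 53.26, 52.20, 55.04, 52.42, 53.51, 56.16, 55.35, 53.24,
         56.24, 53.79, 52.36, 50.08, 52.67, 53.74, 54.49, 53.77, 55.55]
won = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]


def log_likelihood(b0, b1, x, y):
    """Sum of y*eta - log(1 + exp(eta)), written to avoid overflow."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_logistic(x, y, max_iter=100, tol=1e-12):
    """Newton-Raphson with step halving for a one-predictor logistic model."""
    b0 = b1 = 0.0
    ll = log_likelihood(b0, b1, x, y)
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 * score
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        new_ll = log_likelihood(b0 + d0, b1 + d1, x, y)
        while new_ll < ll and step > 1e-8:  # halve the step if it overshoots
            step /= 2.0
            new_ll = log_likelihood(b0 + step * d0, b1 + step * d1, x, y)
        b0, b1 = b0 + step * d0, b1 + step * d1
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    # Standard errors from the inverse information matrix at the estimate.
    _, (h00, h01, h11) = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det), ll


b0, b1, se0, se1, ll = fit_logistic(temps, won)
z = b1 / se1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
odds_ratio = math.exp(b1)
prob_2012 = 1.0 / (1.0 + math.exp(-(b0 + b1 * 55.31)))

if __name__ == "__main__":
    # Minitab reported: Constant -50.6595, OctTemperature 0.966191 (SE 0.550237),
    # Z 1.76, P 0.079, odds ratio 2.63, log-likelihood -8.282, probability 0.941616.
    print(f"Constant {b0:.4f}  OctTemperature {b1:.6f}  SE {se1:.6f}")
    print(f"Z {z:.2f}  P {p_value:.3f}  Odds ratio {odds_ratio:.2f}")
    print(f"Log-likelihood {ll:.3f}  P(Won | 55.31) {prob_2012:.6f}")
```

If the fit converges as expected, the printed values should agree with the Minitab table to rounding. The p-value here is for the z-test on the temperature coefficient, which is why it is 0.079 rather than the 0.021 shown for the test that all slopes are zero.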

To get r-squared values to compare, I ran 2 more models with General Regression, both using the smaller dataset from 1900 to present for consistency. For the continuous response, I used the difference in the popular vote between the incumbent and closest challenger.

When I ran this with the change in Dow Jones, the DJ is significant (p=0.004) and the r-squared is 39.91%.

When I use the October temperature as the predictor, the temperature is not significant (p=0.208) and the r-squared is only 9.15%. The predicted r-squared is actually -7.34%!

So, while I'm surprised that the binary logistic model for the October temperature has as low of a p-value as it does, it is still insignificant. And, the comparison of the two General Regression models also seems to reconfirm that the Dow Jones model is better (significant predictor and higher r-squared) than the October temperature model.

Whew! I'm not sure what I would've thought if the average October temperature predicted elections as well. I suppose temperature is related to turnout as Patrick Runkel writes about in his current Minitab blog post.

But, you raise a great point. You have to use your process knowledge to see what makes sense to include as variables. I only spent a couple of sentences explaining why the change in the Dow Jones might be a legitimate predictor for election outcomes. If you want a more complete rationale, you should read the original study. There is a link to it in the blog post, and the authors write about it for 42 pages. Whether you agree or disagree with them, at least you'll get the full case!

Of course, in the end, the only statistic that matters for the election is how people vote. Well, technically it's how the Electoral College votes, but that's another issue!


Name: Daniya • Saturday, May 31, 2014

Dear Jim,
Thank you very much for your article; it's very helpful. I am doing research on the same topic, mostly following Robert Prechter's paper. The difficulty is that I couldn't find the stock market data from before 1896. Could you please send it to my e-mail? Thank you very much in advance.
Kind regards,
