It’s an election year, and the Presidential campaign is picking up! These are exciting times for my buddy and me, who are political junkies. Bring on the banners, slogans, rhetoric, and debates. Our TVs will be filled with ads about everything from financial policy to energy prices. Barack Obama and Mitt Romney may be in our living rooms more often than many family members!

We not only follow all of the races but also make small, friendly bets about the outcomes. However, the winnings pale in comparison to the bragging rights! Each bet takes on a life of its own, and winning the bet almost becomes more important than our own personal preferences about the races themselves.

We’ll bet about anything for both parties. For example, in 2008 we had bets on both the Democratic and Republican nomination outcomes on a state-by-state basis, the VP picks, and, of course, the Presidential election (also state-by-state). We fit in the mid-term elections, and now we’re doing the whole Presidential thing again this year.

Overall, we’ve been pretty equal in wins. I do have the distinction of picking Sarah Palin as John McCain’s running mate before McCain picked her himself, and weeks before the public announcement! It turns out that when I made my pick, McCain’s staff was talking him out of picking Joe Lieberman. This was a good thing if only because my buddy had Lieberman on his list. But, I digress . . .

For the state-by-state contests, the one who predicts the most races correctly wins. In order to make our predictions, we look at lots of data. This involves a general reading of current events and looking at many polls.

You’d think that conducting a poll is straightforward. You ask a bunch of people who they’ll vote for and count up the results, right? But it’s not quite that simple. The challenges of accurate polling are the same as the challenges across the field of statistics. If polling is done correctly, the small, sampled group accurately reflects the full population. The trick is to do it correctly!

## Random Samples

Whether you’re conducting a Six Sigma project or a political poll, you can’t base your sample on convenience. That’s a sure way to bias your results because the easy-to-obtain samples are probably different from the harder-to-obtain samples. For political polls, the people closest to you tend to have views similar to your own. Consequently, pollsters specifically find ways to include those who are not easily accessible in order to represent everyone.

Random samples are the best way to represent a population. In order to obtain a random sample for a political poll, all members of the population must have an equal probability of being selected. Political polling organizations can’t poll people at malls, ballparks, or office buildings because Americans are not equally likely to be at these places.

However, most people have a residence and a phone. Therefore, political polls are generally conducted by calling people at their residence. Pollsters can’t use the phone book because one-third of residential phone numbers are unlisted. So, they have a complicated process to create their own complete list. The polling organization can then randomly call household phones from their list to obtain a pretty good, but not perfect, random sample.

Even after all of this work, this process *still* excludes certain portions of the population, including students on campus, personnel on military bases, prisoners, hospital patients, and the homeless. Even for a dedicated, professional organization, it’s hard to get a truly random sample. (Don’t confuse a random sample with a haphazardly collected sample!)
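To see why convenience matters, here is a minimal Python sketch that compares a simple random sample with a convenience sample. Everything in it is made up for illustration: the population, the 30% "easy to reach" share, and the 60% vs. 45% support rates are hypothetical assumptions, not real polling data, and `simulate_polls` is just a name I picked.

```python
import random

def simulate_polls(seed=0, pop_size=100_000, n=1_000):
    """Compare a simple random sample with a convenience sample.

    All numbers are hypothetical: 30% of people are "easy to reach",
    and they support candidate A at 60% versus 45% for everyone else.
    """
    rng = random.Random(seed)

    # Build the made-up population as (easy_to_reach, supports_a) pairs.
    population = []
    for _ in range(pop_size):
        easy = rng.random() < 0.30
        supports_a = rng.random() < (0.60 if easy else 0.45)
        population.append((easy, supports_a))

    true_share = sum(s for _, s in population) / pop_size

    # Simple random sample: every person is equally likely to be picked.
    random_sample = rng.sample(population, n)
    random_est = sum(s for _, s in random_sample) / n

    # Convenience sample: drawn only from the easy-to-reach people.
    easy_pool = [person for person in population if person[0]]
    convenience_sample = rng.sample(easy_pool, n)
    convenience_est = sum(s for _, s in convenience_sample) / n

    return true_share, random_est, convenience_est

true_share, random_est, convenience_est = simulate_polls()
print(f"True support:        {true_share:.1%}")
print(f"Random sample:       {random_est:.1%}")
print(f"Convenience sample:  {convenience_est:.1%}")
```

Under these assumptions, the random sample should land near the true level of support, while the convenience sample drifts toward the views of the easy-to-reach group no matter how many people you ask.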

## Identifying the Target Population

Whether you are sampling parts or people, you need to identify the population. For pollsters, just calling people randomly isn’t good enough. They need to identify their target population. There’s more to this than meets the eye.

For political polls about specific races, you want to target the specific geographic region for that race. National numbers for a local race won’t help you! More specifically, you want to poll people who are not only old enough to vote but are also registered to vote. However, it goes even *further* than this!

If you want to predict an election, you only want to count registered voters who will actually vote! It really makes a difference. It’s not unusual for turnout among eligible voters to be only 40%. And there are differences between those who vote and those who don’t. It’s not unusual to see a gap of 3-5 points when you compare the eligible-voter and likely-voter groups in the same poll.

To pick out the likely voters, the pollster asks questions about the respondent’s voting history. Each polling organization has its own closely guarded method for filtering out the likely voters.

## Accurate Measurements

Accurate measurements are important for any study. Each area has its own measurement challenges. For example, quality improvement initiatives need to worry about the standards by which inspectors approve and reject parts. In polling, the phrasing and order of questions are hugely important. It’s all too easy to bias the results with poor questions. According to Gallup, describing something as a “welfare program” or a “program for the poor” can change the answer. Should the pollster ask about “sending” or “contributing” troops to a UN program? Which wording yields the responses that truly represent the entire population? Pollsters will often test different wordings to assess the impact. Some questions will be asked different ways in the same poll to create a more nuanced understanding. Other questions will always be asked the same way to allow for a comparison over time. Measuring opinions is both an art and a science!

## Margin of Error

Like all sample-based statistics, poll results produce a point estimate and a margin of error/confidence interval. The margin of error assumes that the sample is randomly drawn and it depends on the sample size. A larger sample size will shrink the margin of error.

You can calculate the margin of error using Minitab Statistical Software by going to: **Stat > Power and Sample Size > Sample Size for Estimation**. Suppose we wanted to calculate the margin of error for a politician with a 50% approval rating based on a sample size of 1,000. Fill out the dialog like this:

We get this output:

This tells us that the margin of error is approximately +/- 3.2%. The politician's true approval rating is likely to be between 46.8% and 53.2%.
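If you want to check the ballpark by hand, here is a minimal Python sketch using the textbook normal approximation, where the margin of error is z × sqrt(p(1 − p)/n). This is an assumption on my part, not necessarily the method the software uses, so expect small differences: for a 50% rating and 1,000 respondents, the approximation gives about 3.1%, a touch below the roughly 3.2% figure above. The function name `margin_of_error` is just for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample of size n and an observed proportion p.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

# A 50% approval rating at several sample sizes.
for n in (250, 1000, 4000):
    print(f"n = {n:>4}: +/- {margin_of_error(0.5, n):.1%}")
```

Notice that quadrupling the sample size only cuts the margin of error in half, which is why most polls settle for around 1,000 respondents rather than chasing ever-larger samples.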

## Lessons Learned

The process required to produce accurate political polls is very similar to that of any statistics-based study.

- Don’t underestimate the effort required to get a true random sample. Be sure that all elements of your population have an equal chance of being selected. Patrick wrote a nice post about this.
- Accurately target your population. The results you obtain are only reliably representative of the population that you sample. If you sample the incorrect population, your results may be worthless.
- Accurate measurements are crucial. You can have a perfect random sample from the correct population, but it’s all for naught if your measurements are inaccurate.
- Understand the margin of error or confidence interval. Because you’re sampling a subset of the population, there is bound to be uncertainty. It’s crucial to factor that uncertainty into your assessment.

Finally, there is no substitute for using your subject area knowledge while conducting an experiment and interpreting the results. For instance, in 2008 there was no poll suggesting that John McCain was going to pick Sarah Palin...but given the larger political landscape, I had a hunch!