Spring Cleaning: Ipsos Polling in the Ohio and Florida Republican Primaries

I wanted to share a brief note on some polling that predates this blog. Ipsos, together with its media partner Thomson Reuters, conducted online polls in both the Florida and Ohio Republican primaries. In these states, we conducted multi-wave rolling samples that ran up to the day before the election. These efforts (along with a single poll in South Carolina) are Ipsos’ first serious sojourn into online primary polling at the state level in the US. It took many sleepless nights, but our final polls in both states performed quite well. We outperformed the market in Florida and matched it in Ohio, and in both states our projected final results fell within the margin of error.

Good results, to say the least. More importantly, we learned three lessons:

  • First, primary elections in the US are, as a rule, extremely volatile compared to general elections, with poll results fluctuating right up until election day. South Carolina is the perfect example of this volatility. Newt Gingrich won handily even though most polls, including ours, had Romney leading Gingrich until a few days out (Real Clear Politics). In large part, this flip resulted from low voter salience: people don’t seriously think about the election until a few days out. Whatever the case, we ultimately decided that polling up to the day before the election was essential to ensure the most accurate results. That way we could minimize the impact of last-minute shifts in voter sentiment.
  • Second, primary elections are low-turnout elections. Typically, primary turnout runs from as low as 6% to as high as 20%. Such elections raise a natural question: how do you accurately predict who will and will not vote? In general elections, many pollsters use the traditional ‘Gallup’ 5- to 7-item summated index, which ranks respondents from high to low likelihood of voting (Gallup summated index). Likely voter questions vary in phrasing, but they all ask respondents about their past and intended future voting behavior. Many pollsters, however, criticize such stated likely voter models because people may misreport their voting intentions. Instead, these critics advocate the partial or complete use of past voter profile data in estimating likelihood of voting (Slate.com, ‘“Likely Voters” Lie’).
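To make the idea of a summated index concrete, here is a minimal sketch in Python. It assumes yes/no items that are simply counted; the item names, the `Respondent` type, and the helper functions are hypothetical illustrations, not Gallup’s actual wording or scoring.

```python
from dataclasses import dataclass, field

# Hypothetical yes/no likely-voter items. Real Gallup-style indexes use
# 5 to 7 items whose exact wording varies from pollster to pollster.
ITEMS = (
    "thought_about_election",
    "knows_polling_place",
    "voted_in_last_election",
    "always_votes",
    "certain_to_vote",
    "interest_high",
    "plans_to_vote",
)


@dataclass
class Respondent:
    rid: int
    answers: dict = field(default_factory=dict)  # item name -> bool


def summated_index(r: Respondent) -> int:
    """Count the items answered 'yes' (0 to len(ITEMS))."""
    return sum(1 for item in ITEMS if r.answers.get(item, False))


def rank_by_index(respondents):
    """Order respondents from most to least likely to vote."""
    return sorted(respondents, key=summated_index, reverse=True)


people = [
    Respondent(1, {item: True for item in ITEMS}),
    Respondent(2, {"voted_in_last_election": True, "plans_to_vote": True}),
    Respondent(3, {}),
]
print([summated_index(p) for p in people])  # [7, 2, 0]
```

Because the score is a small integer, many respondents end up sharing the same value, which matters once turnout gets low.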

In the specific case of the Ohio and Florida primaries, we confronted an additional problem: too few voters. The traditional Gallup model performs fine in general elections but is comparatively weak in primaries, where it does a poor job of discriminating between likely voters for two reasons. First, the top 25% of declared likely voters tend to be clumped together in the top box of the scale. In a general election, where voter turnout is 85%, this is not a problem. However, in a low-turnout election where only about 15% of the electorate votes, the inability to discriminate within this clump is a serious handicap. We need a tool that accurately identifies the 10-20% of the electorate that will actually vote on primary day. The second problem is that the electorate at 25% turnout has a decidedly different partisan makeup than the electorate at 15% turnout.

[Figure: Percent identifying as "very conservative" at different turnout rates]
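The clumping problem can be illustrated with a short Python sketch. It checks whether keeping the top share of respondents by score forces us to split a group of tied scores. The score distribution below is made up for illustration and is not Ipsos data.

```python
def cutoff_is_ambiguous(scores, turnout):
    """Return True if keeping the top `turnout` share of respondents,
    ranked by `scores`, would have to split a group of tied scores."""
    n_keep = round(turnout * len(scores))
    if n_keep == 0 or n_keep >= len(scores):
        return False
    ranked = sorted(scores, reverse=True)
    # Ambiguous when the last respondent kept ties with the first one dropped.
    return ranked[n_keep - 1] == ranked[n_keep]


# Made-up index scores for 100 respondents: 25 in the top box (7),
# then 30 at 6, 30 at 5, and 15 at 4.
index_scores = [7] * 25 + [6] * 30 + [5] * 30 + [4] * 15

print(cutoff_is_ambiguous(index_scores, 0.85))  # False: general-election-like turnout
print(cutoff_is_ambiguous(index_scores, 0.15))  # True: cutoff falls inside the top box

# Hypothetical probability scores in one-percent steps have no such ties.
prob_scores = [i / 100 for i in range(100, 0, -1)]
print(cutoff_is_ambiguous(prob_scores, 0.15))  # False
```

At 85% turnout the cutoff lands cleanly between score groups, but at 15% it falls inside the 25% top-box clump, so the index alone cannot say which of those respondents to keep.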

Going to the polls on primary election day is not merely a function of a person’s willingness to vote. Voting competes with more mundane yet essential tasks like putting food on the table and making sure the kids are picked up from school on time. Voters, especially in primaries, need a strong reason to vote. Put another way, they need an emotional link to the event, which in politics usually translates into partisanship. As a result, a likely voter model that ignores political variables misses the more partisan voters.

With these challenges in mind, we estimated each respondent’s probability of voting using a logistic regression on past behavior, intended future behavior, and degree of partisanship.
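A minimal sketch of this kind of model in plain Python (batch gradient descent, no external libraries) follows. The three features and the training rows are hypothetical. The sketch also assumes a training set in which each respondent’s actual turnout is known (for example, validated against voter records), which the post does not describe. It is an illustration of logistic regression, not Ipsos’ actual specification.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, row):
    """Estimated probability of voting; w = [intercept, w1, w2, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical features: [past_primaries_voted (0-3), says_certain (0/1),
# partisan_strength (0-3)]. Rows and labels are made up.
X = [[2, 1, 3], [0, 0, 0], [1, 1, 2], [0, 1, 0],
     [3, 1, 3], [0, 0, 1], [2, 0, 2], [1, 0, 0]]
y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = turned out (assumed known for training)

weights = fit_logistic(X, y)
for row in ([3, 1, 3], [0, 1, 0]):
    # Report the estimate rounded to whole percentage points.
    print(row, round(100 * predict_proba(weights, row)))
```

In practice a statistics package would be used for estimation; the hand-rolled version just makes the mechanics visible.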

Our model provides two advantages. First, because its outputs are estimated probabilities, we could discriminate between voters in one-percent intervals from 0 to 100%. This granularity was especially helpful at the extreme high end of likelihood to vote. Second, by employing political variables, our model captured the partisan nature of primary elections. This was especially important in the lowest-turnout scenarios, from 1% to 6%, where partisanship is the key factor in turnout. Given these lessons, Ipsos plans to use similar estimated likely voter models in the future for both primary and general elections.
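One way to use such probabilities for different turnout scenarios is sketched below: keep the top share of respondents by estimated probability and then look at the composition of that likely electorate, here the share identifying as "very conservative." The post does not say how the probabilities were converted into a likely electorate, so the simple top-share cutoff, the function names, and the data are all assumptions for illustration.

```python
def likely_electorate(probs, turnout):
    """Indices of the top `turnout` share of respondents by estimated
    probability of voting (at least one respondent)."""
    n_keep = max(1, round(turnout * len(probs)))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:n_keep]


def share_very_conservative(probs, very_conservative, turnout):
    """Share of the likely electorate flagged as very conservative."""
    chosen = likely_electorate(probs, turnout)
    return sum(very_conservative[i] for i in chosen) / len(chosen)


# Made-up probabilities and ideology flags for ten respondents.
probs = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]
very_conservative = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

for turnout in (0.2, 0.5, 1.0):
    print(turnout, share_very_conservative(probs, very_conservative, turnout))
```

With these made-up numbers the very conservative share falls from 100% to 60% to 40% as the assumed turnout rises, the same kind of shift the figure above describes. An alternative design is to weight every respondent by their estimated probability instead of applying a hard cutoff.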

  • Third, we learned from using online, rather than dual-frame telephone, samples in both Florida and Ohio. Online surveys are often panned in this field. However, unlike traditional opt-in online polls, which use only recruited panelists, we employed what the market is calling blended online samples. Such samples are online in nature but draw respondents from multiple sources, including an array of traditional commercial opt-in panels as well as non-panel sources and other consolidator or aggregator sites. In any one poll, the blended sample might include hundreds of sources. This is a brave new world that we are only now learning to navigate. At first blush, however, our blended samples have performed admirably, nailing two low-turnout primary elections with only simple post-stratification weighting. More research is needed, but the early signs are positive for blended online samples.
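For readers unfamiliar with simple post-stratification, here is a minimal Python sketch: each respondent’s weight is the population share of their cell divided by that cell’s share of the sample. The post does not say which variables were used for weighting, so the age cells and target shares below are hypothetical.

```python
from collections import Counter


def poststratify(cells, population_shares):
    """Return one weight per respondent so that weighted cell shares
    match `population_shares`.

    cells: one cell label per respondent (e.g. an age group).
    population_shares: dict of cell label -> target share (summing to 1).
    """
    n = len(cells)
    counts = Counter(cells)
    missing = set(population_shares) - set(counts)
    if missing:
        # A cell with no respondents cannot be weighted up to its target.
        raise ValueError(f"no respondents in cells: {sorted(missing)}")
    return [population_shares[c] / (counts[c] / n) for c in cells]


# Hypothetical sample that over-represents older respondents.
sample_cells = ["18-44", "18-44", "45+", "45+", "45+", "45+"]
targets = {"18-44": 0.5, "45+": 0.5}  # hypothetical population shares

weights = poststratify(sample_cells, targets)
print([round(w, 2) for w in weights])
```

Younger respondents are weighted up (1.5) and older ones down (0.75), so each group carries half of the weighted total. Weighting on several variables at once usually calls for raking rather than a single cross-classified cell.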

That is all the spring cleaning for now. I plan on going into much more detail on estimated likely voter models and blended online samples in future posts. Until then.