The latest internet-based poll from the Wall Street Journal and survey firm Zogby International, released today, shows Bill Ritter leading Bob Beauprez in the 2006 Colorado Governor's race by a narrow 45.5% to 42.8% margin (2.7 percentage points), with a stated sampling margin of error of plus or minus 4 percentage points. The poll also shows Libertarian Dawn Winkler-Kinatader with a 5.7% vote share.
This close result is at odds with recent telephone-based polling in the race by other firms, which shows Ritter with a commanding lead over Beauprez. Somebody's wrong, and it is probably this WSJ/Zogby poll.
A SurveyUSA poll also released today shows Ritter with a 55-38 lead (17 percentage points), generally consistent with prior telephone polls in the race but in stark contrast to the WSJ/Zogby poll. A September 19 poll by Rasmussen showed Ritter leading Beauprez by 16 points. A September 14 poll by the Rocky Mountain News and CBS4 showed Ritter leading Beauprez by 17 points. The previous Wall Street Journal/Zogby poll (June 22) showed Ritter leading Beauprez by 12.1 points.
What’s Going On?
While public preferences are constantly shifting in every political race, there has been no bombshell harming Ritter in the Colorado Governor's race that would explain a 13 to 14 percentage point swing in public opinion over the last two weeks. Yet, if there hasn't been such a dramatic swing, the poll results seem irreconcilable, since the difference between the phone polls and the Internet poll well exceeds the stated margin of sampling error of each poll.
Fundraising and endorsement data, which tend to parallel poll results, also strongly suggest that the Ritter-Beauprez race is not nearly as close as this WSJ/Zogby International poll would indicate.
Two alternative theories could explain this anomalous poll result: either this WSJ/Zogby poll is simply a fluke, or the polling methods used in the WSJ/Zogby poll produced a skewed sample that weighting was insufficient to correct.
Was It A Fluke?
The margin of error for opinion polls is stated at a 95% confidence level: random sampling variation alone, i.e. just happening to pick an unrepresentative sample by chance, will push a result outside the margin of error no more than 5% of the time.
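For a simple random sample, that margin of error comes from the standard formula for a proportion, conventionally computed at the worst case of 50% support. A minimal sketch (the 600-respondent figure is illustrative, not from any of the polls discussed here):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 95% confidence interval for a proportion,
    using the normal approximation (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of ~600 respondents has roughly a 4-point margin of error,
# matching the plus-or-minus 4 points stated for the WSJ/Zogby poll.
print(round(margin_of_error(600) * 100, 1))  # → 4.0
```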
The chance of any particular poll being a fluke outside the margin of error seems pretty low until you recognize just how often a race like the Colorado Governor's race is polled. There are at least eighteen publicly known poll results in the Colorado Governor's race so far. The odds that at least one of them is a fluke (i.e., gives results with an error greater than the margin of error) are therefore pretty high.
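The arithmetic is straightforward: if each poll independently has a 95% chance of landing inside its margin of error, the chance that all eighteen do is 0.95 raised to the eighteenth power, so the chance of at least one fluke is roughly 60% (assuming, as a simplification, that the polls are independent):

```python
# Probability that at least one of 18 independent polls falls
# outside its 95% margin of error by chance alone.
p_all_inside = 0.95 ** 18
p_at_least_one_fluke = 1 - p_all_inside
print(round(p_at_least_one_fluke, 2))  # → 0.6
```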
In contrast, the likelihood that several prior polls, independent of each other, close in time, and using slightly different phone polling methodologies, were all flukes is very low. Viewed collectively, the three phone polls taken in the past two weeks are equivalent to a single phone poll showing Ritter leading Beauprez by 16.6 percentage points, with a margin of error of 2.5 percentage points. If they were all flukes and the WSJ/Zogby poll were correct, they would have to be off by five and a half times the margin of error of the collective result.
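Pooling works by treating the three phone polls as one larger sample. The individual sample sizes were not all published, so the figure of roughly 500 likely voters per poll below is an assumption (typical for state-level polls); with it, the pooled margin of error comes out to about 2.5 points, matching the collective figure above:

```python
import math

# Pool three phone polls into one larger sample. The leads are the
# three phone-poll results cited above; the ~500-respondent sample
# sizes are an assumption, typical for state-level polls.
leads = [17, 16, 17]     # Ritter's lead in percentage points
sizes = [500, 500, 500]  # assumed sample sizes

n_total = sum(sizes)
pooled_lead = sum(l * n for l, n in zip(leads, sizes)) / n_total
pooled_moe = 1.96 * math.sqrt(0.25 / n_total) * 100

print(round(pooled_lead, 1))  # → 16.7
print(round(pooled_moe, 1))   # → 2.5
```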
Indeed, the Zogby results are so far off the phone poll results that attributing the difference to fluky sampling in the Zogby poll is implausible. While 5% of poll results are outside the margin of error, this result is more than three times the margin of error off from the other results in the race. The odds of a result being this far off due to random chance alone are well below 1%. Thus, the fluke theory is less plausible than the alternative, which is a bad sampling methodology.
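Since a 95% margin of error corresponds to about 1.96 standard errors, a result three margins of error away from the truth sits nearly six standard errors out, and the probability of that happening by chance is vanishingly small. A quick check under the normal approximation:

```python
import math

def two_sided_tail(z):
    """Probability that a standard normal deviate exceeds |z|
    in either direction."""
    return math.erfc(z / math.sqrt(2))

# A 95% margin of error is ~1.96 standard errors, so a result three
# margins of error off is ~5.88 standard errors from the truth.
z = 3 * 1.96
print(two_sided_tail(z) < 0.01)  # → True
```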
Was It A Skewed Sample?
The online polling method used by Zogby is unique, while phone polls are all done in basically the same way. This means the methodology of phone polling has had much more validation against real election results than internet polling has. Real-life validation is the only really reliable way to screen a polling method for unrepresentative sampling.
Here, in its own words, is how Zogby International describes its methods:
Zogby Interactive of Utica, N.Y. has assembled a database of individuals who have registered to take part in online polls through solicitations on the company’s Web site as well as other Web sites that span the political spectrum. Individuals who registered were asked to provide personal information such as home state, age and political party to Zogby.
Zogby International telephoned about 2% of respondents who completed the interactive survey to validate their personal data. To solicit participation, Zogby sent emails to individuals who had asked to join its online-polling database, inviting them to complete an interactive poll. Many individuals who have participated in Zogby’s telephone surveys also have submitted e-mail addresses so they may take part in online polls.
The Interactive polls were supplemented by 20 to 50 telephone calls in 20 states (AR, CA, CO, CT, FL, GA, IL, MD, MI, MO, NV, NM, NJ, NY, OH, PA, TN, TX, VA, WI) to ensure proper representation of all demographic groups.
Margins of error for each candidate vary by state and range from 2.9 to 4.3 percentage points. Margins for specific states are available on the state panels.
Zogby International President John Zogby says 15% of the company’s U.S. database of online-poll participants are “regulars,” who take part in half of the interactive polls the company conducts; the balance of the names of respondents in the database change frequently. Likely voters in each of the 26 states followed instructions sent by an e-mail that led them to the survey located on Zogby’s secure servers. Those polled were asked unique questions pertaining to the races in their state.
As is usual in polling, weightings are applied to ensure that the selection of participants accurately reflects characteristics of the voting population, including region, party, age, race, religion and gender.
The concern about Internet polling is that those who are participating are self-selecting, and that the sample of people who respond to Internet polls are not representative of the population as a whole.
People who actively seek out participation in a poll, such as the 15% of Zogby participants who are regulars, may be motivated to influence public perception of the race in a way that someone receiving a random telephone call at dinner time is not, and may also respond strategically, for example, calling themselves independents when they are really Republicans or Democrats.
A more partisan sample would likely greatly impact the Ritter-Beauprez race, because Ritter’s lead is largely drawn from voters who identify as independents or moderates, a group of voters who also tend to be less actively involved in politics generally.
Also, the survey method systematically excludes people who don't have internet access or aren't sophisticated, regular internet users.
About 30% of the population has never used the Internet, and far fewer are regular enough users of the Internet and are comfortable enough with the technology, to be potential participants in Zogby’s poll. Only about 31% of the population regularly receives news over the Internet, a better proxy for persons within the Zogby online poll’s purview than the percentage who have ever used the Internet.
This is strongly skewed by age. About 95% of those aged 18-24 have used the Internet, while fewer than 30% of those over age 65 have ever done so. Likewise, 47% of those in the 30-34 age bracket regularly get news over the internet, while only 11% of those over age 65 do.
Those most likely to use the Internet, younger voters, are also the least likely age group to vote. Thus, any system of weighting by age requires a major adjustment from a general sample of Internet users.
Simply adjusting the sample based on age may not solve this skew, because there is every reason to believe that, for example, senior citizens who are internet users are not typical of seniors as a whole. Internet-using senior citizens tend to be more educated and more affluent than the senior citizen population as a whole, and to hold world views that reflect their unusual habits.
Weighting based on demographics only works well when the sample in each demographic group is typical of that group in the population as a whole. If the sample from an undersampled demographic is atypical, weighting it more heavily may actually make the adjusted result less accurate. SurveyUSA, which had one of the best track records in 2004, did little or no weighting, relying instead on true random sampling, which by brute force avoids the tricky methodological issues associated with weighting.
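The weighting scheme Zogby describes is essentially post-stratification: each respondent is weighted by the ratio of their group's share of the voting population to its share of the sample. A minimal sketch (the group shares below are illustrative, not Zogby's actual figures):

```python
# Post-stratification weight = population share / sample share.
# The shares below are illustrative, not from the actual poll.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
sample_share     = {"18-34": 0.45, "35-64": 0.45, "65+": 0.10}

weights = {g: population_share[g] / sample_share[g]
           for g in population_share}

# Seniors are undersampled here, so each senior respondent counts
# double -- which only improves accuracy if the sampled seniors
# actually resemble seniors in general.
print(weights["65+"])  # → 2.0
```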
Self-selection and internet-user bias could be corrected somewhat by the phone calls made in the Zogby poll, but 20-50 phone calls cannot provide a very accurate sample of this population, particularly if they are not, as the methodology seems to indicate, targeted at people who do not have internet access. If, for example, 70% of the population does not use the Internet enough to be reached by the Internet poll, and the total sample size is 500 respondents, then just 14-35 responses will be drawn from the 70% of the population who are not heavy Internet users, while 465-486 of the responses will be drawn from the 30% of the population who regularly use the Internet. This subjects the poll to significant and difficult-to-quantify risks of error.
Zogby has basically taken a possibly very good poll of 30% of the population without gaining any insight into what the other 70% is thinking. The margin of error of more than 16 percentage points associated with the 35-person phone sample (the portion that doesn't overlap with the internet-based part of the survey) is so great, in a race where each candidate's partisan base guarantees him a minimum of roughly 30% support, that the phone calls provide no meaningful data at all about what the 70% who are not intense Internet users think.
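That 16-point figure is just the standard margin-of-error formula applied to a 35-person sample:

```python
import math

# 95% margin of error for the (at most) 35 phone respondents who
# could represent non-Internet users, at worst case p = 0.5.
n = 35
moe = 1.96 * math.sqrt(0.25 / n) * 100
print(round(moe, 1))  # → 16.6
```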
The best-known example of a non-representative sample producing an incorrect result involves the then-new technology of the telephone: the 1936 Literary Digest poll, which drew its sample largely from telephone directories and automobile registrations, predicted a landslide for Alf Landon over Franklin Roosevelt, who in fact won overwhelmingly. Because telephone users at the time were not representative of the population at large, and non-phone users were much more strongly Democratic leaning, the poll failed badly; a similar sampling failure famously led papers to print "Dewey Defeats Truman" in 1948, before Truman's win in that Presidential race became clear.
The most recent WSJ/Zogby Interactive poll result is not to be trusted. It is very likely wrong, and very likely wrong because its methodology is fundamentally flawed, at least when applied to the dynamics of the Ritter-Beauprez race.