This is from a while ago (Nov 18, 2008), but it’s still interesting:
“Zogby Engages in Apparent Push Polling for Right-Wing Website”
I don’t know if many of you know, but there is an election in November. For president. Of the United States.
As a result, you’re probably being inundated with polls. Obama 50, McCain 46, plus or minus 3 points. McCain 48, Obama 46, plus or minus 4 points. You know, stuff like that.
So how do they get the plus or minus number?
Let’s take the Zogby poll for 10-17-2008 through 10-19-2008. They “randomly” surveyed 1211 people, and they reported Obama 50% McCain 46% plus or minus 3 points.
So let’s think about what is going on. What we are trying to do is estimate the proportion of the population of likely voters that will vote for Obama or McCain. The only way to truly find the proportion who will vote for Obama or McCain is to ask everyone. For a million reasons a polling company can’t just go ask everyone in the country who they are going to vote for. (The only group with enough resources to do that is the government, and even they have trouble.) So we sample (randomly!) from this population to attempt to estimate the true population parameter of interest.
So a polling company goes out and asks N likely voters who they are going to vote for. We are trying to estimate the probability of voting for Obama (or McCain). A fixed number N of independent trials, each with two outcomes? That seems like a binomial random variable to me.
A binomial random variable is characterized by two parameters, N and P. N is our sample size, and we wish to estimate P. We estimate P simply by calculating X/N, where X is the number of people voting for Obama (or McCain). We call the estimate P_hat to distinguish it from P, the true parameter. The variance of P_hat is estimated by P_hat*(1-P_hat)/N.
Now it just so happens that as the sample size gets large, the distribution of the estimator of P tends towards a normal distribution. Thus we can use a normal approximation to build a confidence interval for the parameter.
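If you want to see that normal approximation at work, here is a rough simulation sketch. It is only an illustration: the function name simulate_coverage, the true value P = 0.5, the 1000 repeated polls, and the seed are all my own assumptions, not anything a pollster uses. It runs many pretend polls of 1211 people and counts how often P_hat lands within 1.96 standard deviations of the true P. If the normal approximation is reasonable, that share should come out close to 95%.

```python
import math
import random


def simulate_coverage(p=0.5, n=1211, trials=1000, seed=0):
    """Share of simulated polls whose P_hat falls within 1.96 SD of the true p.

    p, n, trials and seed are illustrative assumptions, not values from any pollster.
    """
    rng = random.Random(seed)
    # Standard deviation of P_hat when the true proportion is p
    sd = math.sqrt(p * (1 - p) / n)
    inside = 0
    for _ in range(trials):
        # One poll: n independent yes/no answers, each "yes" with probability p
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        if abs(p_hat - p) <= 1.96 * sd:
            inside += 1
    return inside / trials


coverage = simulate_coverage()
print(f"Share of simulated polls within 1.96 SD of P: {coverage:.3f}")
```

The exact share will wobble a bit from seed to seed, since it is itself an estimate from a finite number of simulated polls.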
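With this normal approximation, the 95% confidence interval for the true value of P is P_hat plus or minus 1.96*(standard deviation of P_hat), where the standard deviation of P_hat is the square root of its estimated variance, sqrt(P_hat*(1-P_hat)/N).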
So back to the Zogby poll. They asked 1211 people who they were going to vote for. 606 responded Obama. So the best guess we can make as to the true value of the parameter is 606/1211=.5004. That’s the 50% estimate for Obama. Now we compute 1.96*sqrt(.5004*(1-.5004)/1211)=0.02816138. So our estimate is accurate to within 2.8 percentage points, at the 95% confidence level. Round up to 3 and that’s how Zogby gets its plus or minus number.
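The same arithmetic can be sketched in a few lines of Python. The helper name margin_of_error is made up for this post, and the code assumes the simple random sample and normal approximation described above; it is not a claim about how Zogby actually computes its published number.

```python
import math


def margin_of_error(x, n, z=1.96):
    """Return (p_hat, margin) for x 'yes' answers out of n respondents.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to a 95% confidence level.
    """
    p_hat = x / n
    # Estimated standard deviation of P_hat: sqrt(P_hat * (1 - P_hat) / N)
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * sd


# Zogby numbers from the post: 606 of 1211 respondents said Obama
p_hat, moe = margin_of_error(606, 1211)
print(f"P_hat = {p_hat:.4f}, margin of error = {moe:.4f}")
print(f"95% interval: ({p_hat - moe:.4f}, {p_hat + moe:.4f})")
```

Plugging in a different sample size shows why pollsters rarely go much beyond a thousand or so respondents: the margin shrinks only with the square root of N.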