Great thread.
Quote:
Originally Posted by d2_e4
Put more simply, surely you can't adjust for sampling error in a variable which is by far the most accurate predictor of the variable the poll is designed to gauge?
Yes you can, provided you know what the "correct" results for that variable are supposed to be.
In this election, the number 1 predictor of how people are going to vote is how they voted in 2016. The "correct" numbers for the 2016 vote are known from the election results (supplemented by things like demographic data for new voters). If you read the full PDFs for the polls that actually give this data, more than 90% of the people who voted for the two leading parties are going to vote the same way again this time. Any survey that doesn't weight based on this is IMHO trash. Weighting on past vote is also a bit of insurance against "shy voters": someone who is shy about saying they're a Trump or Biden voter is also pretty likely to be shy about admitting their 2016 vote, so they just dilute the "don't knows" rather than removing votes from the camp they're going to vote for.
I've been posting a bit in the sports betting thread about polls. Here are two of my posts I'm vain enough to think are worth a cross-post:
Answering the idea that the 2016 polls favoured Dems:
Quote:
Originally Posted by LektorAJ
Maybe. The last 2016 poll which weighted by 2012 vote (as this thread knows, I think any poll that doesn't weight by previous vote is utter trash) was this one:
https://d25d2506sfb94s.cloudfront.ne...bReport_lv.pdf
(see table 12)
It has (actual results in parentheses)
Clinton 45 (48)
Trump 41 (46)
Johnson 5 (3)
Stein 2 (1)
Other 3 (1)
Not sure 4
So you could say the polls favoured third party candidates and also "favoured" not sure and underestimated both Clinton and Trump.
My view is that the polls were accurate but there was a late third-party squeeze and the voters just broke more towards Trump - which was predictable given two of the main 3rd party candidates were Johnson and McMullin. Effectively Clinton gained half of Stein's votes and half the not sure. Trump gained about half of the Libertarian+Other, and half of the not sure.
Admittedly this is hindsight, but it's fairly obvious where his votes were coming from.
....
... so in other words it correctly counted the left and right wings of society, but the more fragmented right (at time of polling) was able to get behind Trump on polling day to get the result they wanted.
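The redistribution described above can be checked with a few lines of arithmetic. This is a sketch of the post's rough assumption (half of each third-party/undecided pool breaking each way), not exact transfer data:

```python
# Poll numbers from table 12 of the linked YouGov PDF.
poll = {"Clinton": 45, "Trump": 41, "Johnson": 5, "Stein": 2,
        "Other": 3, "Not sure": 4}

# Clinton takes half of Stein plus half of "not sure";
# Trump takes half of (Johnson + Other) plus half of "not sure".
clinton = poll["Clinton"] + poll["Stein"] / 2 + poll["Not sure"] / 2
trump = poll["Trump"] + (poll["Johnson"] + poll["Other"]) / 2 + poll["Not sure"] / 2

print(clinton, trump)  # 48.0 47.0, versus the actual 48 / 46
```

The split lands Clinton exactly on her actual 48 and Trump within a point of his 46, which is the point being made: the poll counted the two wings correctly and the late squeeze did the rest.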
On why polling "margin of error" calculated on a statistical basis is actually less of a factor than many think, with a warning about other types of polling error:
Quote:
Originally Posted by LektorAJ
Margin of error as calculated and displayed is nonsense. With proper weighting there is much less margin of error. ...
Consider the following world: population is split 50-50 between Dems and Reps. That was also the result last time in this world (a dead heat) although each party has lost 5% of its voters to the other party since then.
We conduct the following polls:
1) We sample 1024 voters randomly. Mean Dem count = 512 (50%); variance = 1024*(0.5*(1-0.5)) = 256; SD = 16; margin of error (2 SD) = 32 voters, or 3.125%.
2) We sample 512 voters who voted D last time and 512 previous Rep voters. Again the mean is 512, but in this case the variance is 512*(0.95*0.05) + 512*(0.05*0.95) = 48.64; SD ≈ 7; margin of error ≈ 14 voters (about 1.4%).
The number of voters changing parties is slightly higher than the above but there are other weighting factors helping too.
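The two designs above can be sketched in a few lines of Python. This is an illustrative toy calculation of the post's arithmetic, not a real polling model; the 50-50 world, the 5% defection rate, and the 1024-voter sample are the hypothetical numbers given above:

```python
import math

N = 1024

# Design 1: simple random sample. Each respondent is Dem with p = 0.5.
p = 0.5
var1 = N * p * (1 - p)           # binomial variance in counts: 256
moe1 = 2 * math.sqrt(var1) / N   # two-SD margin as a fraction: 3.125%

# Design 2: stratified by previous vote, 512 from each camp.
# Within each stratum only 5% defect, so the per-respondent variance
# is 0.95 * 0.05 rather than 0.5 * 0.5.
q = 0.05
var2 = 512 * q * (1 - q) + 512 * q * (1 - q)  # 48.64
moe2 = 2 * math.sqrt(var2) / N                # about 1.36%

print(f"random sample:     ±{moe1:.3%}")
print(f"stratified sample: ±{moe2:.3%}")
```

The stratified design shrinks the margin of error because the uncertainty now comes only from the small fraction of switchers, not from the full 50-50 split.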
If that sounds counterintuitive, consider the following:
3) We conduct a "gender identification" poll weighted on the binary gender people identified as 4 years ago (last time it was 50-50, and 0.5% of people have changed their mind in either direction since then). It's clearly nonsense to think that if you sample 512 each of 2016 men and 2016 women, you would have a 3% margin of error in what they identify as now. The actual margin of error over that sample is about 0.4%.
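The same arithmetic produces the roughly 0.4% figure. Again a toy sketch under the stated assumptions (two strata of 512, 0.5% switching in each direction):

```python
import math

N = 1024
q = 0.005  # 0.5% changed identification in either direction

# Per-respondent variance within each stratum is q*(1-q), far below 0.25.
var = 2 * 512 * q * (1 - q)    # about 5.09
moe = 2 * math.sqrt(var) / N   # about 0.44%
print(f"stratified MoE: ±{moe:.2%}")
```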
Obviously plenty of polls don't weight by the most relevant factors, because they exist to generate exciting narratives for the people who commission them to write stories about. You have to ignore those, and also ignore the ones that are biased outright, whether for political reasons or because a polling company gets better odds from being the lone outlier that called it right than from being just part of the herd.
The actual doubt in polling isn't mainly to do with the statistical margin of error - and therefore it's not something that can be cured by simply taking bigger statistical samples or averaging more polls together. The actual known unknowns are to do with:
1) late swings
2) postal votes invalidated
3) other shenanigans
4) reluctance to state true voting intention to pollsters.
plus any unknown unknowns that may come along.
That stuff isn't bounded by the stated margin of error at all.
(btw the above isn't exactly how you calculate statistical margin of error when you don't know the underlying distribution but it shows the relevant principle)
Last edited by LektorAJ; 10-28-2020 at 07:02 AM.