Election Modeling - Politics and Economics

Two Plus Two Forums Other Topics Politics

Election Modeling


Page 1 of 4


05-29-2012 , 04:32 AM

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

Introduction
I got tired of waiting for Nate Silver to release his own election model, so I started building my own. I don’t do this for a living, and with a new baby, a full-time job, and a part-time second job, I have limited time. However, using the 80/20 principle, I’ve found a good, intuitive way to aggregate polls and estimate each candidate’s chance of winning each of a dozen swing states.

My model is currently informed only by polling data (taken from RCP) and by how far away we are from the election. Attempting to include economic or demographic data would mean a lot more work without much added value, since that information is already captured in the polls.

I use a logistic model to translate aggregated polling data into likelihood of winning for each state. The shape of the logistic graph is informed both by how far we are from election day and how much polling has gone on in a state (the curve gets sharper with more polling data and the closer we get to election day).

Translating a candidate’s probability of winning each state into an overall chance of winning the election is more difficult than it might appear on its face, because individual state outcomes are not independent of each other. I’ve run Monte Carlo simulations using each state’s probabilistic outcome, with those outcomes correlated to varying degrees. I am still exploring the best way to estimate interstate correlations, and I’m also investigating other possible solutions.

Summary
1) Aggregate state-level polling data, weighting newer, bigger polls over older, smaller ones.
2) Translate aggregated polls into a likelihood of winning for each candidate.
3) Simulate election outcomes using each candidate’s chances in each state.
4) I think a discussion of election modeling will do well in its own thread. The model is based entirely on math, not politics; I didn't even know what the outcome would be as I was building it. If you wish to yell about birth certificates, dogs on roofs, the stupid story du jour, etc. etc. (as I frequently do), the general election thread is here.

Last edited by goofball; 05-29-2012 at 04:53 AM.

05-29-2012 , 04:44 AM

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

State Poll Aggregation:
RCP posts poll data as it comes in, so unless and until I find a better source I’ll use them. I calculate the poll average by weighting each poll by age and sample size, since older, smaller polls are less meaningful. At the state level, sample-size weights work linearly: a 1,000-person poll is weighted twice as much as a 500-person poll. Age weights work on a half-life of 30 days (this may need to change as the election draws near): a poll from today counts twice as much as a poll from 30 days ago, which counts twice as much as a poll from 60 days ago. I may build in a distinction between likely and registered voter polls, but I haven’t yet.
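A minimal Python sketch of that weighting, assuming each poll's weight is simply sample size × 0.5^(age / 30 days) and the aggregate is a weighted mean of the toplines. The `Poll` class and the function names are hypothetical, not taken from goofball's spreadsheet.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # half-life from the post; may change closer to the election


@dataclass
class Poll:
    obama: float      # Obama share, percent
    romney: float     # Romney share, percent
    sample_size: int  # number of respondents
    age_days: float   # days since the poll was taken


def age_weight(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """1.0 for a poll from today, 0.5 at 30 days old, 0.25 at 60 days old."""
    return 0.5 ** (age_days / half_life)


def poll_weight(poll: Poll) -> float:
    """Sample size counts linearly; age decays with the half-life."""
    return poll.sample_size * age_weight(poll.age_days)


def aggregate(polls: list[Poll]) -> tuple[float, float, float]:
    """Weighted Obama, Romney and Other (= 100 - Obama - Romney) averages."""
    total = sum(poll_weight(p) for p in polls)
    obama = sum(poll_weight(p) * p.obama for p in polls) / total
    romney = sum(poll_weight(p) * p.romney for p in polls) / total
    return obama, romney, 100.0 - obama - romney
```

For example (invented numbers), a 1,000-person poll from today at Obama 46, Romney 44 and a 500-person poll from 30 days ago at Obama 44, Romney 45 get weights of 1,000 and 250, so `aggregate` returns 45.6 for Obama, 44.2 for Romney and 10.2 for Other.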

Taking Florida as an example, see the spreadsheet below:

The “Other” column is just Other = (100 – Obama – Romney).

The last piece to address is the effective sample. It is the sum of the sample sizes times the poll weights, and it indicates the effective overall sample of the polling done in the state: the more big, recent polls that have been done, the higher the effective sample will be. The effective sample is also used in translating the aggregate poll into a winning percentage, which I’ll discuss next.
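A short Python sketch of the effective sample, reading "poll weights" as the 30-day half-life age weights only (an assumption; using the full size-times-age weight would count sample size twice). The function name is hypothetical.

```python
def effective_sample(polls: list[tuple[int, float]], half_life: float = 30.0) -> float:
    """Effective sample: sum of sample sizes times age weights.

    polls: (sample_size, age_days) pairs. Assumes the "poll weight" here is
    the half-life age weight only, so sample size is not counted twice.
    """
    return sum(n * 0.5 ** (age_days / half_life) for n, age_days in polls)
```

With invented polls of 1,000 (today), 800 (30 days old) and 600 (60 days old) respondents, the effective sample is 1,000 + 400 + 150 = 1,550.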

See the aggregate results for a dozen swing states below.

05-29-2012 , 04:52 AM

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

State Winning Percentages
After estimating the most accurate possible poll, the next step is to ask what that poll tells us about each candidate’s chance to win the state. I use a logistic model for each state, informed by how far we are from the election and by the state’s effective sample size. For example, for a full-sample poll (defined as n = 10,000) done today, the relationship between polling advantage and chance of winning looks like:

However, on Election Day it looks like:

This is steeper (and implies smaller errors) than a typical poll one might read on Election Day; however, it’s important to remember that a model using many aggregated polls should have a much lower standard error than any one poll.
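The S-shaped curve in both charts can be sketched as a plain logistic function of the polling margin. This is a minimal Python sketch; the slope values are illustrative assumptions, not the parameters behind goofball's charts.

```python
import math


def win_probability(margin: float, slope: float) -> float:
    """Logistic map from polling margin to chance of winning.

    margin: candidate's lead in points (negative when trailing).
    slope:  steepness; larger values mean a sharper, more certain curve.
    A tied poll always maps to 50%.
    """
    return 1.0 / (1.0 + math.exp(-slope * margin))


for slope in (0.2, 2.0):  # illustrative values for a flatter and a steeper curve
    print(slope, [round(win_probability(m, slope), 3) for m in (-3, -1, 0, 1, 3)])
```

The curve is symmetric: the chance of a candidate trailing by a point is exactly 1 minus the chance of one leading by a point, which is why a 45% figure for the trailer and a 55% figure for the leader describe the same calibration.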

The logistic function is calibrated as follows. Polls conducted now have two main sources of error:
1) The poll sample doesn’t necessarily represent the population (sample size)
2) The poll can’t know what will happen between now and Election Day (cone of uncertainty)

Error source #1 can be calculated. We discount polls against what we have defined as a full sample (10k) using the square root of the ratio sample / 10,000 (quite a bit went into selecting the square-root ratio over the log-ratio or the plain ratio, if anyone is interested).

Quantifying the error from source #2 is more difficult. The current model assumes a linear flow of information (we learn, on average, as much going from 60 to 59 days out as we do going from 9 to 8 days out). We then calibrate the logistic using the assumption that a candidate who is 1 point down in a state in a full 10k sample today has a 45% chance to win that state. Plugging everything in, we get the following results:

In summary:
1) The aggregate poll is translated into a likelihood of winning using a logistic curve that gets sharper as the sample size increases or the election draws closer.
2) Error comes from two primary sources (1) sample size and (2) cone of uncertainty.
3) The logistic is calibrated assuming the error from (2) is zero on Election Day, and assuming a candidate ahead by 1 point in a full sample today has a 55% chance to win the state.
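One way to turn those calibration assumptions into numbers, as a hedged Python sketch: it treats the logistic scale (in points) as a sampling part and a time part added in quadrature, scales the sampling part by the square-root discount, lets the time part shrink linearly to zero on Election Day (one reading of "linear flow of information"), and solves for the daily rate that gives a 1-point trailer in a full 10k sample a 45% chance today. The quadrature rule, the `S0` value and all names are assumptions, not goofball's actual formula; the dates are the day of this post and Election Day 2012.

```python
import math
from datetime import date

FULL_SAMPLE = 10_000                  # "full sample" as defined in the post
ELECTION_DAY = date(2012, 11, 6)
CALIBRATION_DAY = date(2012, 5, 29)   # date of this post ("today")
S0 = 0.5  # hypothetical logistic scale, in points, for a full sample on Election Day


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrate_time_rate(trail: float = 1.0, p_trailing: float = 0.45) -> float:
    """Time error per day so a full sample trailing by `trail` today wins with p_trailing."""
    days_out = (ELECTION_DAY - CALIBRATION_DAY).days
    # logistic(-trail / scale) == p_trailing  =>  scale = trail / ln((1 - p) / p)
    scale_today = trail / math.log((1.0 - p_trailing) / p_trailing)
    time_part = math.sqrt(scale_today ** 2 - S0 ** 2)  # remove the sampling part
    return time_part / days_out


TIME_RATE = calibrate_time_rate()


def win_probability(margin: float, n_eff: float, days_out: float) -> float:
    """Chance of winning for a candidate leading by `margin` points (n_eff > 0)."""
    sample_part = S0 * math.sqrt(FULL_SAMPLE / n_eff)      # square-root sample discount
    time_part = TIME_RATE * days_out                       # linear, zero on Election Day
    scale = math.sqrt(sample_part ** 2 + time_part ** 2)   # assumption: add in quadrature
    return logistic(margin / scale)
```

Under these assumptions a 1-point lead in a full sample is worth 55% today and about 88% on Election Day, and a smaller effective sample flattens the curve at any date.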

05-29-2012 , 04:57 AM

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

Electoral College Simulations
I plugged the calculated winning percentages into a Monte Carlo simulator. If we assume no correlation between state outcomes, President Obama has a 91% chance to win. This is not correct, though, since state outcomes are clearly dependent, so I built state-to-state correlation into the simulation. Assuming middling state-to-state correlation drops President Obama’s chance to win to 73%, while using a very high correlation reduces it to 66%. I’m currently using a global factor (every state is equally correlated with every other state); this needs refining, but figuring out specific state-level correlations is difficult and involved. I’m also exploring other methods but have yet to make much progress.
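A global correlation factor can be implemented as a one-factor Gaussian model: each state gets a latent normal draw made of a shared national swing plus state noise, and the state is won when its draw falls below the inverse normal CDF of its win probability. The Python below is a minimal sketch under that assumption, not necessarily how goofball's simulator works; `rho` is the correlation between the latent draws rather than directly between win/loss outcomes, and the function name and example inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist

WIN_THRESHOLD = 270  # electoral votes needed to win


def simulate(win_prob: dict[str, float], electoral_votes: dict[str, int],
             base_ev: int, rho: float, n_sims: int = 20_000, seed: int = 0) -> float:
    """Share of simulated elections in which the candidate reaches 270 EV.

    win_prob: candidate's chance in each simulated state (strictly between 0 and 1).
    base_ev:  electoral votes from states treated as safe.
    rho:      global correlation (0 to 1) between every pair of latent state draws.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # A state is won when its latent draw falls below this cutoff, so P(win) = p.
    cutoffs = {s: std_normal.inv_cdf(p) for s, p in win_prob.items()}
    wins = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, 1.0)  # swing shared by every state
        ev = base_ev
        for state, cutoff in cutoffs.items():
            local = rng.gauss(0.0, 1.0)  # state-specific noise
            latent = math.sqrt(rho) * national + math.sqrt(1.0 - rho) * local
            if latent < cutoff:
                ev += electoral_votes[state]
        if ev >= WIN_THRESHOLD:
            wins += 1
    return wins / n_sims


# Hypothetical inputs: ten identical 60% states worth 10 EV each, 220 safe EV.
states = {f"S{i}": 0.6 for i in range(10)}
votes = {s: 10 for s in states}
for rho in (0.0, 0.9):
    print(rho, simulate(states, votes, base_ev=220, rho=rho))
```

With these made-up inputs, raising `rho` pulls the favorite's overall chance down from roughly the binomial value toward the single-state 60%, the same direction as the drop from 91% described above.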

Summary:
1) President Obama’s chance to win re-election based on current polling data is probably between 65% and 75%.
2) Florida and North Carolina are currently by far the closest states. Even if Mr. Romney wins both of those states (and AZ and MO), he’ll still need to pick off at least 3 states from President Obama (the most likely being CO, OH, and VA).
3) Lots of polling is being done in FL and OH.
4) Figuring out the correlation in outcomes between states is difficult.

05-29-2012 , 04:57 AM

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

Next Steps
I’ve got some things on my to-do list:
1) Add a distinction between likely and registered voter models.
2) Add some element of momentum (especially as we approach Election Day); for example, I’m not sure I believe the president is still ahead in NC.
3) Refine interstate outcome correlations.
4) Build other things into the poll weights (discount outliers? Add house effects?)
5) Others?

05-29-2012 , 06:13 AM

mutigers

Carpal 'Tunnel

Join Date: Jun 2006 Posts: 37,175

Looks interesting, I appreciate the effort. Don't know much about models, but I'm taking a class on them and on polling this summer, so maybe I can offer some insight in a couple of months.

05-29-2012 , 10:28 AM

Go Get It

Carpal 'Tunnel

Join Date: Feb 2010 Posts: 12,506

Looks kinda cool. GJ imo.

05-29-2012 , 10:39 AM

13ball

Carpal 'Tunnel

Join Date: Mar 2006 Posts: 12,912

I'd be interested to see the LV/RV breakdown wrt polling.

I also think that even with an aggregate of polls, your Election Day range of error is way too tight. I don't see how a three-point poll average advantage could 100% guarantee victory. Hell, eyeballing it, a one-point advantage on Election Day implies an 85% chance of victory. Have you looked at past elections to see if this holds?

05-29-2012 , 10:39 AM

swinginglory

banned

Join Date: Nov 2010 Posts: 2,508

A couple of things. First of all, kudos for the effort, but I think some of your methodology isn't going to give you a good read on the situation on the ground today.

First of all, when Romney was polled vs. Obama in April and March he wasn't even the nominee and was getting whacked on a daily basis by Newt and Santorum, and many of those candidates' supporters probably weren't saying "Romney" when the pollsters called in numbers as high as they would today, when he is their only choice.

Secondly, I haven't taken statistics in 40 years, but I doubt that doubling the sample size from 1,000 to 2,000 doubles the accuracy of the data. I could be wrong about that, though.
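For a simple random sample, the margin of error shrinks with the square root of the sample size, so going from 1,000 to 2,000 respondents cuts the error by a factor of about 1.41, not 2. A quick Python check, assuming a 50/50 race and the usual 95% normal approximation (the function name is just for illustration):

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error, in percentage points, for a proportion p from n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000, 2000, 4000):
    print(n, round(margin_of_error(n), 2))
```

Because the variance of a poll's estimate scales as 1/n, weighting polls linearly by sample size when averaging them is the standard inverse-variance weighting, so linear weights in the average and square-root accuracy per poll are consistent with each other (ignoring house effects and design effects).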

For example, your methodology gave Obama practically double the leads in WI and MI that the RCP averages show, and those use more recent polling data.

Likely voters, especially from multiple sources, would be better than all voters or registered voters, but there isn't enough of that data at the state level yet.

I guess my overall criticism is that data from 30 to 90 days ago is pretty meaningless, especially when Romney wasn't even the nominee at the time. It is OK for showing trends, but useless when it comes to gauging current sentiment.

05-29-2012 , 11:59 AM

#10

zikzak

Carpal 'Tunnel

Join Date: Jul 2009 Posts: 22,198

Subscribed.

05-29-2012 , 12:04 PM

#11

gusmahler

Carpal 'Tunnel

Join Date: Jul 2005 Posts: 29,443

Quote:

Originally Posted by goofball

Can you explain this a bit? I don't see why, e.g., the voters in AZ care about what voters in NM or CO are doing. IOW, the likelihood of Romney winning AZ is unaffected by whether or not Romney wins CO and NM.

05-29-2012 , 12:06 PM

#12

seattlelou

Carpal 'Tunnel

Join Date: Dec 2009 Posts: 31,584

Very cool!

05-29-2012 , 12:13 PM

#13

Chips Ahoy

help me help you

Join Date: Apr 2007 Posts: 28,869

Quote:

Originally Posted by gusmahler

The voters in AZ care about the same things as the voters in NM and CO. The sum of the campaigns, the news cycle, etc., is unpredictable, but it has similar effects in different states.

It's the degree to which whatever swings the outcome in one state matters in another state. Imagine that Romney loses Texas. Whatever made that possible means he's going to lose a whole bunch of other states. Or suppose Obama loses California. He's not going to lose CA and win Florida. So the outcomes of states have some correlation.

05-29-2012 , 12:19 PM

#14

jackaaron2012

journeyman

Join Date: May 2012 Posts: 268

I don't know if it helps at all, but when discussing Ohio, I think it should be noted that depending on where you poll, you can get completely different results.

If you were to poll greater Columbus, Cleveland, and Cincinnati (three of the most populated cities), it would lean heavily to Obama (and, interestingly, R Paul).

If you polled everywhere else, it would lean toward Romney.

And, if you did some type of combination, again, it would really matter how much the big three cities were weighted.

If I were to actually deem Ohio as an Obama state, it wouldn't be because of the current polling numbers, it would be because the population outside those three cities may not stand behind Romney (and thus, just not vote) as much as they would get out for a more polarizing figure.

05-29-2012 , 12:21 PM

#15

gusmahler

Carpal 'Tunnel

Join Date: Jul 2005 Posts: 29,443

Quote:

Originally Posted by Chips Ahoy

It's the degree to which whatever swings the outcome in one state matters in another state. Imagine that Romney loses Texas. Whatever made that possible means he's going to lose a whole bunch of other states. Or suppose Obama loses California. He's not going to lose CA and win Florida. So the outcomes of states have some correlation.

But if Obama were to F up so badly that he loses CA, wouldn't that already show in the FL polls?

05-29-2012 , 12:46 PM

#16

bobman0330

Carpal 'Tunnel

Join Date: Aug 2004 Posts: 24,234

Quote:

Originally Posted by gusmahler

But if Obama were to F up so badly that he loses CA, wouldn't that already show in the FL polls?

A source of uncertainty in the model is that events that happen between the polls and the election could cause the electorate to favor one candidate over the other. Many events that could cause CA voters to deviate in a pro-Romney way from CA poll results would also cause FL voters to do the same.

05-29-2012 , 12:52 PM

#17

Chips Ahoy

help me help you

Join Date: Apr 2007 Posts: 28,869

Quote:

Originally Posted by gusmahler

But if Obama were to F up so badly that he loses CA, wouldn't that already show in the FL polls?

goofball is building a simulator that takes % chances of winning each state and turns it into a national result by "running it twice a whole bunch". Sometimes the simulation will have Obama losing CA (today, it won't as the election gets closer). He's saying it's a better simulator if in the simulated universe where Obama loses CA that means he loses elsewhere too.

05-29-2012 , 01:50 PM

#18

pokerbobo

Carpal 'Tunnel

Join Date: Mar 2007 Posts: 14,432

Just a suggestion for your model...

AFAIK, undecided voters in most POTUS elections break for the challenger by large margins. Obama still finds himself south of 50% in most of the swing states, and though he may have a lead in a BO-MR formula, using a formula based on the above/below 50% mark for a portion of your weighting might help your accuracy.

05-29-2012 , 01:51 PM

#19

gusmahler

Carpal 'Tunnel

Join Date: Jul 2005 Posts: 29,443

Quote:

Originally Posted by Chips Ahoy

OK, I get it. Thanks.

05-29-2012 , 02:01 PM

#20

reno expat

Carpal 'Tunnel

Join Date: Apr 2007 Posts: 10,449

i don't know the packages in excel, but i know stata has a command that lets you generate pseudo-random numbers with pre-determined statistical correlations. this might be something you want to play with to try to account for the fact that state outcomes are not IID: one candidate doing better in michigan probably means they are also doing better in ohio.

you could also think about some sort of random effects model to capture between group differences and cluster states that you think will be correlated in how their outcomes move
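A common way to generate pseudo-random numbers with a chosen correlation structure is to multiply independent standard normals by the Cholesky factor of the correlation matrix. The pure-Python sketch below does this for a simple cluster structure (one correlation within a group of states, a lower one between groups) as a rough stand-in for the clustering idea; the cluster labels and correlation values are hypothetical. Each latent draw can then be turned into a state win or loss by comparing it with the inverse normal CDF of that state's win probability.

```python
import math
import random


def cluster_corr(clusters: list[str], within: float, between: float) -> list[list[float]]:
    """Correlation matrix: `within` for states in the same cluster, `between` otherwise.

    Positive definite when 0 <= between <= within < 1.
    """
    n = len(clusters)
    return [[1.0 if i == j else (within if clusters[i] == clusters[j] else between)
             for j in range(n)] for i in range(n)]


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L such that L times its transpose equals a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw(low: list[list[float]], rng: random.Random) -> list[float]:
    """One vector of standard normals whose correlation matrix is low times its transpose."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


# Hypothetical clusters: OH and MI grouped together, FL on its own.
low = cholesky(cluster_corr(["midwest", "midwest", "south"], within=0.8, between=0.3))
rng = random.Random(1)
oh, mi, fl = draw(low, rng)
```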

05-29-2012 , 02:52 PM

#21

MrWookie

Don't Call Me Meredith

Join Date: Feb 2005 Posts: 94,159

Quote:

Originally Posted by pokerbobo

I recall Nate shooting this down as a myth.

05-29-2012 , 03:14 PM

#22

DblBarrelJ

2+2 Resident Enforcer

Join Date: Sep 2007 Posts: 13,857

I'm excited. I'm forecasting approximately $15K in profit on panic-based "Obama's gonna take yer assault rifles, here's a $650 AK-47 I'll sell you for $3K" sales.

Either that or we get a mormon.

Nice.

05-29-2012 , 04:20 PM

#23

swinginglory

banned

Join Date: Nov 2010 Posts: 2,508

Quote:

Originally Posted by jackaaron2012

If I were to actually deem Ohio as an Obama state, it wouldn't be because of the current polling numbers, it would be because the population outside those three cities may not stand behind Romney (and thus, just not vote) as much as they would get out for a more polarizing figure.

You don't think Obama is a polarizing enough figure to get white religious conservatives in southern Ohio to get off their asses and vote?

A warm bucket of spit could be running on the R side and Obama would drive turnout.

05-29-2012 , 06:07 PM

#24

seattlelou

Carpal 'Tunnel

Join Date: Dec 2009 Posts: 31,584

I wonder if economic data might be a "leading indicator" of the polling data and therefore useful this far out from the election? Also, if you built a confidence interval using just national data, how would it compare to your state-by-state analysis?

05-29-2012 , 10:28 PM

#25

goofball

Beat Nate Silver

Join Date: Oct 2003 Posts: 20,424

Quote:

Originally Posted by 13ball

Remember, the graphs provided are for an aggregate poll with an effective sample size of 10,000. IOW, imagine 10 polls were released on Election Day, all showing a margin of 3 for Romney in Arizona; I think Romney losing Arizona in that case would be a big, big surprise.

If you look at the scale of individual polls, the logistic gives someone who trails by 1 point on Election Day in a 500-person sample a 37% chance to win, and someone who trails by 3 points a 16% chance to win.
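The two figures quoted above can be checked against a single logistic curve by backing out the slope each one implies. A small Python check (the function name is illustrative):

```python
import math


def implied_slope(trail_points: float, p_trailing: float) -> float:
    """Slope k (per point) such that 1 / (1 + exp(k * trail_points)) == p_trailing."""
    return math.log((1.0 - p_trailing) / p_trailing) / trail_points


k1 = implied_slope(1, 0.37)  # about 0.53 per point
k3 = implied_slope(3, 0.16)  # about 0.55 per point
print(round(k1, 3), round(k3, 3))
```

Both come out near 0.53 to 0.55 per point, and the gap is within the rounding of the quoted percentages, so the two numbers are consistent with one curve for a 500-person sample on Election Day.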
