samcarlen / toy-us-election-simulator

A very simple model for forecasting elections with polls and demographic data

Home Page:https://twitter.com/gelliottmorris/status/1257331350618726400?s=20

Toy US election simulator

G. Elliott Morris @gelliottmorris

This is just a simple election simulator based on national and state-level polls. The code in this repo will generate the graphs and statistics I shared here.

My aim for this code is to help shed some light on basic methods for aggregating national and state polls, inferring electoral standings in states without a lot of data and simulating what might happen in the electoral college if polls lead us astray. None of this should be considered an official election forecast, or really even a good one. I bet you’d have better-than-replacement-level rates of success with it, but I only wrote it for a fun coding exercise and to show people how this sort of program works—so act accordingly.

This caveat being addressed, I will concede that I do think this model will provide us with some interesting material as the election cycle progresses, so I’ve set up the model to update the maps and tables at the bottom of this document throughout the day using GitHub Actions. You can check back here regularly to see how the race is changing.

Technical notes

The file scripts/main_poll_simulator.R runs a series of models to forecast the presidential election using national and state-level polls. The first step is to average available polls fielded over the last two months. That average is weighted by each poll’s sample size. If all states had plenty of polls, this model would be easy; we would move on to simulating many different “trial” elections by generating errors from the appropriate distributions. Alas, not all states will be polled adequately, so we turn to an intermediate step.
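As a rough illustration of that first step, here is a minimal sketch of a sample-size-weighted average over a two-month window. The data frame, its column names and the 60-day cutoff are assumptions made for this example; they are not taken from scripts/main_poll_simulator.R.

```r
# Hypothetical polls; column names are illustrative, not the repo's schema.
polls <- data.frame(
  state        = c("WI", "WI", "WI", "PA", "PA"),
  end_date     = as.Date(c("2020-10-30", "2020-10-01", "2020-07-15",
                           "2020-10-28", "2020-10-20")),
  sample_size  = c(800, 1200, 600, 1000, 500),
  biden_margin = c(8, 6, 2, 7, 4),   # Biden minus Trump, in points
  stringsAsFactors = FALSE
)

run_date <- as.Date("2020-11-03")

# Keep only polls fielded in roughly the last two months (60 days, assumed).
recent <- polls[polls$end_date >= run_date - 60, ]

# Sample-size-weighted mean of the margin within each state.
state_avg <- sapply(split(recent, recent$state), function(d) {
  weighted.mean(d$biden_margin, w = d$sample_size)
})

print(state_avg)
```

The July poll in Wisconsin falls outside the window and is dropped, so the larger of the two remaining Wisconsin polls pulls the average toward its own reading.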

The second step is to predict what polls would say if pollsters surveyed neglected states. We can regress Biden’s observed vote margin in each polled state on a series of that state’s demographic variables. I use: Clinton’s margin in the 2016 election; the share of adults who are black; the share of adults with a bachelor’s degree or higher; the share of adults who are Hispanic or another non-white, non-black race; the median age of voters in a state; the share of adults who are white evangelicals; the average number of people who live within five miles of any given resident; the share of adults who are white; and the share of adults who are white without a college degree. An ordinary regression with this many predictors is prone to over-fitting, so I use stepwise variable selection via AIC and elastic net regularization with a linear model trained using leave-one-out cross-validation. In states with polls, the predictions from the regression model are given a weight equal to that of a poll with an average sample size and averaged with the raw polling data. In states without polls, the final “polling average” is just the regression prediction.
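The sketch below illustrates the idea on synthetic data, using only base R. It covers the stepwise-AIC part with `step()` and the blending rule; the elastic net and leave-one-out cross-validation are left out because they need an extra package. All variable names, the fake covariates and the assumed average sample size of 900 are made up for the example, and treating the regression as "one more poll" is my reading of the weighting described above.

```r
set.seed(1)

# Synthetic stand-in data: 50 "states" with made-up covariates.
n <- 50
states <- data.frame(
  clinton_margin_2016 = rnorm(n, 0, 15),
  pct_college         = runif(n, 20, 45),
  median_age          = runif(n, 30, 45)
)

# Fake polling margin, driven mostly by the 2016 result.
states$poll_margin <- 5 + 0.9 * states$clinton_margin_2016 +
  0.2 * (states$pct_college - 30) + rnorm(n, 0, 3)

# Pretend only the first 30 states have polls.
polled <- seq_len(n) <= 30
states$poll_margin[!polled] <- NA
states$n_polled <- ifelse(polled, 2000, 0)   # total respondents per state

# Stepwise selection by AIC, starting from the full linear model.
full <- lm(poll_margin ~ clinton_margin_2016 + pct_college + median_age,
           data = states[polled, ])
fit <- step(full, direction = "both", trace = 0)

# Predict a margin for every state, polled or not.
states$reg_pred <- predict(fit, newdata = states)

# Blend: the regression counts as one poll of average sample size.
avg_n <- 900   # assumed average poll sample size
states$final_avg <- ifelse(
  polled,
  (states$n_polled * states$poll_margin + avg_n * states$reg_pred) /
    (states$n_polled + avg_n),
  states$reg_pred
)

head(states[, c("poll_margin", "reg_pred", "final_avg")])
```

With this weighting, a heavily polled state stays close to its raw polling average, while a state with a single small poll is pulled more strongly toward the regression prediction.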

Because polls are not perfect predictions of voting behavior, the final step is to simulate many tens of thousands of different “trial” elections, in each one generating (a) a national polling error, (b) a regional polling error and (c) a state-level polling error. These errors are disaggregated from the observed historical root-mean-square error of election polls using an error sum-of-squares formula that I cribbed from Nate Silver. This is equivalent to saying that polling error is assumed to be correlated nationally and regionally, but also to have state-specific components that aren’t shared across geographies. We could be more complex about this (perhaps someone will submit a pull request to generate correlated state-level errors using mvrnorm, for example), but this works for my illustrative purposes here.
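To make the three-layer error structure concrete, here is a minimal sketch in base R. The state margins, the region labels and the sizes of each error component are invented for the example; the only thing it is meant to show is the sum-of-squares identity (total² = national² + regional² + state²) and how shared draws induce correlation between states.

```r
set.seed(42)

# Illustrative inputs; margins are Biden minus Trump, in points.
states <- data.frame(
  state  = c("PA", "WI", "FL", "GA"),
  region = c("Northeast", "Midwest", "South", "South"),
  margin = c(7, 8, 3, 1),
  stringsAsFactors = FALSE
)

# Assumed error sizes (standard deviations, in points). The split is
# made up; the state piece is whatever is left after the other two.
total_sd    <- 5.5
national_sd <- 3.0
regional_sd <- 3.0
state_sd    <- sqrt(total_sd^2 - national_sd^2 - regional_sd^2)

n_sims  <- 20000
regions <- unique(states$region)

# One national draw per trial, shared by every state.
national_err <- rnorm(n_sims, 0, national_sd)

# One draw per trial per region, shared by states in that region.
regional_err <- matrix(rnorm(n_sims * length(regions), 0, regional_sd),
                       nrow = n_sims, dimnames = list(NULL, regions))

# One independent draw per trial per state.
state_err <- matrix(rnorm(n_sims * nrow(states), 0, state_sd),
                    nrow = n_sims, dimnames = list(NULL, states$state))

# Simulated margin = polling average + national + regional + state error.
total_err  <- state_err + regional_err[, states$region] + national_err
sim_margin <- sweep(total_err, 2, states$margin, "+")

# Share of trials in which Biden carries each state.
win_prob <- colMeans(sim_margin > 0)
print(round(win_prob, 2))
```

Because Florida and Georgia share both the national and the regional draw in this setup, their simulated margins move together more than Pennsylvania's and Georgia's do, which share only the national draw.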

Odds and ends

A note on forecasting: This is a “toy” model because it does not attempt to project movement in the polls between whatever day it runs and election day. Instead, it just treats the polls as uncertain readings of the future, assuming no change in means. But this is an empirically flawed assumption. We know from history that polls during and after conventions tend to overstate support for the party that most recently nominated a candidate. A true forecasting model will adjust for these historical patterns and project that the favored candidate’s election-day polling margin will be smaller than it is on the model run date. This is yet another reason you should treat this analysis with a hefty dose of skepticism, at least until election day…

A note on polls: The purpose of this analysis is to determine what we know now from the polls. But polls often err in predicting elections. It is probably better to combine general election polls with other indicators of election outcomes, such as the state of the economy or presidential approval ratings. Fancier election models will do so. This is yet another reason not to squint at the estimates here.

A final reminder: this is not an official election forecast. The purpose of this repo is to help people understand how these forecasts work, and to provide some forecasters with code to improve their methods.

With all that out of the way, I guess we can proceed…

Automated report:

These graphs are updated hourly with new polls.

Last updated on November 03, 2020 at 06:10 PM EST.

National polling average and popular vote prediction

Joe Biden’s margin in national polls is 9.4 percentage points.

His margin implied by state-level polls and the demographic regression is 8.6 percentage points.

This chart draws a trend for Biden’s implied national margin and plots individual national polls alongside it. His national margin implied by state polls will not always match the raw average of national polls.

State polling averages and vote prediction

The polling average in each state:

In map form…

In table form…

The twenty most competitive states:

| State | Biden margin, uncertainty interval (%) | State | Biden margin, … (%) |
|---|---|---|---|
| NH | 10 [-1, 21] | OH | 0 [-11, 11] |
| MI | 9 [-2, 20] | TX | -1 [-11, 10] |
| MN | 9 [-2, 20] | IA | -1 [-12, 10] |
| WI | 8 [-2, 19] | AK | -4 [-15, 7] |
| PA | 7 [-4, 17] | MT | -5 [-16, 6] |
| NV | 7 [-4, 18] | UT | -7 [-17, 4] |
| AZ | 3 [-8, 14] | SC | -7 [-18, 4] |
| FL | 3 [-7, 14] | MO | -7 [-18, 4] |
| NC | 2 [-9, 13] | NE | -7 [-18, 3] |
| GA | 1 [-9, 12] | IN | -10 [-21, 0] |

The rest of the states:

| State | Biden margin, uncertainty interval (%) | State | Biden margin, … (%) |
|---|---|---|---|
| DC | 75 [64, 85] | ME | 12 [1, 23] |
| MA | 33 [23, 44] | CO | 12 [2, 23] |
| CA | 32 [21, 42] | KS | -11 [-21, 0] |
| HI | 30 [20, 41] | SD | -13 [-24, -2] |
| VT | 28 [17, 38] | MS | -14 [-25, -3] |
| NY | 27 [16, 38] | TN | -14 [-25, -3] |
| MD | 27 [16, 38] | ID | -14 [-25, -4] |
| CT | 24 [13, 35] | LA | -15 [-25, -4] |
| RI | 21 [10, 31] | ND | -16 [-27, -6] |
| NJ | 21 [10, 32] | KY | -18 [-29, -7] |
| WA | 20 [9, 31] | AL | -18 [-29, -7] |
| IL | 19 [8, 29] | AR | -22 [-32, -11] |
| DE | 17 [6, 27] | OK | -22 [-33, -12] |
| OR | 14 [3, 24] | WY | -24 [-35, -13] |
| VA | 12 [1, 23] | WV | -26 [-37, -15] |
| NM | 12 [1, 23] | | |

State polling averages and vote prediction, over time:

In our simple polling model, Joe Biden’s projected election-day vote margin in any state is equal to a combination of his polling average and a projection based on the relationship between demographics and the polls in other states. Accordingly, the chart below shows our estimate of his support according to the polls and our demographic regression on any given day, and it also represents our projection for his final election-day vote. (In other words, we don’t forecast any movement in the race between now and election day. Although it is naive to assume the race will remain static, it suits our educational purposes with this model.)

State win probabilities

The odds that either candidate wins a state they’re favored in, given the polling error:

In map form…

State win probabilities, over time:

(Just for key states.)

Tipping-point states

The states that give the winner their 270th electoral college vote, and how often that happens:

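| State | Tipping point chance (%) | State | Tipping point chance (%) |
|---|---|---|---|
| FL | 17.2 | DE | 0.1 |
| PA | 13.6 | MO | 0.1 |
| MI | 9.0 | MT | 0.1 |
| AZ | 7.0 | NE | 0.1 |
| WI | 7.0 | NJ | 0.1 |
| MN | 6.2 | RI | 0.1 |
| NC | 6.0 | SC | 0.1 |
| TX | 5.9 | UT | 0.1 |
| GA | 5.2 | WA | 0.1 |
| OH | 5.0 | CA | 0.0 |
| VA | 4.1 | CT | 0.0 |
| NV | 3.7 | ID | 0.0 |
| CO | 2.1 | IN | 0.0 |
| NH | 1.8 | KS | 0.0 |
| NM | 1.6 | MA | 0.0 |
| OR | 1.1 | MD | 0.0 |
| IA | 1.0 | MS | 0.0 |
| ME | 1.0 | NY | 0.0 |
| IL | 0.3 | SD | 0.0 |
| AK | 0.2 | TN | 0.0 |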
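One common way to find the tipping-point state in a single simulated election is to line the states up from the winner's best margin to their worst and walk down the list until the running electoral-vote total reaches a majority. The sketch below does that for a toy map of four made-up states with 25 electoral votes in total (so 13 are needed). The function name `tipping_point` and all inputs are hypothetical, and exact ties in the electoral college are not handled specially.

```r
# Find the tipping-point state in one simulated election.
# `margin` is Biden minus Trump by state (named); `ev` holds electoral votes.
tipping_point <- function(margin, ev, needed = 270) {
  biden_ev <- sum(ev[margin > 0])
  # Flip the sign if Trump is the winner, so larger always means
  # "better for the winner".
  winner_margin <- if (biden_ev >= needed) margin else -margin
  ord <- order(winner_margin, decreasing = TRUE)
  # First state at which the running total reaches the threshold.
  names(margin)[ord][which(cumsum(ev[ord]) >= needed)[1]]
}

# Toy map: four invented states, 25 electoral votes, 13 needed to win.
ev <- c(A = 10, B = 6, C = 5, D = 4)

# Three invented trials (rows) of simulated margins by state (columns).
sims <- matrix(c(-4,  3,  1,  6,
                  2,  5, -1, -3,
                 -2, -5,  1, -3),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, names(ev)))

tips <- apply(sims, 1, tipping_point, ev = ev, needed = 13)

# Share of trials in which each state is the tipping point, in percent.
tip_chance <- round(100 * table(tips) / length(tips), 1)
print(tip_chance)
```

In a real run the same function would be applied to every simulated election with `needed = 270`, and the resulting shares are what a table like the one above reports.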

Electoral college outcomes

The range of electoral college outcomes:

Chance of winning the election, over time

The divide between the electoral college and popular vote

The chance that one party wins the national popular vote, but loses the electoral college majority:

| | Chance (%) |
|---|---|
| Democrats win the popular vote and electoral college | 96 |
| Democrats win the popular vote, but Republicans win the electoral college | 3 |
| Republicans win the popular vote and electoral college | 1 |
| Republicans win the popular vote, but Democrats win the electoral college | 0 |

The overall probability that Joe Biden wins the national popular vote is 99.26%. The overall probability that Joe Biden wins the electoral college majority is 95.92%.
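Numbers like these fall out of a simple cross-tabulation over simulated elections. The sketch below assumes each trial has already produced a national margin and an electoral-vote total for Biden; the eight trials shown are invented, and the vector names are hypothetical.

```r
# Hypothetical per-trial outputs from a simulation run.
natl_margin <- c(8.1, 2.0, -1.5, 9.3, 0.4, 6.7, 3.2, -0.2)  # Biden minus Trump
biden_ev    <- c(350, 262, 210, 375, 255, 320, 279, 272)

pop_winner <- ifelse(natl_margin > 0, "Democrats", "Republicans")
# Simplification: anything short of 270 (including a 269-269 tie)
# is counted for Republicans here.
ec_winner  <- ifelse(biden_ev >= 270, "Democrats", "Republicans")

# Joint share of trials for each popular-vote / electoral-college pairing.
outcomes <- 100 * prop.table(table(popular_vote      = pop_winner,
                                   electoral_college = ec_winner))
print(outcomes)

# Marginal chances, in percent.
p_popular <- 100 * mean(pop_winner == "Democrats")
p_ec      <- 100 * mean(ec_winner == "Democrats")
```

The two marginal probabilities are just the row and column sums of the joint table, which is why the popular-vote chance is always at least as large as the "wins both" cell.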

The gap between the popular vote and tipping-point state

We can quantify either party’s edge as the average, across simulations, of the difference between Joe Biden’s margin in the tipping-point state and his margin nationally:

On average, the tipping point state is 2.3 percentage points to the right of the nation as a whole.

But the actual divide could take on a host of other values:
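For readers who want to reproduce this kind of summary, here is a minimal sketch of the gap calculation. It assumes each trial has already yielded a national margin and a margin in that trial's tipping-point state; the five trials and both vector names are invented for illustration.

```r
# Hypothetical per-trial margins (Biden minus Trump, in points).
natl_margin    <- c(8.0, 9.5, 6.0, 10.0, 7.5)
tipping_margin <- c(5.0, 8.0, 4.5,  7.0, 5.0)

# Negative values mean the tipping-point state sits to the right
# of the nation in that trial.
gap <- tipping_margin - natl_margin

mean_gap  <- mean(gap)
gap_range <- quantile(gap, c(0.025, 0.5, 0.975))

print(mean_gap)
print(gap_range)
```

The mean gives the single headline figure, while the quantiles describe how widely the gap varies from one simulated election to the next.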

Changes in state averages relative to the national margin

This map shows where Biden and Trump have gained or lost ground since 2016, relative to their gains/losses nationally:

Endmatter

I hope you learned something. You can find me on Twitter at @gelliottmorris or my personal website at gelliottmorris.com.

This content is licensed with the MIT license.
