alex / nyt-2020-election-scraper

Home Page: https://alex.github.io/nyt-2020-election-scraper/battleground-state-changes.html

Error in hurdle calculate

eebasso opened this issue

Line 333 in print-battleground-state-changes
hurdle = (vote_diff + (votes_remaining * (candidate1_votes + candidate2_votes)) / votes) / (2 * votes_remaining) if votes_remaining > 0 else 0

This is the wrong formula for the two-party hurdle because it takes the third-party vote into account, which is irrelevant here. As a result, comparing the hurdle to the two-party batch percentage is an apples-to-oranges comparison. You should replace votes with the two-party vote, candidate1_votes + candidate2_votes. The ratio (candidate1_votes + candidate2_votes) / votes then becomes 1, and the formula simplifies to

hurdle = (vote_diff/votes_remaining + 1)/2 if votes_remaining > 0 else 0

For example, this matters for Trump's current hurdle numbers in Arizona. With 110,925 votes remaining and a 20,102-vote Biden margin, Trump needs 59.1% of the remaining two-party vote. Yet the current output says Trump's hurdle is just 58.3%. The gap comes from including third-party votes: Trump needs 58.3% of all remaining votes, but a higher share of the remaining two-party vote.
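As a quick check of the Arizona figures, the simplified two-party formula can be evaluated directly. This is only a minimal sketch: the function name `two_party_hurdle` is made up for illustration and is not part of the scraper, and it assumes every remaining vote goes to one of the two major candidates.

```python
def two_party_hurdle(vote_diff, votes_remaining):
    """Share of the remaining two-party vote the trailing candidate needs to tie.

    Hypothetical helper: assumes no remaining vote goes to a third party.
    """
    if votes_remaining <= 0:
        return 0.0
    return (vote_diff / votes_remaining + 1) / 2


# Arizona figures quoted in this issue: 110,925 votes remaining,
# 20,102-vote margin for the leading candidate.
arizona = two_party_hurdle(vote_diff=20_102, votes_remaining=110_925)
print(f"{arizona:.1%}")  # 59.1%
```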

Thank you so much for creating this.

Hey there! The original hurdle calculation actually used the formula you described. You can see the pull request where we changed it to the current iteration in the files for PR #200. The related discussion took place in issue #194.

There isn't really a perfect way to tackle this when there are three candidates splitting the vote, only two of whom are really relevant. But the current scheme more accurately reflects what most people expect, according to the feedback we've gotten thus far.

It's a matter of taste then, but I think you should switch back to the original hurdle formula. As is, the hurdle is misleading because it makes Arizona look closer than it actually is. It lowers the hurdle for Trump, and many people, including myself, compare that hurdle to the two-party batch breakdown to see whether Trump is above or below the threshold.

Perhaps you could have an extra column that gives the two-party % hurdle and keep the current column as is. I and many others would greatly appreciate that because it then becomes an apples-to-apples comparison with the two-party Batch Breakdown. It becomes easier to gauge the horse race.

The wording on the Hurdle is also confusing because it states "Note that third party candidates are not included in the batch breakdown. This is intentional." Don't you think this clarification should be on the Batch Breakdown instead? The Hurdle column clearly DOES include the third party votes.

Thinking about this more, I can see that computing the hurdle ratio exactly is impossible with a third party involved, so one needs an approximation. I haven't worked out all the math yet, but my guess is that the current formula is a better approximation when the third-party vote is small. Is that right?

Aha! I found the correct formula. Both my old formula and the new formula were wrong. It should be as follows:

hurdle = (vote_diff * votes / ((candidate1_votes + candidate2_votes)*votes_remaining) + 1 ) / 2 if votes_remaining > 0 else 0
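To make the difference concrete, here is a rough sketch that evaluates the current formula and the one proposed just above side by side. The function names are invented for this example, and the vote totals are hypothetical numbers chosen only to reproduce the 20,102-vote margin and 110,925 remaining votes quoted earlier with a third-party share of about 1.5%; they are not real Arizona counts.

```python
import math


def hurdle_all_votes(vote_diff, votes_remaining, candidate1_votes, candidate2_votes, votes):
    """Current formula: share of ALL remaining votes the trailing candidate needs."""
    if votes_remaining <= 0:
        return 0.0
    two_party_share = (candidate1_votes + candidate2_votes) / votes
    return (vote_diff + votes_remaining * two_party_share) / (2 * votes_remaining)


def hurdle_two_party(vote_diff, votes_remaining, candidate1_votes, candidate2_votes, votes):
    """Proposed formula: share of the remaining TWO-PARTY vote needed."""
    if votes_remaining <= 0:
        return 0.0
    two_party_votes = candidate1_votes + candidate2_votes
    return (vote_diff * votes / (two_party_votes * votes_remaining) + 1) / 2


# Hypothetical totals (NOT real data); only the margin and remaining count
# come from the figures quoted in this issue.
candidate1_votes = 1_500_000
candidate2_votes = 1_479_898
third_party_votes = 45_000
votes = candidate1_votes + candidate2_votes + third_party_votes
vote_diff = candidate1_votes - candidate2_votes  # 20,102
votes_remaining = 110_925

args = (vote_diff, votes_remaining, candidate1_votes, candidate2_votes, votes)
current = hurdle_all_votes(*args)
proposed = hurdle_two_party(*args)
share = (candidate1_votes + candidate2_votes) / votes

print(f"current (all votes):  {current:.1%}")
print(f"proposed (two-party): {proposed:.1%}")

# The two formulas differ exactly by the two-party share of the vote.
assert math.isclose(current, proposed * share)
```

With these made-up totals the current formula lands at about 58.3% and the proposed one at about 59.2%, slightly above the 59.1% given by the simplified formula that ignores third-party votes altogether.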

Please take a look at #367, I think this is along similar lines to what you are suggesting.

I wrote up a document that explains the differences between the two formulas.
Correct Hurdle Formula.pdf

I shouldn't say that the current formula is wrong. My apologies. It gives the percentage that the trailing candidate needs to receive out of the total vote remaining. However, my proposed formula gives the hurdle percentage of the remaining two-party vote, which seems much more relevant to compare to the Batch Trend column. The Batch Trend is the running average of the Batch Breakdown percentages, which are only between the two parties.

It's the difference between calculating DT / (DT + DB + DL) and DT / (DT + DB) where T and B are the two party candidates and L is the third party and DT, DB, DL are the gains each has.
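One way to see how the two ratios relate, as a sketch rather than a definitive derivation: write M for vote_diff, R for votes_remaining, and p for the two-party share (candidate1_votes + candidate2_votes) / votes. The symbols M, R, and p are introduced only for this derivation. It assumes, which the data does not guarantee, that the remaining votes split between the two parties and the third party in the same proportion p as the votes already counted, and it solves for the tie point.

```latex
\begin{align*}
D_T - D_B &= M, \qquad D_T + D_B = pR, \qquad D_L = (1-p)R \\
\Rightarrow\; D_T &= \frac{M + pR}{2} \\
\frac{D_T}{D_T + D_B + D_L} &= \frac{M + pR}{2R}
  && \text{(share of all remaining votes)} \\
\frac{D_T}{D_T + D_B} &= \frac{M + pR}{2pR}
  = \frac{1}{2}\left(\frac{M}{pR} + 1\right)
  && \text{(share of the remaining two-party vote)}
\end{align*}
```

Under that assumption the first ratio matches the current formula and the second matches the proposed one, and the two differ by a factor of p.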

@eebasso Can you take a look at #367?

@eebasso This is a really great writeup (thanks for pulling out LaTeX for us)! We're asking you to look at #367 because we're fairly confident one of our devs landed on a very similar equation to what you've proposed here.

Thank you fractionalhare. I commented on #367. My confusion is about why the current formula was chosen over the one I proposed. I would think that DT/(DT+DB) would be a better ratio than DT/(DT+DB+DL) to compare against the Batch Trend percentage. I think that the Batch Trend percentage is showing the latter and not the former.