AaronWard / covidify

Covidify - corona virus report and dataset generator for python 📈 [no longer being updated]

Negative values in US summary

creeble opened this issue · comments

I'm trying to figure out the semantics of the negative numbers in the US report in new_confirmed_cases (e.g., a=32, c=-184).

I assume these are retractions of some kind? I understand that they must be in the data, but I wonder if anyone knows the history of these.

commented

@creeble Yeah, I noticed that as well. I think they have been messing up the entries since the 24th of February (especially for the Diamond Princess).

I brought it up in the issues but it was ignored. I'll try to work on a fix.

Hmm, I'm not finding any negative numbers in the raw data sets from CSSEGISandData/COVID-19 (other than longitude). Guess I should look at the code more closely to see what files it's reading.

commented

It's calculated by taking the sum for a given day and subtracting the sum for the previous day. For example:

new confirmed cases for 18th of February = sum(18th feb) - sum(17th feb)

What I assume is that on the days with negative values, certain rows are omitted compared with the previous day, meaning that the sum of cases on the 17th is larger than the sum of cases on the 18th.
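To make that concrete, here is a minimal sketch of the calculation, not the actual covidify code. The region names, the counts, and the `new_cases` helper are all made up for illustration; the only thing taken from the description above is the formula (sum of the day minus sum of the previous day).

```python
# Hypothetical cumulative confirmed counts per region, keyed by report date.
# Region names and numbers are invented for illustration only.
daily_reports = {
    "2020-02-17": {"Region A": 100, "Region B": 50, "Region C": 30},
    # "Region C" is missing from the next day's report.
    "2020-02-18": {"Region A": 105, "Region B": 52},
}


def new_cases(reports, day, previous_day):
    """Return sum(day) - sum(previous_day), as described above."""
    return sum(reports[day].values()) - sum(reports[previous_day].values())


# 157 - 180: negative, even though no region's own count went down.
delta = new_cases(daily_reports, "2020-02-18", "2020-02-17")
print(delta)
```

Regions A and B both grew, but the dropped row for Region C pulls the total for the 18th below the total for the 17th.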

I think a fix for this could be to check that the rows for the 18th include all the rows for the 17th plus any new ones.
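A rough sketch of what such a check might look like, again with invented data. The `missing_rows` and `carry_forward` helpers are hypothetical, and carrying the previous day's value forward for a missing row is just one possible way to handle it (an assumption here, not necessarily what the eventual fix does).

```python
def missing_rows(previous, current):
    """Regions present in the previous day's report but absent from the current one."""
    return sorted(set(previous) - set(current))


def carry_forward(previous, current):
    """Copy of the current report with missing regions filled from the previous day.

    Assumption: a missing row means "no update", so the previous cumulative
    count is reused rather than being treated as zero.
    """
    filled = dict(current)
    for region in missing_rows(previous, current):
        filled[region] = previous[region]
    return filled


# Invented cumulative counts for illustration only.
feb_17 = {"Region A": 100, "Region B": 50, "Region C": 30}
feb_18 = {"Region A": 105, "Region B": 52}  # "Region C" row omitted

missing = missing_rows(feb_17, feb_18)
filled = carry_forward(feb_17, feb_18)
delta = sum(filled.values()) - sum(feb_17.values())

print(missing)
print(delta)
```

With the missing row filled in, the difference only reflects the regions that actually reported a change. This would not catch genuine downward corrections in the source data, which would still show up as negative values.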

commented

https://github.com/AaronWard/covid-19-analysis/projects/4#card-34590138

Currently working on this issue. I have fixed the problem of most of the countries being wrong; now I am working on fixing the missing data problem.

Hi Aaron - is there a commit for this fix, or is it in the JHU repo?

commented

@creeble Not yet, unfortunately. I haven't had a lot of spare time between working from home, prepping for the country lockdown, etc.

I hope to have it fixed by the end of this week.

commented

@creeble New release published

pip install the newest version 👍

Hey Aaron. It works, but apparently there is no US data because of the changes to the format? What a fiasco for them to change it! I guess they're moving (back?) to per-county reporting in the US.
Any idea how much work that is?

commented

@creeble Maybe this issue will answer your question:

#28