Negative values in US summary
creeble opened this issue
I'm trying to figure out the semantics of the negative numbers in the US report in new_confirmed_cases (e.g., a=32, c=-184).
I assume these are retractions of some kind? I understand that they must be in the data, but I wonder if anyone knows the history of these.
@creeble Yeah, I noticed that as well. I think that since the 24th of February they have been messing up the entries (especially for Diamond Princess).
I brought it up in the issues but it was ignored. I'll try to work on a fix.
Hmm, I'm not finding any negative numbers in the raw data sets from CSSEGISandData/COVID-19 (other than longitude). Guess I should look at the code more closely to see what files it's reading.
It's calculated by taking the sum for a given day and subtracting the sum for the previous day. For example:
new confirmed cases for 18th of February = sum(18th feb) - sum(17th feb)
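A minimal sketch of that calculation, to show how a dropped row can push the result negative. The region names, dates, and counts here are made up for illustration, and `new_cases` is a hypothetical helper, not the function used in the repo:

```python
# Hypothetical cumulative confirmed counts per region, keyed by report date.
# All names and numbers are invented for illustration.
daily_rows = {
    "2020-02-17": {"Region A": 10, "Region B": 5, "Region C": 30},
    "2020-02-18": {"Region A": 12, "Region B": 6},  # the Region C row is missing
}


def new_cases(rows_by_day, day, previous_day):
    """Return sum(day) - sum(previous_day) over all rows present on each day."""
    return sum(rows_by_day[day].values()) - sum(rows_by_day[previous_day].values())


result = new_cases(daily_rows, "2020-02-18", "2020-02-17")
print(result)  # negative, because Region C's 30 cases vanish from the later sum
```

Every region that does report on the 18th went up, yet the day-over-day total still drops because one row disappeared.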
My assumption is that on the days with negative values, some rows that were present the day before have been omitted, so the sum of cases on the 17th ends up larger than the sum of cases on the 18th.
I think a fix for this could be to check whether the rows for the 18th include all the rows from the 17th plus any new ones.
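Something like the following could do that check. This is only a sketch under assumptions: rows are keyed by a region name, the data is invented, and carrying the previous day's count forward for a missing row is one possible way to fill the gap, not necessarily the fix that ends up in the repo. `missing_regions` and `carry_forward` are hypothetical helper names.

```python
# Hypothetical cumulative counts per region for two consecutive days.
feb_17 = {"Region A": 10, "Region B": 5, "Region C": 30}
feb_18 = {"Region A": 12, "Region B": 6}  # the Region C row is missing


def missing_regions(previous, current):
    """Regions that had a row on the previous day but have none today."""
    return sorted(set(previous) - set(current))


def carry_forward(previous, current):
    """Fill rows missing today with the previous day's cumulative count.

    Assumption: a missing row means "no update", not "count reset to zero".
    """
    filled = dict(previous)   # start from yesterday's rows
    filled.update(current)    # overwrite with whatever was reported today
    return filled


dropped = missing_regions(feb_17, feb_18)
filled_18 = carry_forward(feb_17, feb_18)
new_confirmed = sum(filled_18.values()) - sum(feb_17.values())
print(dropped, new_confirmed)
```

With the missing row filled in, the daily difference only reflects regions whose counts actually changed.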
https://github.com/AaronWard/covid-19-analysis/projects/4#card-34590138
I'm currently working on this issue. I have fixed the wrong values for most of the countries, and now I am working on the missing data problem.
Hi Aaron - is there a commit for this fix, or is it in the JHU repo?
@creeble Not yet, unfortunately. I haven't had a lot of spare time between working from home, prepping for country lockdown, etc.
I hope to have it fixed by the end of this week.
Hey Aaron. Works, but apparently no US data because of the changes to format? What a fiasco for them to change it! I guess they're moving (back?) to per-county reporting in the US.
Any idea how much work that is?