k-sys / covid-19

A collection of work related to COVID-19



Loading Patient Info no longer works with latest data

jqnatividad opened this issue

The date validation logic below:

import pandas as pd

# Convert both columns to datetimes (day.month.year format)
patients.Confirmed = pd.to_datetime(
    patients.Confirmed, format='%d.%m.%Y')
patients.Onset = pd.to_datetime(
    patients.Onset, format='%d.%m.%Y')

# Only keep records where confirmed >= onset
patients = patients[patients.Confirmed >= patients.Onset]

fails on the latest version of the data, which contains some invalid dates.

Only the data up to May 13 works.
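To see which values break the strict parse, one option is to parse a second time with `errors="coerce"` and list the cells that became `NaT` even though they were not empty. The sketch below is an illustration only: the sample rows and the helper name `find_invalid_dates` are hypothetical, and the real dataset has more columns.

```python
import pandas as pd

# Hypothetical sample rows; the real patient data has many more columns.
patients = pd.DataFrame({
    "Confirmed": ["10.04.2020", "02.05.2020", "15.05.2020"],
    "Onset": ["05.04.2020", "31.04.2020", "12.05.2020"],
})

def find_invalid_dates(series, fmt="%d.%m.%Y"):
    """Return the raw values that are present but cannot be parsed with fmt."""
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    # NaT after coercion while the original cell was not empty -> unparseable value
    bad = parsed.isna() & series.notna()
    return series[bad]

# The strict parse used in the notebook raises on the impossible date.
strict_ok = True
try:
    pd.to_datetime(patients.Onset, format="%d.%m.%Y")
except ValueError as exc:
    strict_ok = False
    print("strict parsing failed:", exc)

print(find_invalid_dates(patients.Onset))
```

Running the helper over both `Confirmed` and `Onset` on the full file should show every row that needs attention.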

In addition, the data file is now gzipped because of GitHub's file-size limits, so the notebook needs to be updated to read the compressed file.
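pandas can read a gzipped CSV directly, so the loading step may only need a new path and a `compression` argument. A minimal sketch follows. The file name `patients.csv.gz`, the comma separator, and the helper `load_patients` are assumptions, not the repo's actual layout; the demo writes a tiny gzipped file to a temporary directory so it runs on its own.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical file name; the actual path in the repo may differ.
DATA_FILE = "patients.csv.gz"

def load_patients(path):
    """Load the gzipped patient CSV; pandas decompresses it transparently."""
    # compression="infer" (the default) would also detect a ".gz" suffix;
    # passing "gzip" explicitly makes the expectation obvious.
    return pd.read_csv(path, compression="gzip")

# Demo: write a tiny gzipped CSV to a temp dir and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, DATA_FILE)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write("Confirmed,Onset\n10.04.2020,05.04.2020\n")
    patients = load_patients(path)

print(patients)
```

Because the dates stay as strings after loading, the existing `pd.to_datetime` step can follow unchanged.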

There's a date in the Onset column entered as 31.04.2020, which doesn't exist (April has only 30 days). I'm not sure what it was meant to be (perhaps 31.03.2020), but it affects only two rows, so you could drop them without much impact on the results.
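Dropping those rows can be done by coercing invalid dates to `NaT` and removing them before the original `Confirmed >= Onset` check. This is a sketch under assumptions: `clean_dates` is a hypothetical helper and the sample rows are made up. It simply discards unparseable dates rather than guessing a correction such as 31.03.2020.

```python
import pandas as pd

def clean_dates(patients, fmt="%d.%m.%Y"):
    """Parse Confirmed/Onset, drop rows with invalid dates, keep Confirmed >= Onset."""
    out = patients.copy()
    for col in ("Confirmed", "Onset"):
        # errors="coerce" turns impossible dates such as 31.04.2020 into NaT
        out[col] = pd.to_datetime(out[col], format=fmt, errors="coerce")
    # Drop rows where either date is missing or invalid, then apply the original check
    out = out.dropna(subset=["Confirmed", "Onset"])
    return out[out.Confirmed >= out.Onset]

# Hypothetical sample rows
patients = pd.DataFrame({
    "Confirmed": ["10.04.2020", "02.05.2020", "15.05.2020", "20.05.2020"],
    "Onset": ["05.04.2020", "31.04.2020", "12.05.2020", "25.05.2020"],
})
cleaned = clean_dates(patients)
print(cleaned)
```

Row 1 is dropped for its invalid onset date and row 3 because onset falls after confirmation, leaving rows 0 and 2. The explicit `dropna` makes the intent clear, although comparisons involving `NaT` evaluate to `False`, so the filter alone would also exclude those rows.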