Loading Patient Info no longer works with latest data
jqnatividad opened this issue · comments
Joel Natividad commented
The date validation logic below:
# Convert both to datetimes
patients.Confirmed = pd.to_datetime(
patients.Confirmed, format='%d.%m.%Y')
patients.Onset = pd.to_datetime(
patients.Onset, format='%d.%m.%Y')
# Only keep records where confirmed >= onset
patients = patients[patients.Confirmed >= patients.Onset]
fails because of some invalid dates in the latest version of the data.
Only the data up to May 13 works.
Further, the data file is now gzipped because of GitHub limits, and the notebook needs to be updated to handle this.
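For the gzip part, a minimal sketch of how the notebook could read a compressed CSV with pandas. The in-memory file and its two rows are made up for illustration and stand in for the real data file, whose path I have not filled in here:

```python
import gzip
import io

import pandas as pd

# Hypothetical stand-in for the real gzipped data file: a tiny CSV
# compressed in memory so the example is self-contained.
csv_text = "Confirmed,Onset\n14.05.2020,10.05.2020\n"
buf = io.BytesIO(gzip.compress(csv_text.encode("utf-8")))

# For a file-like object, the compression has to be stated explicitly.
# For a path or URL ending in .gz, the default compression="infer"
# picks gzip up from the extension.
patients = pd.read_csv(buf, compression="gzip")
```

If the notebook reads from a path or URL that ends in .gz, pointing read_csv at the new file name may be all that is needed.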
Tushar Chandra commented
There's a date in the Onset column entered as 31.04.2020, which doesn't exist - not super sure what it's supposed to be (perhaps 31.03.2020), but it's just two rows and you could drop them without it mattering much.
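One possible way to drop those rows without hard-coding the bad value is to let pandas turn unparseable dates into NaT and filter them out. This is only a sketch: the three sample rows are invented for illustration, and errors="coerce" is my suggestion rather than what the notebook currently does.

```python
import pandas as pd

# Invented sample rows; the second Onset value is the impossible date
# mentioned above (April has only 30 days).
patients = pd.DataFrame({
    "Confirmed": ["05.05.2020", "10.05.2020", "12.05.2020"],
    "Onset": ["01.05.2020", "31.04.2020", "15.05.2020"],
})

# errors="coerce" converts values that do not match the format, or that
# are not real calendar dates, to NaT instead of raising an error.
for col in ("Confirmed", "Onset"):
    patients[col] = pd.to_datetime(
        patients[col], format="%d.%m.%Y", errors="coerce")

# Count the rows with an unparseable date before dropping them.
n_bad = int(patients[["Confirmed", "Onset"]].isna().any(axis=1).sum())
patients = patients.dropna(subset=["Confirmed", "Onset"])

# Same filter as the notebook: keep records where confirmed >= onset.
patients = patients[patients.Confirmed >= patients.Onset]
```

Checking n_bad before dropping makes it easy to notice if a future data update has many more bad dates than the two rows seen here.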