Not adding dates with zero vaccinations
sanyam-git opened this issue
Comparing India.csv with state_timeline.csv, I have observed that the script leaves out dates with zero vaccinations (I've checked, and it seems to be the case for all other countries as well).
For example: on 20th January 202, the union territory of AN in India had zero vaccination doses administered, so that date is not present in India.csv.
I'm relatively new to this stuff, so please don't mind if I'm wrong here. Won't this create issues when using the API to plot visualizations directly, or when using the data for analysis directly?
The decision behind this was to only add entries whenever there are new values. In particular, this call to keep_min_date is responsible:
covid19-vaccination-subnational/src/covid_updater/utils.py
Lines 36 to 49 in b12cf50
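The referenced lines are not reproduced in this thread. As a rough illustration of the behavior described (keeping only the earliest date for each new value), here is a minimal sketch in plain Python. The function name `keep_first_date_per_value`, the tuple layout, and the sample values are all assumptions made for illustration; this is not the actual `keep_min_date` implementation.

```python
from itertools import groupby


def keep_first_date_per_value(rows):
    """Keep only the earliest row of each run of identical cumulative totals.

    `rows` is a list of (iso_date, total_vaccinations) tuples (hypothetical layout).
    """
    rows = sorted(rows)  # ISO dates sort chronologically as strings
    kept = []
    # groupby groups *consecutive* rows that share the same total
    for _, run in groupby(rows, key=lambda r: r[1]):
        kept.append(next(run))  # first (earliest) row of the run
    return kept


# Made-up sample data: the third day reports no new doses.
rows = [
    ("2021-02-01", 100),
    ("2021-02-02", 225),
    ("2021-02-03", 225),  # same cumulative total as the day before
    ("2021-02-04", 310),
]
deduped = keep_first_date_per_value(rows)
```

With this kind of filtering, the day with an unchanged total simply disappears from the output, which matches the gap reported above.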
In the CSV files, I think this behavior makes sense. However, in the API files, I agree that this may cause some issues.
To this end, I'd say we could modify the update_api_v1.py script to fill these gaps, potentially adding a new field like total_vaccinations_daily to indicate that there were 0 vaccinations that day and that the data was copied from the prior day.
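One possible way to fill such gaps is sketched below, using only the standard library. The entry layout (a list of dicts with `date` and `total_vaccinations`) and the function name `fill_gaps` are assumptions for illustration; the real structure used by update_api_v1.py may differ.

```python
from datetime import date, timedelta


def fill_gaps(entries):
    """Insert one entry per missing day, copying the previous day's total.

    `entries` is a list of dicts sorted by 'date' (ISO string), each with a
    cumulative 'total_vaccinations' (hypothetical layout).
    """
    filled = []
    prev = None
    for entry in entries:
        current = date.fromisoformat(entry["date"])
        if prev is not None:
            # Walk over the days strictly between the previous and current entry
            day = date.fromisoformat(prev["date"]) + timedelta(days=1)
            while day < current:
                filled.append({
                    "date": day.isoformat(),
                    "total_vaccinations": prev["total_vaccinations"],
                    "total_vaccinations_daily": 0,  # copied from the prior day
                })
                day += timedelta(days=1)
        # The daily change is unknown for the very first entry
        daily = None if prev is None else (
            entry["total_vaccinations"] - prev["total_vaccinations"]
        )
        filled.append({**entry, "total_vaccinations_daily": daily})
        prev = entry
    return filled


# Made-up sample data with 2021-02-03 missing
entries = [
    {"date": "2021-02-01", "total_vaccinations": 100},
    {"date": "2021-02-02", "total_vaccinations": 225},
    {"date": "2021-02-04", "total_vaccinations": 310},
]
filled = fill_gaps(entries)
```

Here a value of 0 in total_vaccinations_daily marks an entry whose total was carried over from the prior day.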
Let me know what you think and thanks for your feedback
I think it would be better to account for the zero-vaccination dates in both JSON and CSV (especially in JSON, as you mentioned), since some people prefer CSV over JSON and it is good to keep both in a similar structure.
Regarding adding total_vaccinations_daily, yeah, I think it can be helpful (it can be kept in the enhancement list).
Thanks for the reply :)
Thanks for your comment. Some notes:
- Some countries do not provide information on some days. The question here is: should we assume that the number of vaccinations was zero? My opinion is that this should be treated as missing data. If these days were to be added, I'd suggest adding a flag stating that this entry was recovered from previous entries.
- However, if the source data specifically states that there were zero vaccinations, these should probably be added, as this wouldn't count as missing data. To ensure this is reliable, some simple exploration of the source data should be done.
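The distinction drawn in these two notes could be explored with something like the sketch below. It assumes the raw source data is available as a mapping from ISO date to reported cumulative total; the function name `classify_days` and the labels `missing`, `reported_zero`, and `reported` are hypothetical names chosen for illustration.

```python
from datetime import date, timedelta


def classify_days(source):
    """Label every day between the first and last reported date.

    `source` maps ISO date strings to the cumulative total as reported
    by the source (hypothetical layout).
    """
    days = sorted(source)
    day = date.fromisoformat(days[0])
    end = date.fromisoformat(days[-1])
    labels = {}
    last_total = None
    while day <= end:
        key = day.isoformat()
        if key not in source:
            labels[key] = "missing"        # source said nothing for this day
        elif last_total is not None and source[key] == last_total:
            labels[key] = "reported_zero"  # source reported an unchanged total
        else:
            labels[key] = "reported"
        if key in source:
            last_total = source[key]
        day += timedelta(days=1)
    return labels


# Made-up sample: 02-02 is reported with no change, 02-03 is absent
source = {"2021-02-01": 100, "2021-02-02": 100, "2021-02-04": 150}
labels = classify_days(source)
```

Days labelled `missing` would be the candidates for a "recovered from previous entries" flag, while `reported_zero` days could be added as genuine zero-vaccination entries.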
What do you think? @sanyam-git