sociepy / covid19-vaccination-subnational

🌍💉 Global COVID-19 vaccination data at the regional level.

Home Page: https://sociepy.org/covid19-vaccination-subnational

Not adding dates with zero vaccinations

sanyam-git opened this issue

Comparing India.csv with state_timeline.csv, I have observed that the script leaves out dates with zero vaccinations (I've checked, and this seems to be the case for all other countries as well).
For example: on 20th January 2021, the union territory of AN in India had zero vaccination doses administered, so that date is not present in India.csv.

I'm relatively new to this, so please don't mind if I'm wrong here. Won't this create issues when using the API to plot visualizations directly, or when using the data for analysis as is?

The decision behind this was to only add entries whenever there are new values. In particular, this call to keep_min_date is responsible:

def keep_min_date(df):
    df = df.copy()
    cols = df.columns
    # Remove NaNs (groupby would otherwise drop rows with NaN keys)
    count_cols = [col for col in COLUMNS_INT if col in cols]
    df.loc[:, count_cols] = df.loc[:, count_cols].fillna(-1).astype(int)
    # Group by every column except "date" and keep the earliest date
    df = df.groupby(
        by=[col for col in df.columns if col != "date"]
    ).min().reset_index()
    # Bring NaNs back
    df.loc[:, count_cols] = df.loc[:, count_cols].astype("Int64").replace({-1: pd.NA})
    return df.loc[:, cols]
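
To see why unchanged days drop out, here is a minimal sketch on made-up data. The column names (`location`, `region`, `total_vaccinations`) and the values are assumptions for illustration, and the grouping step is a simplified version of the one above, without the NaN handling.

```python
import pandas as pd

# Hypothetical cumulative totals for one region; 2021-01-20 repeats the prior total.
df = pd.DataFrame({
    "location": ["India"] * 3,
    "region": ["AN"] * 3,
    "date": ["2021-01-19", "2021-01-20", "2021-01-21"],
    "total_vaccinations": [100, 100, 150],
})

# Group on every column except "date" and keep the earliest date per group.
group_cols = [col for col in df.columns if col != "date"]
deduped = df.groupby(by=group_cols).min().reset_index()[df.columns]

# Rows with identical values collapse into one, so 2021-01-20 is gone.
print(deduped["date"].tolist())
```

Because the two rows with a total of 100 share every non-date value, they fall into the same group and only the earlier date survives.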

In the CSV files, I think this behavior makes sense. However, in the API files, I agree that this may cause some issues.

To this end, I'd say we could modify the update_api_v1.py script to fill these gaps, potentially adding a new field like total_vaccinations_daily to indicate that there were 0 vaccinations that day and that the data was copied from the prior day.
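
As a rough illustration of that idea, the sketch below fills missing days for a single region and derives a daily count. It is not the code in update_api_v1.py: the function name `fill_date_gaps` is hypothetical, and it assumes the input has a `date` column and a cumulative `total_vaccinations` column with no missing totals.

```python
import pandas as pd

def fill_date_gaps(df):
    """Reindex one region to a daily calendar and forward-fill cumulative totals."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.set_index("date").sort_index()
    # Build every calendar day between the first and last reported date.
    full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
    # Days absent from the input copy the values of the prior day.
    out = df.reindex(full_range).ffill()
    out["total_vaccinations"] = out["total_vaccinations"].astype(int)
    # Daily count is the day-over-day change; the first day has no prior value.
    out["total_vaccinations_daily"] = out["total_vaccinations"].diff().astype("Int64")
    return out.rename_axis("date").reset_index()

# Hypothetical sparse input: 2021-01-20 is missing.
sparse = pd.DataFrame({
    "date": ["2021-01-19", "2021-01-21"],
    "total_vaccinations": [100, 150],
})
filled = fill_date_gaps(sparse)
print(filled)
```

In this example the filled row for 2021-01-20 carries the prior total of 100 and a daily value of 0.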

Let me know what you think, and thanks for your feedback.

I think it would be better to account for the zero-vaccination dates in both JSON and CSV (especially in JSON, as you mentioned). Some people prefer CSV over JSON, and it is good to keep both in a similar structure.

Regarding adding total_vaccinations_daily: yes, I think it can be helpful (it can be kept in the enhancement list).
Thanks for the reply :)

Thanks for your comment. Some notes:

  • Some countries do not provide information on some days. The question here is: should we assume that the number of vaccinations was zero? My opinion is that this should be treated as missing data. If these days were to be added, I'd suggest adding a flag stating that the entry was recovered from previous entries.
  • However, if the source data specifically states that there were zero vaccinations, these entries should probably be added, as this wouldn't count as missing data. To ensure this is reliable, some simple exploration of the source data should be done.
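
The distinction in the two points above could be explored with something like the following sketch. The function name `classify_days`, the status labels, and the column names are all assumptions for illustration; it treats a day as an explicit zero only when the source has a row for it whose cumulative total did not change.

```python
import pandas as pd

def classify_days(source):
    """Label each calendar day as 'reported', 'reported_zero', or 'missing'."""
    s = source.copy()
    s["date"] = pd.to_datetime(s["date"])
    s = s.set_index("date").sort_index()
    calendar = pd.date_range(s.index.min(), s.index.max(), freq="D")
    # NaN marks days for which the source has no row at all.
    totals = s["total_vaccinations"].reindex(calendar)
    # Day-over-day change, carrying the last known total across gaps.
    daily = totals.ffill().diff()
    status = pd.Series("reported", index=calendar)
    status[totals.notna() & (daily == 0)] = "reported_zero"
    status[totals.isna()] = "missing"
    return status

# Hypothetical source: 2021-01-20 repeats the total, 2021-01-21 has no row.
source = pd.DataFrame({
    "date": ["2021-01-19", "2021-01-20", "2021-01-22"],
    "total_vaccinations": [100, 100, 150],
})
status = classify_days(source)
print(status)
```

A status column like this could serve as the flag: days labelled `missing` would be marked as recovered from previous entries, while `reported_zero` days could be added as genuine zeros.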

What do you think? @sanyam-git