sociepy / covid19-vaccination-subnational

🌍💉 Global COVID-19 vaccination data at the regional level.

Home Page:https://sociepy.org/covid19-vaccination-subnational

Adding total_vaccinations and population field at a national level

sanyam-git opened this issue · comments

Currently, the country-wise "latest" and "all" API files have the following structure:

{
    "country": "India",
    "country_iso": "IN",
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Two more fields could be added, total_vaccinations and population:

  • total_vaccinations:

    • One can loop over the data for all the regions of a country and compute the cumulative value, but I think it would be better to provide it pre-calculated.
    • Another reason: some countries (India is the only one I'm aware of so far, but it may well happen elsewhere) report some vaccinations under a Miscellaneous heading. These can't be attributed to any region but should still be reflected in the national total.
  • population:
    helpful for normalizing data per capita. (I'm not sure which source should be used here, maybe https://www.worldometers.info/world-population/)

The updated structure would be:

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}
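To illustrate the "loop over the regions" option mentioned above, here is a minimal Python sketch. The entries inside "data" and their field names (region_iso, total_vaccinations) are assumptions made for illustration, as are the figures; the function name is hypothetical and not part of the project.

```python
import json


def add_national_total(country_payload):
    """Return a copy of the payload with a total summed over its regions."""
    regions = country_payload.get("data", [])
    # Assumes each region entry carries its own cumulative total_vaccinations.
    total = sum(region.get("total_vaccinations", 0) for region in regions)
    return {**country_payload, "total_vaccinations": total}


# Hypothetical payload: the region entries and numbers are made up.
payload = {
    "country": "India",
    "country_iso": "IN",
    "last_update": "2021-02-10",
    "data": [
        {"region_iso": "IN-DL", "total_vaccinations": 100},
        {"region_iso": "IN-MH", "total_vaccinations": 250},
    ],
}

print(json.dumps(add_national_total(payload), indent=4))
```

A sum like this only covers doses attributed to a region, so anything reported under a Miscellaneous heading would be missing from it.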

Hi @sanyam-git,
Thanks for your proposal! It could be a nice-to-have feature.

The reason for not adding these fields so far is that the https://github.com/owid/covid-19-data project already provides them. Still, we could give it a try so that all this info is available in one API.

Your points on how to obtain the aggregated national values are quite relevant, as simply iterating over the available regional JSON files would not work: some countries report "Misc" or "Others" entries, which are removed while generating the API.

Data update process

To give you an overview, the data update is performed with the script update_all, which sequentially executes the following steps:

  1. Update country regional data. For each country do:
    1.1. Scrape each country's source link and get the raw data.
    1.2. Process the raw data (change column names, standardize region names & ISO codes, etc.)
    1.3. Export the processed data as a CSV file to data/countries directory.
  2. Merge all country generated CSV files into a single vaccinations.csv file.
  3. Add population-related metrics to vaccinations.csv file (e.g. total_vaccinations_per_100, etc.).
  4. Generate API files using each country's CSV file
  5. Update documentation with changes (e.g. update README.md)

Note that in step 1.2 all special regions like "Misc" and "Others" are discarded. Hence, recovering them at step 4 would be quite complex at the moment.
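A rough sketch of why the special regions are gone by step 4, assuming (hypothetically) that step 1.2 filters rows by region label. The label set, the function name, and the figures are illustrative and not taken from the project's code.

```python
# Assumed labels for rows that do not map to a real region.
SPECIAL_REGIONS = {"Misc", "Miscellaneous", "Others"}


def process_raw_rows(raw_rows):
    """Step 1.2 (sketch): keep only rows that belong to a real region."""
    return [row for row in raw_rows if row["region"] not in SPECIAL_REGIONS]


# Made-up scraped rows for one country.
raw_rows = [
    {"region": "Delhi", "total_vaccinations": 100},
    {"region": "Maharashtra", "total_vaccinations": 250},
    {"region": "Miscellaneous", "total_vaccinations": 40},
]

processed = process_raw_rows(raw_rows)
raw_total = sum(row["total_vaccinations"] for row in raw_rows)
processed_total = sum(row["total_vaccinations"] for row in processed)

# The exported CSV would only contain `processed`, so later steps see 350, not 390.
print(raw_total, processed_total)
```

Under this assumption, the true national total would have to be captured before the filter runs, or fetched from another source.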

Some ideas:

API proposals

Proposal 1 (yours)

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Proposal 2

Having total_vaccinations_per_100 instead of population.

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "total_vaccinations_per_100":0.5117,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Proposal 3

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "total_vaccinations_per_100":0.5117,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

I would probably go for proposal 2 and leave the population field out. My reasoning is that:

  • Having all three fields would be redundant.
  • total_vaccinations_per_100 would probably be more interesting than population in the context of COVID-19 vaccinations.
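The redundancy argument can be checked with the figures from the proposals. The sketch below assumes total_vaccinations_per_100 is defined as 100 × total_vaccinations / population, rounded to four decimals; the function name is hypothetical.

```python
def per_100(total_vaccinations, population):
    """Vaccinations per 100 people, rounded to 4 decimals (assumed definition)."""
    return round(100 * total_vaccinations / population, 4)


total = 7017114
population = 1371360350

rate = per_100(total, population)
print(rate)  # 0.5117

# Given any two of the three fields, the third can be recovered
# (only approximately here, because the rate is rounded).
implied_population = round(100 * total / rate)
print(implied_population)
```

Since the population can be recovered approximately from the other two fields, publishing all three adds little information.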

Please let me know what you think! 😄

I'll be adding total_vaccinations_per_100 to individual region JSON files.

@lucasrodes, thanks for the detailed overview of the project's inner workings. Here's my take:

  • others/misc data: I agree with you, it seems cumbersome to retrieve this data after the initial two steps. The other option you suggested seems more feasible to me for now: retrieve the national total from some other reliable source (like https://github.com/owid/covid-19-data).

    Another way to add the misc data (I don't know whether it's the best one): calculate the other/misc vaccination numbers as the difference between the national total (from some other source) and the sum of the region-wise totals.

  • Population: I saw that you have already added the per capita field to all individual regions, that looks really great!
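The "difference" idea for the misc data could look roughly like this. It is a sketch with made-up numbers and a hypothetical function name; it assumes the national total and the regional totals refer to the same date, which may not hold when they come from different sources.

```python
def misc_vaccinations(national_total, regional_totals):
    """Doses not attributed to any region: national total minus regional sum."""
    regional_sum = sum(regional_totals.values())
    residual = national_total - regional_sum
    if residual < 0:
        # Likely a date or source mismatch rather than a real negative count.
        raise ValueError(
            f"regional sum {regional_sum} exceeds national total {national_total}"
        )
    return residual


# Made-up figures for illustration.
regional_totals = {"Delhi": 100, "Maharashtra": 250}
print(misc_vaccinations(390, regional_totals))  # 40
```

Raising on a negative residual is one possible design choice; clamping to zero would hide mismatches between the two sources.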

I think adding total_vaccinations and total_vaccinations_per_100 at the national level would also be quite helpful, what do you think? (As you mentioned above, the data is available at OWID, but it would be better to get it all in one place.)

Keep up the good work :) 👍

Hi @sanyam-git,
Yes, I recently added per-100 metrics to the region files. I will think about how to add this info at the national level; it shouldn't be difficult. I will get back to this thread once I have something more concrete.

Thanks for your contribution 😄 !