jalbertbowden / covid19-cms-utilities

Collection of scripts that run dataops on cms.gov's COVID-19 Nursing Home dataset.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

covid19-cms-utilities

Collection of Python scripts for wrangling with cms.gov's Nursing Home Dataset, and normalizing the reported facility names.

NOTE: This initally included processes for normalizing the facility names that states report. Functionality for that is still baked in, but is not in use.

Starting Off

Check COVID-19 Nursing Home Data landing page for the latest dataset date release.
That value, in international format (YYYY-MM-DD) needs to be added to get-cms-nursing.py as the value for file_date.
Open up cms-1.py and cms-2.py and update the cms_date with the value, as well as append the value to the cms_reporting_dates list.

Once these have been updated, run get-cms-nursing-py to download the latest CMS dataset, dividing it into separate files by state, and then dividing those files by date. Then run cms-1.py and cms-2.py, both of which normalize the facility names and output minimized, normalized data files. Output from cms-1.py has -min appended to the file name, output from cms-2.py has -normalized appended after the -min in the file name.

From start to finish, this process takes some time to run, clocked in @ almost an hour in its last run.

The initial CMS dataset files are too big for GitHub's default, so be sure to compress them and delete the original file before committing any work.

About

Collection of scripts that run dataops on cms.gov's COVID-19 Nursing Home dataset.

License:Mozilla Public License 2.0


Languages

Language:Python 100.0%