TheBoatyMcBoatFace / pdc

CMS PDC Data Archive

A GitHub repository that automatically retrieves every CMS Provider Data Catalog (PDC) dataset once a day, with storage handled through DoltHub.

Visit the project's Dolt repo: CMS PDC on DoltHub

License: AGPL-3.0

Overview

This repository performs daily GET requests to fetch datasets from the CMS Provider Data Catalog (PDC) and stores them by theme and dataset ID. The goal is to maintain an up-to-date, accessible archive of the CMS datasets used in healthcare analytics and public health informatics.

CMS Provider Data Themes

The CMS PDC groups its datasets into healthcare-related themes. The themes this project tracks are the ones listed alongside the dataset identifiers in config/datasets.yml.

How It Works

The project uses Python scripts, scheduled via crontab (or a scheduler of your choice), to pull data from the CMS API using the dataset identifiers listed in config/datasets.yml. Each dataset is downloaded in CSV format and stored in a directory structure that mirrors its theme, which keeps the archive easy to navigate.
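
As a rough illustration of the download step described above, the following sketch streams each CSV to disk and collects failures instead of aborting the whole run. It is not the project's actual script: `download_csv`, `download_all`, and the `(url, dest)` job pairs are hypothetical names, and the caller supplies the download URLs because this README does not state the CMS endpoint. Only the Python standard library is used.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_csv(url, dest, opener=urllib.request.urlopen):
    """Stream the response body at `url` into `dest`, replacing any older copy."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory first, so an interrupted
    # run never leaves a half-written CSV in place of the previous good copy.
    tmp = tempfile.NamedTemporaryFile(dir=dest.parent, delete=False)
    try:
        with tmp, opener(url) as response:
            shutil.copyfileobj(response, tmp)
        Path(tmp.name).replace(dest)
    except BaseException:
        Path(tmp.name).unlink(missing_ok=True)  # clean up the partial file
        raise
    return dest


def download_all(jobs, opener=urllib.request.urlopen):
    """Run every (url, dest) job; return {url: error message} for the failures."""
    failures = {}
    for url, dest in jobs:
        try:
            download_csv(url, dest, opener=opener)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            failures[url] = str(exc)
    return failures
```

The `opener` parameter exists only so the network call can be swapped out in tests; in normal use the default `urllib.request.urlopen` applies. Returning the failures rather than raising lets one unavailable dataset be reported without blocking the rest of the daily run.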

Structure

  • download_datasets.py: Main Python script that orchestrates the downloading process.
  • config/datasets.yml: YAML file containing dataset identifiers and themes.
  • data/: Directory where downloaded datasets are stored by theme and dataset ID.
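
To make the relationship between the config file and the data/ directory concrete, here is a minimal sketch that maps a parsed config to target file paths. Everything specific in it is an assumption: the `themes` key, the placeholder theme names and IDs, the `<theme>/<dataset_id>.csv` file layout, and the `planned_paths` helper are all hypothetical, and the real config/datasets.yml may be shaped differently. The dictionary stands in for what a YAML parser (for example PyYAML's `yaml.safe_load`) would return.

```python
from pathlib import Path

# Hypothetical parsed form of config/datasets.yml; the real keys may differ.
EXAMPLE_CONFIG = {
    "themes": {
        "theme-one": ["id-0001", "id-0002"],
        "theme-two": ["id-0003"],
    }
}


def planned_paths(config, root="data"):
    """Map each dataset ID to its target file: <root>/<theme>/<dataset_id>.csv."""
    paths = {}
    for theme, dataset_ids in config["themes"].items():
        for dataset_id in dataset_ids:
            # A repeated ID would make two entries compete for one download,
            # so fail early instead of silently keeping only one of them.
            if dataset_id in paths:
                raise ValueError(f"dataset {dataset_id!r} is listed more than once")
            paths[dataset_id] = Path(root) / theme / f"{dataset_id}.csv"
    return paths
```

Calling `planned_paths(EXAMPLE_CONFIG)` yields one path per dataset ID, grouped under a directory named after its theme.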

Prerequisites

To run the project scripts or contribute, you need:

  • Python 3.x
  • Dependencies from requirements.txt

Setup

  1. Clone the repository:

    git clone https://github.com/<your-github>/cms-pdc.git
    cd cms-pdc
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up the scheduler for daily runs or execute the script manually:

    python3 hippo/download_datasets.py
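
If crontab is not available, step 3 can be approximated with a small long-running Python process. The sketch below is one possible stand-in for a scheduler, not something shipped with the project: `seconds_until` and `run_daily` are hypothetical helpers, the 06:00 run time is arbitrary, and the script path is the one used in the manual command above. It works in naive local time, so a daylight-saving change can shift one run by an hour.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta


def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next local-time occurrence of hour:minute."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def run_daily(script="hippo/download_datasets.py", hour=6, minute=0):
    """Run `script` once a day at hour:minute until the process is stopped."""
    while True:
        time.sleep(seconds_until(hour, minute))
        # check=False: a failed run should not stop the loop from trying
        # again the next day.
        subprocess.run([sys.executable, script], check=False)
```

Calling `run_daily()` from the repository root starts the loop. A real scheduler such as crontab remains the more robust choice, since this process has to be kept alive and restarted after a reboot.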

Contributions

Contributions to enhance the functionality, improve data extraction, or refine storage mechanisms are welcome! Please fork the repository, make your changes, and submit a pull request.

Data Usage

Please ensure that your use of data fetched from the CMS PDC complies with the data use agreements and legal terms published on the CMS Data Website.
