Unusuala1l2e3x4 / Research-Spring2021

Investigating how carbon emissions, particulate matter, and climate variables/indices impact mortality from chronic respiratory disease. Working with pollutant, climate, mortality, population, and geographic datasets. Modeling with Random Forest regression.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Citation

A. He and T. Munasinghe, "Chronic Respiratory Disease: Risk Modeling Potential and Limitations," in 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021 pp. 1045-1053.

doi: 10.1109/BigData52589.2021.9672074

keywords: {microorganisms;temperature;pulmonary diseases;big data;water pollution;data models;spatiotemporal phenomena}

url's:

Research-Spring2021

[Undergraduate Research Project]

To get/construct GFED datasets:

  • Get original data:
  • Construct GFED data with appropriate datasets multiplied by cell area:
    • Copy all .hdf5 files from the original GFED data (from folder "GFED4s") into new folder "GFED4s_timesArea"
    • Run "code/multiply_by_area_gfed4s.py"

To get 2000-2018 Surface PM2.5 datasets (monthly, 0.01 deg resolution)

To get mortality data (death counts by county, month):

To write all data files (after downloading original data files from source):

  1. Run "adjust_sup_deaths_data_by_pop.py"
  2. Run "read_acag_pm2-5.py", "read_gfed4s.py"
  3. Run "write_county_month_pm2-5.py"
  4. Run, in any order:
    • Run "write_county_month_gfed.py"
    • Run "write_county_month_clim.py"
    • Run "write_county_month_median-income.py"
  5. To write AQI data (optional)
    • Run "impute_county_month_AQI.py"
    • Run "write_county_month_AQI_main.py"

If in any case, you encounter error message where a directory does not exist, create it in the path described

To run Random Forest and RFECV (recursive feature elimination and cross-validated selection)

  • To tune/test hyperparameters and static combinations of features, run "random_forest.py"
    • Set hyperparameters by editing "param_grid" variable
    • Set combinations of features by editing "columns_list" variable
  • To perform feature selection, run "random_forest_RFECV.py"
    • Adjust starting features by editing "columns" variable
    • Note: not for tuning hyperparameters due to runtime; hyperparameters ("param_grid" variable) can be set for a single iteration of RFECV

About

Investigating how carbon emissions, particulate matter, and climate variables/indices impact mortality from chronic respiratory disease. Working with pollutant, climate, mortality, population, and geographic datasets. Modeling with Random Forest regression.


Languages

Language:HTML 48.8%Language:ASP.NET 40.4%Language:Python 9.1%Language:CSS 1.5%Language:MATLAB 0.2%