healthy_watershed_random_forest

Scripts generated for running random forest models for the Healthy Watersheds project at SCCWRP.

The workflow for this project consisted of:

(1) Assembling data from SCCWRP, external partners, and StreamCat.

(2) Ensuring all data was linked to a COMID.

(3) Building random forest models for ASCI, CSCI, CRAM, and RipRAM parameters using StreamCat landscape variables associated with human alteration.

(4) Using the built models to predict state-wide scores for all parameters.

(5) Assigning the scores for each parameter into four bins (very likely altered, likely altered, possibly altered, and likely unaltered).

(6) Plotting the scores on a map state-wide (and occasionally by watershed) using the shapefiles found at hw_datasets/NHD_Plus_CA/NHDPlus_V2_FLowline_CA.shp (assembled by Anne Holt).
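Step (5) can be sketched as below. This is a minimal illustration in Python rather than the project's R code, and it is not taken from the repository scripts. The cutoff values, the example COMIDs, and the function name `assign_bin` are all hypothetical; the actual thresholds used for each parameter are not stated here. The sketch also assumes that higher scores indicate less alteration.

```python
from bisect import bisect_right

# Bin labels from the workflow above, ordered from lowest to highest score.
# Assumption: higher scores indicate less alteration.
BIN_LABELS = [
    "very likely altered",
    "likely altered",
    "possibly altered",
    "likely unaltered",
]


def assign_bin(score, cutoffs):
    """Return the bin label for a score, given three ascending cutoffs.

    Scores below cutoffs[0] fall in the first bin; scores at or above
    cutoffs[2] fall in the last bin.
    """
    if len(cutoffs) != 3 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be three ascending numbers")
    return BIN_LABELS[bisect_right(cutoffs, score)]


# Hypothetical cutoffs and predicted scores, for illustration only.
example_cutoffs = [0.3, 0.6, 0.8]
predicted = {1001: 0.21, 1002: 0.45, 1003: 0.72, 1004: 0.95}  # COMID -> score

binned = {comid: assign_bin(score, example_cutoffs)
          for comid, score in predicted.items()}
```

Passing the cutoffs as an argument keeps the same function usable for each parameter, since ASCI, CSCI, CRAM, and RipRAM would likely each need their own thresholds.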

Data -

ASCI and CSCI datasets were assembled from the SMC database. CRAM datasets were downloaded from the SMC database but originate from eCRAM/CEDEN. RipRAM datasets were provided by Kevin O'Connor at Moss Landing Marine Laboratories/Central Coast Wetlands Group. Perennial Stream Assessment (PSA) Region data were assembled by SCCWRP. StreamCat variables were assembled from https://www.epa.gov/national-aquatic-resource-surveys/streamcat-dataset-0, using California datasets only.

Models -

Random forest models have been created for ASCI, CSCI, CRAM (physical and biotic structure indices), and RipRAM along with initial validation figures.
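One common validation check for models like these is a straight-line fit between observed scores from held-out test sites and the scores the model predicts for them. The sketch below is a hedged Python illustration of that idea, not the project's R code. The function name `fit_line`, the choice to regress observed on predicted, and the example values are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up scores for five hypothetical test sites.
predicted_scores = [0.2, 0.4, 0.6, 0.8, 1.0]
observed_scores = [0.1, 0.35, 0.6, 0.85, 1.1]

slope, intercept, r_squared = fit_line(predicted_scores, observed_scores)
```

A slope near 1 and an intercept near 0 would suggest the predictions track the observations without systematic bias; the made-up data here deliberately depart from that.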

Files -

Broad categories of files in this project are detailed below:

"XXX_rf.R" - random forest model + figures script for a given parameter

"XXX_rf_data1.csv" - training data used to build the random forest model for a given parameter

"XXX_rf_results.csv" - state-wide modeled values using the built random forest model for a given parameter (typically a very large file)

"XXX_rf_results_summary.csv" - state-wide kilometers of NHD stream reaches classified in a certain category (typically a very small file)

"XXX_lms.csv" - linear models of testing vs. predicted scores for a given parameter
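The relationship between the large results file and the small summary file can be sketched as a sum of reach lengths per category. This is a minimal Python illustration, not the project's R code. The column names `classification` and `length_km`, the function name `summarize_km`, and the example rows are hypothetical and may not match the actual CSV layout.

```python
import csv
import io
from collections import defaultdict


def summarize_km(rows, class_field="classification", length_field="length_km"):
    """Sum reach length (km) for each classification category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[class_field]] += float(row[length_field])
    return dict(totals)


# Hypothetical stand-in for a results file; one row per NHD stream reach.
example_csv = """COMID,classification,length_km
1001,likely unaltered,2.5
1002,possibly altered,1.0
1003,likely unaltered,0.5
1004,very likely altered,3.0
"""

summary = summarize_km(csv.DictReader(io.StringIO(example_csv)))
```

Because the function only needs an iterable of dictionaries, a real file could be streamed through `csv.DictReader` row by row instead of being loaded into memory, which matters when the results file is very large.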

Additional scripts help to compile the PSA Regional and StreamCat datasets to be used in each of the random forest modeling scripts.
