cengstro / watermelonSnowMap

Code and data for Satellite mapping of Watermelon Snow on Earth’s glaciers

Casey B. Engstrom and Lynne M. Quarmby

This repository contains all code and data used in the manuscript Satellite mapping of Watermelon Snow on Earth’s glaciers. All R code makes use of the here library, so all scripts should run on any machine without setting the working directory or changing the path to data.

scripts/

  • 00_GEE_js contains all Google Earth Engine javascript code used to process remote sensing data on Google's servers. These scripts can also be accessed at https://code.earthengine.google.com/?accept_repo=users/caseyengstrom/globalSnowAlgae

    • 00_myFunctions contains functions for filtering and masking Sentinel-2 images, and visualization parameters that are callable by other scripts
    • 01.1_glacierRegions2 was used to draw and export polygon regions for each global GLIMS glacier region
    • 01.2_glacierProducts2 converts the GLIMS collection from vector to raster
    • 02_labelMaxRgndComp was used to collect pixels for training data
    • 03_makeAlgaeCountMaps estimates global bloom frequency, i.e. number of years in which bloom occurred per pixel
    • 03_makeAlgaeMap5 creates regional probability and occurrence range maps
    • 04_makeRandomPts generates stratified random validation points for each regional probability composite
    • 05_exportGeotiffs exports regional occurrence range maps, and the individual Random Forest decision trees
    • 05_validateMap was used to view true-color images corresponding to random validation points and assign labels
    • 05_validationApp generates the interactive map, available at https://caseyengstrom.users.earthengine.app/view/watermelon-snow (fig. S4)
    • 05_viewTrainingPts can optionally be used to view true-color images corresponding to training data
    • 06_algaeArea1 computes percent cover per region from the occurrence range map
    • 06_demFrequencies estimates the distribution of Watermelon Snow per elevation, slope, and aspect (figs. S7 and S8)
    • 06_exportMapData exports percent cover per grid cell, used to generate Fig. 2
    • 06_interiorVsMaritimeArea compares percent cover on the interior and maritime sides of the Pacific Coast Ranges of North America
    • 06_makeCoastPolygon generates a vector coastline using Large Scale International Boundary Polygons, used to compare percent cover vs distance to coast (fig. S9)
    • 06_makeNorthAmericaInteriorVsMaritimeVectors generates watershed divides along the Pacific Coast Ranges of North America
    • 06_spatial_covs2 exports climate reanalysis statistics from ERA5 per global glacier grid cell
    • 07_s2_temporal estimates bloom intensity, bloom frequency, and date of maximum bloom intensity vs. elevation in northwestern North America (figs. S5 and S11 and S14)
    • 07_snowmelt3 exports 5-day mosaic bloom intensity (Snow Redness Index) and DAYMET insolation data (figs. S12 and S13)
    • 09_elevationAlgaeMasks splits the occurrence range map in North America into high and low elevation quantiles per grid cell
    • 09_modis_covariates downloads climate reanalysis data including air temperature from 2000 to 2022
    • 09_modisRGND downloads annual mean maximum-composite MODIS red green normalized difference (aka Snow Redness Index) from 2000 to 2022
    • 031_trainAndEncodeClassifierTrees trains a Random Forest classifier and saves the trees for reproducibility. The trees are located at:
      • projects/ee-caseyengstrom/assets/globalSnowAlgae/2_trainDat/trainclean2_trees for use in GEE
      • data/s2_classifier_map/training/finalRandomForestTrees.csv in this repository
  • 1_tune_classifier.R tunes Random Forest hyperparameters including thresholds

  • 2_algae_per_glacier.R estimates percent cover for each individual GLIMS glacier worldwide (table S4)

  • 2_area_error.R uses validation dataset to estimate confidence intervals around regional cover estimates (figs. S2 and S3 and tables S2 and S3)

  • 2_dem_frequencies.R summarizes the data generated by the GEE script 06_demFrequencies to visualize the distribution of Watermelon Snow per elevation, slope, and aspect (figs. S7 and S8)

  • 2_frequency.R estimates the mean frequency of bloom occurrence per global grid cell

  • 2_interior_maritime_area.R compares percent cover on the maritime and interior sides of the watershed divide along the Pacific Coast Ranges of North America

  • 2_niche_model.R compares global percent cover per grid cell with mean summer air temperature and other ERA5 climate reanalysis data (fig. S6)

  • 3_mapping.R generates maps showing percent cover of Watermelon Snow (Fig. 2)

  • 4_s2_temporal.R generates a map of bloom frequency in North America, a plot of annual biomass per region, and compares elevation with Sentinel-estimated date of maximum bloom intensity (figs. S5 and S11 and S14)

  • 4_snowmelt.R estimates annual bloom-albedo snowmelt in North America by applying smoothing splines to Sentinel-2 bloom intensity data, and applying the coefficients from Engstrom et al. 2022 to estimate the reduction in albedo and consequent snowmelt (figs. S12 and S13). This script uses data generated by 07_snowmelt3.

  • 5_modis_alt_data.R estimates trends in MODIS annual bloom intensity and trends in air temperature in North America using alternative datasets and processing methods (tables S6 and S7). This script uses data generated by 09_modisRGND and 09_modis_covariates.

  • 5_modis.R estimates 2000 to 2022 trends in annual bloom intensity and trends in air temperature in North America using MODIS Terra MOD09GA.061 (Fig. 4 and fig. S16). This script uses data generated by 09_modisRGND and 09_modis_covariates.

  • 5_s2_modis_cor compares Sentinel and MODIS estimates of bloom intensity and date of maximum bloom intensity for regions in North America (fig. S15)

  • extract_mass_balance_data_jpg.R extracts data from previously published glacier mass balance studies. This data is used by 4_snowmelt.R to generate table S5.

  • make_ground_site_table.R generates table S1.

  • modis_trend_map_2.R is used to visualize the per-pixel MODIS Terra maximum annual relative bloom intensity data (fig. S17).
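
Several of the scripts above work with the Snow Redness Index, which this README describes as the red green normalized difference, and with maximum composites of that index. As a minimal sketch only (this is not the repository's Google Earth Engine code), the index and a per-pixel maximum composite could be computed as shown below. The function names `snowRednessIndex` and `maxComposite` and the sample reflectance values are hypothetical, chosen purely for illustration.

```javascript
// Illustrative only: names and sample values are assumptions,
// not taken from the repository.

// Normalized difference of red and green reflectance:
// (red - green) / (red + green).
function snowRednessIndex(red, green) {
  const sum = red + green;
  if (sum === 0) return NaN; // undefined when both bands are zero
  return (red - green) / sum;
}

// Maximum composite: keep the highest index value observed for a pixel
// across a set of observations (for example, one summer of images).
function maxComposite(observations) {
  const values = observations
    .map((o) => snowRednessIndex(o.red, o.green))
    .filter((v) => !Number.isNaN(v));
  return values.length ? Math.max(...values) : NaN;
}

// Hypothetical reflectance values for one pixel on three dates.
const pixelObservations = [
  { red: 0.60, green: 0.60 },
  { red: 0.55, green: 0.45 },
  { red: 0.50, green: 0.50 },
];

console.log(maxComposite(pixelObservations));
```

In this sketch, white snow with equal red and green reflectance has an index near zero, while a redder surface yields a positive value, so the composite keeps the reddest observation.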

data/

  • engstrom_etal_2022_model_coefs/ contains the linear regression coefficients used to convert from units of Snow Redness Index (red green normalized difference) to albedo and biomass
  • field_sites.* contains the BC field site metadata used to generate table S1
  • glimsMeta.kml contains metadata for select glaciers in northwestern North America with high percent cover
  • modis/ 2000 to 2022 MODIS data and covariates
    • ersst.v5.pdo.dat.txt contains monthly mean Pacific Decadal Oscillation index data, downloaded from https://psl.noaa.gov/enso/data.html
    • julyMeanMaxRgndDoyStats100.csv contains Terra MOD09GA.061 annual bloom intensity estimates using only July data
    • julyMeanMaxRgndDoyStats100.csv contains Terra MOD09GA.061 date of maximum annual bloom intensity using only July data
    • meanMaxRgndDoyStats100.csv contains Terra MOD09GA.061 date of maximum annual bloom intensity estimates using July and August data
    • meanMaxRgndDoyStatsAqua100.csv contains Aqua MYD09GA.061 date of maximum annual bloom intensity estimates using July and August data
    • meanMaxRgndRegionAqua8100.csv contains Aqua MYD09A1.061 estimates of maximum annual bloom intensity using July to August data
    • meanMaxRgndRegionAquaDay100.csv contains Aqua MYD09GA.061 estimates of maximum annual bloom intensity using July to August data
    • meanMaxRgndRegionTerra8100.csv contains Terra MOD09A1.061 estimates of maximum annual bloom intensity using July to August data
    • meanMaxRgndRegionTerra8100HighElevation.100.csv contains Terra MOD09A1.061 estimates of maximum annual bloom intensity using July to August data within the High elevation mask generated by 09_elevationAlgaeMasks
    • meanMaxRgndRegionTerra8100LowElevation.100.csv contains Terra MOD09A1.061 estimates of maximum annual bloom intensity using July to August data within the Low elevation mask generated by 09_elevationAlgaeMasks
    • meanMaxRgndRegionTerraDay100.csv contains Terra MOD09GA.061 estimates of maximum annual bloom intensity using July to August data
    • meanMaxRgndRegionTerraDay100HighElevation.csv contains Terra MOD09GA.061 estimates of maximum annual bloom intensity using July to August data within the High elevation mask generated by 09_elevationAlgaeMasks
    • meanMaxRgndRegionTerraDay100LowElevation.csv contains Terra MOD09GA.061 estimates of maximum annual bloom intensity using July to August data within the Low elevation mask generated by 09_elevationAlgaeMasks
    • nino34.long.anom.data.txt contains monthly mean Nino-3.4 index data, downloaded from https://psl.noaa.gov/enso/data.html
    • oni.data.txt contains yearly monthly mean Oceanic Nino Index data, downloaded from https://psl.noaa.gov/enso/data.html
    • regionalDaymetStats10000.csv contains DAYMET climate reanalysis summaries for each North American glacier region at 10 km resolution
    • regionalDaymetStats1000.csv contains DAYMET climate reanalysis summaries for each North American glacier region at 1 km resolution
    • regionalEraStats1000.csv contains ERA5 climate reanalysis summaries for each North American glacier region at 1 km resolution
    • timeSeries8.csv time series of Terra MOD09A1.061 (8-day quality mosaic) Snow Redness Index
    • timeSeries.csv time series of Terra MOD09GA.061 (daily) Snow Redness Index
  • s2_classifier_map/ data used in spatial analysis of Sentinel-2 Watermelon Snow occurrence range map
    • algae_per_glacier/
      • algae_pix_count_per_glacier.csv number of bloom pixels per GLIMS glacier id
      • gids_north_america_gt10.csv list of GLIMS glacier IDs with more than 10 percent algal occurrence range cover
      • glacier_pix_count.csv number of glacier pixels per GLIMS glacier id
    • area/
      • f05_map_data/ contains occurrence range percent cover per grid cell, occurrence range generated with region-specific $F_{0.5}$ thresholds
      • area/*.kml contains occurrence range percent cover per grid cell, occurrence range generated using region-specific $F_1$ thresholds, used to make Fig. 2
      • interiorVsMaritime.csv contains percent cover for the interior and maritime sides of the Pacific Coast Ranges of North America
      • per_region_F1/ not used in final analysis, contains an earlier version of area/*.kml
      • table_s2.csv contains table S3, in csv format
      • the following csvs were used to make fig. S3:
        • thrsGF1algaeAreaGrNoSheet.csv contains Greenland glaciers and ice caps percent cover estimates generated with global $F_1$ threshold
        • thrsGF1Area.csv contains per-region percent cover estimates generated with global $F_1$ threshold
        • thrsRF1algaeAreaGrNoSheet.csv contains Greenland glaciers and ice caps percent cover estimates generated with region-specific $F_1$ thresholds
        • thrsRF1Area.csv contains per-region percent cover estimates generated with region-specific $F_1$ thresholds
        • thrsRF5algaeAreaGrNoSheet.csv contains Greenland glaciers and ice caps percent cover estimates generated with region-specific $F_{0.5}$ thresholds
        • thrsRF5Area.csv contains per-region percent cover estimates generated with region-specific $F_{0.5}$ thresholds
    • covariates/
      • coastCovsPlusDist25.kml contains the percent cover vs distance to coast data used to generate fig. S9
      • covariates50.kml contains ERA5 climate reanalysis data used to generate fig. S6
      • demHistAlgae.csv contains Watermelon Snow elevation, slope, and aspect data used to generate figs. S7 and S8
      • demHistGlacier.csv contains glacier elevation, slope, and aspect data used to generate figs. S7 and S8
    • error.csv contains table S3 in csv format
    • freq_histograms/ contains global Sentinel-2 bloom frequency data per region
    • geotiffs/ contains the Watermelon Snow probability composite and occurrence range map, broken into multiple geotiffs (the same as seen in the interactive map)
    • glacierRegionsV15.kml the global polygons used to delineate the major glaciated regions on Earth
    • global_bloom_freq_histograms.csv contains all csvs found in freq_histograms/, contained in a single csv
    • regions_plus_thresholds* shapefile data containing global regions, with global $F_1$ and region-specific $F_{0.5}$ and $F_1$ threshold attributes
    • training/
      • combined/ the final training data used to train the Random Forest classifier, split into a single file per training image, generated by 02_labelMaxRgndComp
      • finalRandomForestTrees.csv the final Random Forest decision trees that were applied to pre-masked and pre-processed Sentinel-2 L2A data to generate the global Watermelon Snow occurrence range map
      • meta_trainclean2.csv metadata for the final training data
      • trainclean2.csv the final training data used to train the Random Forest classifier, i.e. all data from combined/ in a single csv
    • training_region_name_dict.csv dictionary linking the training regions and glacierRegionsV15.kml
    • validation/ contains the individual files which were combined to generate the final validation data set
      • final_combined/validation_final.csv the final validation data set, as seen in the interactive map
  • s2_temporal/ Sentinel-2 temporal data, including annual statistics
    • 382_2016_3253_MOESM1_ESM.xlsx not included in final published analysis, supplemental data from Medwedeff and Roe (2017)
    • algalAreas.csv annual occurrence range area per North American region
    • annMaxRgndAlgaemask.csv Sentinel-2 mean annual maximum-composite bloom intensity data used to generate fig. S11
    • annMaxRgnd.csv not used in final analysis
    • annMaxRgndGlaciermask.csv Sentinel-2 mean annual maximum-composite bloom intensity data
    • freqAreas.csv pixel count of bloom frequency per North American region
    • freqMapPixCountGt100.kml bloom frequency data used to generate fig. S5
    • insolTimeSeriesGlaciers.csv daily solar radiation data from DAYMET averaged across select GLIMS glacier surfaces in northwestern North America
    • insolTimeSeriesGrid.csv daily solar radiation data from DAYMET averaged per grid cell in northwestern North America
    • insolTimeSeriesRegions.csv daily solar radiation data from DAYMET averaged across the five major glaciated regions in northwestern North America
    • maxRgndDoyPredictorsStrat.csv DEM coefficients used to generate fig. S14
    • meanMaxRgndDoyStats.csv mean date of maximum annual bloom intensity per North American region, per year
    • northAmerica20kGrid.kml not used in final analysis
    • rgndTimeSeriesGlaciers.csv Snow Redness Index (red green normalized difference) time series data averaged across select GLIMS glaciers in North America
    • rgndTimeSeriesGrid.csv Snow Redness Index (red green normalized difference) time series data used to generate Fig. 3
    • rgndTimeSeriesRegions.csv Snow Redness Index (red green normalized difference) time series data averaged per North American region
    • summer_mass_balance_data.csv summer mass balance data from previously published studies used to generate table S5
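
Several files above refer to $F_1$ and $F_{0.5}$ thresholds. The sketch below shows the standard $F_\beta$ score and one way a probability threshold could be chosen by maximizing it over labeled points. It is not the procedure implemented in 1_tune_classifier.R; the function names `fBeta` and `bestThreshold`, the candidate thresholds, and the sample points are all hypothetical.

```javascript
// Illustrative only: not the repository's tuning code.
// Names, candidate thresholds, and data are assumptions.

// F-beta score from confusion-matrix counts.
// beta = 1 gives F1; beta < 1 (e.g. 0.5) weights precision more heavily.
function fBeta(tp, fp, fn, beta) {
  const b2 = beta * beta;
  const denom = (1 + b2) * tp + b2 * fn + fp;
  return denom === 0 ? 0 : ((1 + b2) * tp) / denom;
}

// Pick the candidate threshold with the highest F-beta on labeled points.
// Each point has a classifier probability and a true label (true = bloom).
function bestThreshold(points, candidates, beta) {
  let best = { threshold: NaN, score: -Infinity };
  for (const t of candidates) {
    let tp = 0, fp = 0, fn = 0;
    for (const p of points) {
      const predicted = p.prob >= t;
      if (predicted && p.label) tp++;
      else if (predicted && !p.label) fp++;
      else if (!predicted && p.label) fn++;
    }
    const score = fBeta(tp, fp, fn, beta);
    if (score > best.score) best = { threshold: t, score };
  }
  return best;
}

// Hypothetical labeled points.
const labeled = [
  { prob: 0.90, label: true },
  { prob: 0.80, label: true },
  { prob: 0.45, label: true },
  { prob: 0.42, label: false },
  { prob: 0.40, label: true },
  { prob: 0.35, label: false },
  { prob: 0.10, label: false },
];
const candidates = [0.3, 0.5];

console.log(bestThreshold(labeled, candidates, 1));
console.log(bestThreshold(labeled, candidates, 0.5));
```

With these made-up points, $F_1$ selects the lower threshold and $F_{0.5}$ the higher one, because $F_{0.5}$ penalizes false positives more strongly than missed detections.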
