sonthuybacha / hydroDL


This repository contains deep learning code for modeling hydrologic systems, from soil moisture to streamflow, and from projection to forecast.

Citations

If you find our code to be useful, please cite the following papers:

Feng, DP., K. Fang and CP. Shen, Enhancing streamflow forecast and extracting insights using continental-scale long-short term memory networks with data integration, Water Resources Research (2020), https://doi.org/10.1029/2019WR026793

Fang, K., CP. Shen, D. Kifer and X. Yang, Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network, Geophysical Research Letters, doi: 10.1002/2017GL075619 (2017), preprint accessible at arXiv:1707.06611, https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017GL075619

Shen, CP., A trans-disciplinary review of deep learning research and its relevance for water resources scientists, Water Resources Research, 54(11), 8558-8593, doi: 10.1029/2018WR022643 (2018), https://doi.org/10.1029/2018WR022643

Major code contributor: Kuai Fang (Ph.D., Penn State), with smaller contributions from Dapeng Feng (Ph.D. student, Penn State).

A new release is expected in early July 2020, together with a video code walkthrough. Computational benchmark: training on the CAMELS dataset (with or without data integration), 671 basins, 10 years of data, 300 epochs, completes in about 1 hour on a GPU.

Example

Two examples with sample data are included. A demo for the temporal test is provided in the repository.

License

Non-Commercial Software License Agreement

By downloading the hydroDL software (the “Software”) you agree to the following terms of use: Copyright (c) 2020, The Pennsylvania State University (“PSU”). All rights reserved.

  1. PSU hereby grants to you a perpetual, nonexclusive and worldwide right, privilege and license to use, reproduce, modify, display, and create derivative works of Software for all non-commercial purposes only. You may not use Software for commercial purposes without prior written consent from PSU. Queries regarding commercial licensing should be directed to The Office of Technology Management at 814.865.6277 or otminfo@psu.edu.
  2. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
  3. This software is provided for non-commercial use only.
  4. Redistribution and use in source and binary forms, with or without modification, are permitted provided that redistributions must reproduce the above copyright notice, license, list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Database description

Database Structure

├── CONUS
│   ├── 2000
│   │   ├── [Variable-Name].csv
│   │   ├── ...
│   │   ├── timeStr.csv
│   │   └── time.csv
│   ├── ...
│   ├── 2017
│   │   └── ...
│   ├── const
│   │   ├── [Constant-Variable-Name].csv
│   │   └── ...
│   └── crd.csv
├── CONUSv4f1
│   └── ...
├── Statistics
│   ├── [Variable-Name]_stat.csv
│   ├── ...
│   ├── const_[Constant-Variable-Name]_stat.csv
│   └── ...
├── Subset
│   ├── CONUS.csv
│   └── CONUSv4f1.csv
└── Variable
    ├── varConstLst.csv
    └── varLst.csv

1. Dataset folders (CONUS, CONUSv4f1)

A dataset folder contains all data, both training and testing, including time-dependent variables and constant variables. In the example data structure there are two dataset folders, CONUS and CONUSv4f1. Data are saved as:

  • year/[Variable-Name].csv:

A csv file of size [#grid, #time], where each row is one grid and each column is one time step. Each file stores one year of a time-dependent variable. For example, CONUS/2010/SMAP_AM.csv is the SMAP AM data for 2010 over CONUS.

Most time-dependent variables come from NLDAS, which includes two forcing products (FORA, FORB) and three land surface model simulation products (NOAH, MOS, VIC). Variables are named [variable]_[product]_[layer]; variable definitions can be found in the NLDAS documentation. For example, SOILM_NOAH_0-10 refers to soil moisture simulated by the NOAH model at 0-10 cm depth.

Besides NLDAS, SMAP data are saved in the same format but are always used as the target. In the level-3 database there are two SMAP csv files, SMAP_AM.csv and SMAP_PM.csv, which are only available from 2015 onward.

The value -9999 denotes NaN (missing data).
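For instance, a single variable-year file can be loaded with numpy. This is a minimal sketch, assuming the csv is a plain comma-separated numeric matrix with no header; rootDB is a hypothetical path to the database root:

```python
import numpy as np

# Hypothetical path to the database root; adjust to your installation.
rootDB = "/path/to/database"
dataset, year, var = "CONUS", "2010", "SMAP_AM"

# [Variable-Name].csv holds one year of a time-dependent variable,
# shaped [#grid, #time]; -9999 is the missing-value sentinel.
data = np.loadtxt(f"{rootDB}/{dataset}/{year}/{var}.csv", delimiter=",")
data[data == -9999] = np.nan  # replace the sentinel with NaN
print(data.shape)  # (#grid, #time)
```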

  • year/time.csv & timeStr.csv

Date csv files for the current year folder, each of size [#date]. time.csv records Matlab datenums and timeStr.csv records dates in yyyy-mm-dd format.

Note that each year folder starts on April 1st and ends before the next April 1st: for example, the folder 2010 actually contains data from 2010-04-01 to 2011-03-31. The reason is that the SMAP data record begins on April 1st.
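To use time.csv in Python, the Matlab datenums can be converted to datetime objects. A minimal sketch (Matlab's datenum epoch is 366 days ahead of Python's proleptic ordinal, hence the offset; the file path is illustrative):

```python
from datetime import datetime, timedelta

import numpy as np

def datenum_to_datetime(dn: float) -> datetime:
    """Convert a Matlab datenum to a Python datetime."""
    # Matlab counts days from year 0000; Python ordinals start at 0001-01-01,
    # so subtract 366 days and keep the fractional part as time of day.
    return (datetime.fromordinal(int(dn))
            + timedelta(days=dn % 1)
            - timedelta(days=366))

datenums = np.loadtxt("CONUS/2010/time.csv", delimiter=",")
dates = [datenum_to_datetime(dn) for dn in datenums]
print(dates[0], dates[-1])  # expect 2010-04-01 ... 2011-03-31
```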

  • const/[Constant-Variable-Name].csv

A csv file for each constant variable, of size [#grid].

  • crd.csv

Coordinates of all grids. The first column is latitude and the second column is longitude. Each row refers to a grid.
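Because crd.csv pairs each row with a grid, it can be used to select grids by location, e.g. when building a subset file (see the Subset folder below). A sketch using an illustrative, approximate Pennsylvania bounding box (the coordinate values are assumptions, not part of the database):

```python
import numpy as np

# crd.csv: one row per grid; column 0 = latitude, column 1 = longitude.
crd = np.loadtxt("CONUS/crd.csv", delimiter=",")
lat, lon = crd[:, 0], crd[:, 1]

# Rough Pennsylvania bounding box (illustrative values only).
mask = (lat >= 39.7) & (lat <= 42.3) & (lon >= -80.5) & (lon <= -74.7)
idx = np.where(mask)[0] + 1  # 1-based indices, as used by the Subset files
```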

2. Statistics folder

Stores statistics of the variables, used to normalize the data during training. Files are named:

  • Time-dependent variables -> [variable name]_stat.csv
  • Constant variables -> const_[variable name]_stat.csv

Each file records four statistics of the variable:

  • 90th percentile
  • 10th percentile
  • mean
  • standard deviation (std)

During training we normalize data as (data - mean) / std; see the sketch below.
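A minimal sketch of applying these statistics, assuming each _stat.csv stores the four values one per row in the order listed above (the file name is illustrative):

```python
import numpy as np

# Assumed layout: p90, p10, mean, std -- one value per row.
p90, p10, mean, std = np.loadtxt("Statistics/SMAP_AM_stat.csv", delimiter=",")

def normalize(x: np.ndarray) -> np.ndarray:
    """Standardize data before training."""
    return (x - mean) / std

def denormalize(y: np.ndarray) -> np.ndarray:
    """Map model output back to physical units."""
    return y * std + mean
```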

3. Subset folder

A subset is a selection of grids from a complete dataset (CONUS or Global); for example, a subset might contain only the grids in Pennsylvania. Every subset (including the full CONUS or Global dataset) has a [subset name].csv file in the Subset folder. [subset name].csv is written as:

  • line 1 -> name of the root dataset
  • line 2 to end -> indices of the subset grids in the root dataset (1-based)

An index of -1 means all grids are selected, as in the example CONUS dataset; see the parser sketch below.
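A minimal parser for this format, assuming one entry per line as described above:

```python
import numpy as np

def read_subset(path: str):
    """Read a Subset/[subset name].csv file: root dataset name, then indices."""
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    root = lines[0]                          # line 1: root dataset
    idx = np.array([int(v) for v in lines[1:]])
    if idx.size == 1 and idx[0] == -1:       # -1 selects all grids
        return root, None
    return root, idx - 1                     # 1-based -> 0-based

root, grid_idx = read_subset("Subset/CONUSv4f1.csv")
```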

4. Variable folder

Stores csv files that contain lists of variable names, used as input to the training code. Time-dependent variables and constant variables are stored separately. For example:

  • varLst.csv -> a list of time-dependent variables used as training predictors.
  • varConstLst.csv -> a list of constant variables used as training predictors.
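Putting the pieces together, here is a sketch of how the variable lists could drive input assembly. The helper names and one-name-per-line file layout are assumptions for illustration, not the library's API:

```python
import numpy as np

def read_var_list(path: str) -> list[str]:
    """One variable name per line (assumed layout)."""
    with open(path) as f:
        return [ln.strip() for ln in f if ln.strip()]

var_lst = read_var_list("Variable/varLst.csv")          # time-dependent predictors
const_lst = read_var_list("Variable/varConstLst.csv")   # constant predictors

def load_year(dataset: str, year: str, names: list[str]) -> np.ndarray:
    """Stack one year of time-dependent predictors into [#grid, #time, #var]."""
    arrs = [np.loadtxt(f"{dataset}/{year}/{v}.csv", delimiter=",") for v in names]
    x = np.stack(arrs, axis=-1)
    x[x == -9999] = np.nan  # mask the missing-value sentinel
    return x
```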
