This repository contains tools for preparing data to run a NextGen simulation using NGIAB (NextGen In A Box). The tools allow you to select a catchment of interest on an interactive map, choose a date range, and prepare the data with just a few clicks!
- What does this tool do?
- Requirements
- Installation and Running
- Development Installation
- Usage
- CLI Documentation
This tool prepares data to run a NextGen simulation by creating a run package that can be used with NGIAB. By default, it uses the v20.1 hydrofabric and NWM retrospective v3 forcing data.
- This tool is officially supported on macOS or Ubuntu (tested on 22.04 & 24.04). To use it on Windows, please install WSL.
- GDAL needs to be installed.
- The `ogr2ogr` command needs to work in your terminal.
```bash
sudo apt install gdal-bin
```
This installs GDAL and `ogr2ogr` on Ubuntu / WSL.
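If you want to confirm the requirement is met before launching the tool, a minimal check (a hypothetical helper, not part of the tool itself) is to look for the `ogr2ogr` binary on your `PATH`:

```python
import shutil

def gdal_available() -> bool:
    """Return True if the ogr2ogr binary (installed with gdal-bin) is on PATH."""
    return shutil.which("ogr2ogr") is not None

print("ogr2ogr found:", gdal_available())
```

If this prints `False`, install GDAL as shown above before running the app.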
```bash
# optional but encouraged: create a virtual environment
python3 -m venv env
source env/bin/activate
# install and run the tool
pip install ngiab_data_preprocess
python -m map_app
```
The first time you run this command, it will download the hydrofabric and model parameter files from Lynker Spatial. If you already have them, place `conus.gpkg` and `model_attributes.parquet` into `modules/data_sources/`.
To install and run the tool, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/CIROH-UA/NGIAB_data_preprocess
cd NGIAB_data_preprocess
```
- Create a virtual environment and activate it:
```bash
python3 -m venv env
source env/bin/activate
```
- Install the tool:
```bash
pip install -e .
```
- Run the map app:
```bash
python -m map_app
```
Running `python -m map_app` will open the app in a new browser tab. Alternatively, you can open it manually by going to http://localhost:5000 while the app is running.
To use the tool:
- Select the catchment you're interested in on the map.
- Pick the time period you want to simulate.
- Click the following buttons in order:
- Create subset gpkg
- Create Forcing from Zarrs
- Create Realization
Once all the steps are finished, you can run NGIAB on the folder shown underneath the subset button.
Note: When using the tool, the output will be stored in the `./output/<your-first-catchment>/` folder. There is no overwrite protection on the folders.
- `-h`, `--help`: Show the help message and exit.
- `-i INPUT_FILE`, `--input_file INPUT_FILE`: Path to a CSV or TXT file containing a list of catchment IDs, lat/lon pairs, or gage IDs; or a single catchment ID (e.g., `cat-5173`), a single lat/lon pair, or a single gage ID.
- `-l`, `--latlon`: Use latitude and longitude instead of catchment IDs. When used with `-i`, the file should contain lat/lon pairs.
- `-g`, `--gage`: Use gage IDs instead of catchment IDs. When used with `-i`, the file should contain gage IDs.
- `-s`, `--subset`: Subset the hydrofabric to the given catchment IDs, locations, or gage IDs.
- `-f`, `--forcings`: Generate forcings for the given catchment IDs, locations, or gage IDs.
- `-r`, `--realization`: Create a realization for the given catchment IDs, locations, or gage IDs.
- `--start_date START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
- `--end_date END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the subset to be created (default is the first catchment ID in the input file).
The flags `-l`, `-g`, `-s`, `-f`, and `-r` can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.
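This bundling of short boolean flags is standard `argparse` behavior. As a minimal sketch (not the tool's actual parser, just the same flag definitions):

```python
import argparse

# Three store_true short flags, as in the CLI documented above.
parser = argparse.ArgumentParser(prog="ngiab_data_cli")
parser.add_argument("-s", "--subset", action="store_true")
parser.add_argument("-f", "--forcings", action="store_true")
parser.add_argument("-r", "--realization", action="store_true")

# Bundled and separate spellings parse identically:
combined = parser.parse_args(["-sfr"])
separate = parser.parse_args(["-s", "-f", "-r"])
assert vars(combined) == vars(separate)  # all three steps enabled either way
```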
- Subset hydrofabric using catchment IDs:
  ```bash
  python -m ngiab_data_cli -i catchment_ids.txt -s
  ```
- Generate forcings using a single catchment ID:
  ```bash
  python -m ngiab_data_cli -i cat-5173 -f --start_date 2023-01-01 --end_date 2023-12-31
  ```
- Create a realization using lat/lon pairs from a CSV file:
  ```bash
  python -m ngiab_data_cli -i locations.csv -l -r --start_date 2023-01-01 --end_date 2023-12-31 -o custom_output
  ```
- Perform all operations using a single lat/lon pair:
  ```bash
  python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start_date 2023-01-01 --end_date 2023-12-31
  ```
- Subset hydrofabric using gage IDs from a CSV file:
  ```bash
  python -m ngiab_data_cli -i gage_ids.csv -g -s
  ```
- Generate forcings using a single gage ID:
  ```bash
  python -m ngiab_data_cli -i 01646500 -g -f --start_date 2023-01-01 --end_date 2023-12-31
  ```
- CSV file: A single column of catchment IDs, or a column named 'cat_id', 'catchment_id', or 'divide_id'.
- TXT file: One catchment ID per line.
Example CSV (catchment_ids.csv):
```
cat_id,soil_type
cat-5173,some
cat-5174,data
cat-5175,here
```
Or a plain list (catchment_ids.txt):
```
cat-5173
cat-5174
cat-5175
```
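To illustrate how both layouts carry the same information, here is a hedged sketch of reading catchment IDs from either format (a hypothetical helper using only the stdlib, not the tool's actual reader):

```python
import csv
import io

# Column names accepted per the docs above.
ID_COLUMNS = {"cat_id", "catchment_id", "divide_id"}

def read_catchment_ids(text: str) -> list[str]:
    """Read catchment IDs from CSV (named or single column) or TXT content."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    # If a recognized header name is present, take that column.
    for i, name in enumerate(rows[0]):
        if name.strip().lower() in ID_COLUMNS:
            return [r[i] for r in rows[1:]]
    # Otherwise treat it as a bare list; IDs look like "cat-<number>".
    return [r[0] for r in rows if r and r[0].startswith("cat-")]

csv_text = "cat_id,soil_type\ncat-5173,some\ncat-5174,data\n"
print(read_catchment_ids(csv_text))  # ['cat-5173', 'cat-5174']
```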
- CSV file: Two columns named 'lat' and 'lon', or two unnamed columns in that order.
- Single pair: Comma-separated values passed directly to the `-i` argument.
Example CSV (locations.csv):
```
lat,lon
54.33,-69.4
55.12,-68.9
53.98,-70.1
```
Or without a header:
```
54.33,-69.4
55.12,-68.9
53.98,-70.1
```
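Both layouts parse to the same pairs; as a sketch (a hypothetical helper, not the tool's actual reader):

```python
import csv
import io

def read_latlon_pairs(text: str) -> list[tuple[float, float]]:
    """Read lat/lon pairs from CSV with a 'lat','lon' header or two bare columns."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and [c.strip().lower() for c in rows[0][:2]] == ["lat", "lon"]:
        rows = rows[1:]  # skip the named header row
    return [(float(r[0]), float(r[1])) for r in rows if len(r) >= 2]

with_header = "lat,lon\n54.33,-69.4\n55.12,-68.9\n"
bare = "54.33,-69.4\n55.12,-68.9\n"
assert read_latlon_pairs(with_header) == read_latlon_pairs(bare)
```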
- CSV file: A single column of gage IDs, or a column named 'gage' or 'gage_id'.
- TXT file: One gage ID per line.
- Single gage ID: Passed directly to the `-i` argument.
Example CSV (gage_ids.csv):
```
gage_id,station_name
01646500,Potomac River
01638500,Shenandoah River
01578310,Susquehanna River
```
Or a plain list (gage_ids.txt):
```
01646500
01638500
01578310
```
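One pitfall worth noting: USGS gage IDs carry significant leading zeros, so they must be read as strings, not numbers. A hedged stdlib sketch (a hypothetical helper, not the tool's actual reader):

```python
import csv
import io

def read_gage_ids(text: str) -> list[str]:
    """Read gage IDs as strings; parsing them as integers would drop the
    leading zeros that USGS gage IDs (e.g. 01646500) depend on."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0][0].strip().lower() in {"gage", "gage_id"}:
        rows = rows[1:]  # skip the named header row
    return [r[0] for r in rows if r]

print(read_gage_ids("gage_id,station_name\n01646500,Potomac River\n"))  # ['01646500']
```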
The script creates an output folder named after the provided output name (`-o`) if given, or else the first catchment ID in the input file, or a name derived from the first lat/lon pair or gage ID. This folder will contain the results of the subsetting, forcings generation, and realization creation operations.