This repository is meant to demonstrate how to organize validation data and activities on AWS and GitHub:
- Creation of Cal/Val datasets in the schematic
- geojson with relevant metadata and URLs to measurements/images
- Storing data on S3
- Processing of datasets and validation activities
Generating the confusion matrix represents the final validation of requirements.
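To make the geojson bullet concrete, a single Cal/Val record can be a GeoJSON Feature:
- The geometry is the footprint of a measurement or image.
- The properties carry metadata plus a URL to the data on S3.

This is a minimal, standard-library sketch. The property names (`dataset`, `acquired`, `data_url`), the coordinates, and the S3 key are hypothetical. They are not the repository's actual schema.

```python
import json


def make_calval_feature(footprint, dataset, acquired, data_url):
    """Build a GeoJSON Feature describing one Cal/Val measurement.

    `footprint` is a list of (lon, lat) pairs; the ring is closed if needed.
    The property names are hypothetical placeholders, not a fixed schema.
    """
    ring = [list(pt) for pt in footprint]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON polygon rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {
            "dataset": dataset,
            "acquired": acquired,
            "data_url": data_url,
        },
    }


def make_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


if __name__ == "__main__":
    feature = make_calval_feature(
        footprint=[(-91.5, 29.2), (-91.0, 29.2), (-91.0, 29.6), (-91.5, 29.6)],
        dataset="landsat_mosaic",
        acquired="2020-01-01",
        data_url="s3://calval-metadata/example/tile.tif",  # hypothetical key
    )
    print(json.dumps(make_collection([feature]), indent=2))
```

Keeping the URL in the properties means a viewer such as QGIS shows it in the attribute table next to the footprint.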
Install Anaconda Python and then install mamba. Set up the environment:
`mamba env create -f environment.yml`
Then activate it with `conda activate calval_env` and install the Jupyter kernel with `python -m ipykernel install --user --name calval_env`.
We are going to do a very basic water/no-water analysis over three deltas along the Gulf Coast, comparing the results to similar maps made from Landsat (thanks Matt Hansen!).
The diagram below is roughly the schematic of what we will demonstrate in this repo.
Technically, the product data should be available from the OPERA Product API. However, this exercise is meant to give the team a clear example of how to interact with the data.
Important note: if you run this on a local workstation with non-enterprise connectivity, the notebooks' downloads and uploads will be slow. For example, files in this example are upwards of 5 GB, and on my home internet, uploading to an S3 bucket took about 1.25 hours. We highly recommend running these examples on a machine with a high-bandwidth connection so that downloads and uploads finish quickly.
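Here is the note in numbers, assuming the 1.25-hour upload moved roughly 5 GB:
- In decimal units, 5 GB is 40,000 megabits.
- 1.25 hours is 4,500 seconds.
- So the upload ran at a sustained rate of about 8.9 Mbit/s.

The helpers below are a back-of-the-envelope sketch. The faster rate in the example is an illustrative assumption, and real transfers also pay protocol overhead.

```python
def transfer_hours(size_gb, rate_mbps):
    """Estimated hours to move `size_gb` decimal gigabytes at `rate_mbps`
    megabits per second, ignoring protocol overhead."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    return megabits / rate_mbps / 3600


def implied_rate_mbps(size_gb, hours):
    """Sustained rate in Mbit/s implied by an observed transfer."""
    return size_gb * 8 * 1000 / (hours * 3600)


if __name__ == "__main__":
    home = implied_rate_mbps(5, 1.25)  # roughly 8.9 Mbit/s
    print(f"implied home upload rate: {home:.1f} Mbit/s")
    # Hypothetical 1 Gbit/s link, e.g. a well-connected cloud machine
    print(f"5 GB at 1000 Mbit/s: {transfer_hours(5, 1000) * 60:.1f} minutes")
```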
- Create `datasets/aois` with QGIS and its vector drawing tools to cover the areas above. See the resulting geojsons here.
- Create an S3 bucket called `calval-metadata`. Do not modify any settings during bucket creation (i.e., use the defaults across the board).
- Generate AWS credentials (credentials last for only a few hours) with these instructions. This requires the JPL VPN to access the repository and to generate the proper credentials. You will need to clone that repository. You presumably already have most of the required libraries, though some may need to be installed. I just ran `python aws-python.py` without changing the file permissions, and that was fine. If you are part of more than one AWS account, you will be prompted to select the proper role.
- Update your `~/.netrc` to include your Earthdata Login credentials, e.g. `machine urs.earthdata.nasa.gov login <username> password <password>`.
- Make sure you are able to sign in to https://search.asf.alaska.edu/. You should already have accepted a license agreement.
- Organize the Landsat mosaics as shown here.
- We have to crop the mosaics to perform classification, because they are distributed as square tiles 40,000 pixels wide. The subsequent notebook does a lot of GIS processing and re-uploading.
- We demonstrate downloading from the Cal/Val database here. The data can then be viewed in QGIS, for example, or checked to confirm that the metadata or measurements are correct. Here are the metadata geojsons: mosaics and cropped mosaics.
- Create a "training data" geojson (as in step 1) using "land" and "water" labels in QGIS. See the resulting example here.
- Create a reference map using (a) the cropped Landsat mosaic, (b) the training data, and (c) a random forest in this notebook.
- Get the intersecting ALOS products, generate a water mask by thresholding, and compare the result to our reference maps. This step gets GIS-intensive, as shown in this notebook.
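Several steps above reduce to bounding-box operations:
- The AOI geojsons drawn in QGIS define where to crop.
- The ALOS step keeps only products whose footprint overlaps an AOI.

The sketch below uses only the standard library. It assumes all geometries share one lon/lat CRS and do not cross the antimeridian. The product records with `id`/`bbox` keys are a hypothetical shape, not the output of any particular search API. For exact footprint overlap you would use a geometry library rather than boxes.

```python
import json


def _positions(coords):
    """Yield every [x, y, ...] position in a nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def geojson_bounds(feature_collection):
    """(min_x, min_y, max_x, max_y) over all features in a FeatureCollection."""
    xs, ys = [], []
    for feature in feature_collection["features"]:
        if feature.get("geometry") is None:
            continue  # GeoJSON allows features without geometry
        for pos in _positions(feature["geometry"]["coordinates"]):
            xs.append(pos[0])
            ys.append(pos[1])
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_intersecting(products, aoi_bbox):
    """Keep (hypothetical) product records whose 'bbox' overlaps the AOI box."""
    return [p for p in products if bboxes_intersect(p["bbox"], aoi_bbox)]


if __name__ == "__main__":
    aoi = json.loads(
        '{"type": "FeatureCollection", "features": [{"type": "Feature", '
        '"properties": {}, "geometry": {"type": "Polygon", "coordinates": '
        '[[[-91.5, 29.2], [-91.0, 29.2], [-91.0, 29.6], [-91.5, 29.2]]]}}]}'
    )
    box = geojson_bounds(aoi)
    products = [  # hypothetical footprints
        {"id": "alos_a", "bbox": (-92.0, 29.0, -91.2, 29.8)},
        {"id": "alos_b", "bbox": (-89.0, 29.0, -88.5, 29.5)},
    ]
    print(box, [p["id"] for p in select_intersecting(products, box)])
```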
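The S3 steps map onto a handful of boto3 calls:
- creating `calval-metadata`,
- using the temporary credentials from `aws-python.py`,
- uploading and downloading Cal/Val files later on.

This sketch assumes boto3 is installed. It also assumes the script leaves credentials where boto3's default credential chain finds them, or in a named profile that you pass in. The region, profile, and helper names are hypothetical. S3 bucket names are globally unique, so you may need a variant of `calval-metadata` if that name is already taken.

```python
from urllib.parse import urlparse


def create_bucket_kwargs(bucket, region):
    """Keyword arguments for the S3 client's create_bucket call.

    us-east-1 is the default location and must not be sent as a
    LocationConstraint; any other region must be. Nothing else is set,
    so the bucket keeps S3's default settings.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def parse_s3_url(url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


def make_s3_client(region, profile=None):
    """S3 client built from the temporary credentials; prints the role in use."""
    import boto3  # lazy import so the pure helpers work without boto3

    session = boto3.Session(profile_name=profile, region_name=region)
    # get_caller_identity fails fast if the temporary credentials have expired.
    print("Using:", session.client("sts").get_caller_identity()["Arn"])
    return session.client("s3")


def upload(s3, local_path, url):
    """Upload a local file to an s3:// URL (large files go multipart automatically)."""
    bucket, key = parse_s3_url(url)
    s3.upload_file(local_path, bucket, key)


def download(s3, url, local_path):
    """Download an s3:// URL to a local file."""
    bucket, key = parse_s3_url(url)
    s3.download_file(bucket, key, local_path)
```

With valid credentials, usage would look like this (region and key are placeholders):

```python
s3 = make_s3_client("us-west-2")
s3.create_bucket(**create_bucket_kwargs("calval-metadata", "us-west-2"))
upload(s3, "tile.tif", "s3://calval-metadata/mosaics/tile.tif")
```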
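To check the `~/.netrc` step before running any downloads, use Python's standard-library `netrc` module to read the file. The sketch below does two things:
- It looks up the `urs.earthdata.nasa.gov` entry.
- It separately checks that the file is private (mode 600), a good habit for a file that stores a password in plain text.

The function names are hypothetical helpers, not part of this repository.

```python
import netrc
import os
import stat

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def earthdata_login(path=None, host=EARTHDATA_HOST):
    """Return the login stored for `host` in a netrc file, or None.

    `path` defaults to ~/.netrc. Returns None if the file is missing,
    cannot be parsed, has no entry for `host`, or the entry has no password.
    """
    path = path or os.path.join(os.path.expanduser("~"), ".netrc")
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        return None
    if entry is None:
        return None
    login, _account, password = entry
    return login if password else None


def is_private(path):
    """True if group and others have no permissions on the file (e.g. chmod 600)."""
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))


if __name__ == "__main__":
    print("Earthdata login found for:", earthdata_login())
```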
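The cropping step boils down to mapping an AOI bounding box onto pixel offsets within a 40,000 x 40,000 pixel mosaic tile. This is a minimal sketch under two assumptions:
- The mosaic has a north-up, GDAL-style geotransform.
- The AOI box is already expressed in the mosaic's CRS.

`bbox_to_window` is a hypothetical helper, not code from the notebook.

```python
import math


def bbox_to_window(bbox, geotransform, width=40_000, height=40_000):
    """Convert a (min_x, min_y, max_x, max_y) box to a pixel window.

    `geotransform` follows GDAL's north-up convention:
    (x_origin, pixel_width, 0, y_origin, 0, -pixel_height), where the origin
    is the upper-left corner of the tile. Returns (col_off, row_off, ncols,
    nrows) clipped to the tile, or None if the box misses the tile entirely.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    if rx != 0 or ry != 0 or dx <= 0 or dy >= 0:
        raise ValueError("only north-up geotransforms are supported")
    min_x, min_y, max_x, max_y = bbox
    col_start = math.floor((min_x - x0) / dx)
    col_stop = math.ceil((max_x - x0) / dx)
    # dy is negative, so the top edge (max_y) maps to the smallest row index.
    row_start = math.floor((max_y - y0) / dy)
    row_stop = math.ceil((min_y - y0) / dy)
    # Clip to the tile extent.
    col_start, col_stop = max(col_start, 0), min(col_stop, width)
    row_start, row_stop = max(row_start, 0), min(row_stop, height)
    if col_start >= col_stop or row_start >= row_stop:
        return None
    return col_start, row_start, col_stop - col_start, row_stop - row_start
```

The returned tuple follows the same (column offset, row offset, width, height) order as `rasterio.windows.Window`. If you use rasterio, you could pass it straight to a windowed read.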
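The last step compares a SAR water mask against the reference map. A minimal sketch of that comparison logic:
1. Threshold SAR backscatter into a water mask. Open water usually appears dark in SAR imagery because its smooth surface reflects most of the radar energy away from the sensor.
2. Tally a 2x2 confusion matrix against the Landsat-derived reference map.

The sketch makes these assumptions:
- Backscatter is in dB.
- The pixel values and the -15 dB threshold are hypothetical placeholders. Pick the threshold from your own data, e.g. from the backscatter histogram.
- Both grids are already co-registered.

Plain lists keep the example dependency-free. The notebooks would work on arrays instead.

```python
def water_mask(backscatter_db, threshold_db):
    """1 where backscatter is below the threshold (dark, likely water), else 0."""
    return [[1 if v < threshold_db else 0 for v in row] for row in backscatter_db]


def confusion_matrix(predicted, reference, nodata=None):
    """2x2 counts [[TN, FP], [FN, TP]]; rows = reference, columns = prediction.

    Class 1 = water, 0 = land. Pixels whose reference equals `nodata` are skipped.
    """
    cm = [[0, 0], [0, 0]]
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if nodata is not None and r == nodata:
                continue
            cm[r][p] += 1
    return cm


def summarize(cm):
    """Overall accuracy plus precision/recall for the water class."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "overall_accuracy": (tp + tn) / total,
        "water_precision": tp / (tp + fp) if tp + fp else float("nan"),
        "water_recall": tp / (tp + fn) if tp + fn else float("nan"),
    }


if __name__ == "__main__":
    sar_db = [[-22.0, -18.5, -9.0], [-7.5, -20.1, -6.0]]  # hypothetical values
    reference = [[1, 1, 0], [0, 0, 0]]  # 1 = water in the Landsat reference map
    cm = confusion_matrix(water_mask(sar_db, threshold_db=-15.0), reference)
    print(cm, summarize(cm))
```

Keeping the matrix as raw counts, rather than normalized rates, makes it easy to add up results across the three deltas before computing summary scores.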