Cal-Val-Example

This repository demonstrates how to organize validation data and activities on AWS and GitHub.

Organization

Creation of Cal/Val Datasets in the schematic
- GeoJSON with relevant metadata and URLs to measurements/images
- storing data on s3
Processing of datasets and validation activities
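
As a rough illustration of the first item, a dataset record could be a GeoJSON feature whose properties carry the metadata and the s3 URL of the measurement. This is only a sketch: the property names (site_name, sensor, url), the object key, and the coordinates are hypothetical and are not the schema used in this repo.

```python
import json


def make_feature(site_name, sensor, url, ring):
    """Build one GeoJSON Feature for an AOI polygon (hypothetical schema)."""
    # GeoJSON polygon rings must be closed: first and last positions match.
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return {
        "type": "Feature",
        "properties": {"site_name": site_name, "sensor": sensor, "url": url},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


feature = make_feature(
    "example_delta",                            # hypothetical site name
    "landsat",
    "s3://calval-metadata/example/mosaic.tif",  # hypothetical object key
    # Arbitrary (longitude, latitude) corners, listed counterclockwise.
    [[-91.5, 29.3], [-91.1, 29.3], [-91.1, 29.6], [-91.5, 29.6]],
)
collection = {"type": "FeatureCollection", "features": [feature]}
geojson_text = json.dumps(collection, indent=2)
```

The resulting text can be saved with a .geojson extension and opened in QGIS.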

Generating the confusion matrix represents the final validation of requirements.
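
For readers new to the idea, here is a minimal sketch of a confusion matrix for a water/no-water comparison. It assumes both maps have been flattened to lists of 0 (no water) and 1 (water); the function names and the sample labels are made up for illustration and are not taken from the notebooks.

```python
def confusion_matrix(reference, predicted):
    """2x2 confusion matrix for binary labels (1 = water, 0 = no water).

    Rows index the reference label, columns index the predicted label.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    matrix = [[0, 0], [0, 0]]
    for ref, pred in zip(reference, predicted):
        matrix[ref][pred] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of pixels on the diagonal, i.e. labeled identically."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


# Made-up labels for eight pixels.
reference = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 1, 0, 1]
cm = confusion_matrix(reference, predicted)
print(cm)                    # [[3, 1], [1, 3]]
print(overall_accuracy(cm))  # 0.75
```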

Installation

Install Anaconda Python and then install mamba. Set up the environment with:

mamba env create -f environment.yml

Then activate the environment with conda activate calval_env and install the kernel for Jupyter with python -m ipykernel install --user --name calval_env.

Basic Mock Up

We are going to do a very basic water/no-water analysis over three deltas along the Gulf Coast, comparing the results to similar maps made from Landsat (thanks Matt Hansen!).

The diagram below is roughly the schematic of what we will be demonstrating in this repo.

Technically, the product data should be available from the OPERA Product API. However, this is an exercise to provide the team a clear example of how to interact with the data.

Steps

Important note: if you run this on a local workstation without enterprise connectivity, the notebooks' downloads and uploads will be slow (e.g. files in this example are upwards of 5 GB; on my home internet, uploading to an s3 bucket took about 1.25 hours). It is highly recommended to run these examples on a machine with a fast connection.

Create datasets/AOIs with QGIS and its vector drawing tools to cover the areas above. See the resulting GeoJSONs here.
Create an s3 bucket called calval-metadata. Do not modify any default settings during bucket creation.
Generate AWS credentials (credentials last for only a few hours) with these instructions; this requires the JPL VPN, both to access the repository and to generate the proper credentials. You will clone the repository and presumably already have most of the libraries (you may have to install some). I just ran python aws-python.py without changing the file permissions and that was fine. If you are a part of more than one AWS account, you will be prompted to select the proper role.

Update your ~/.netrc to include Earthdata Login credentials, e.g.

machine urs.earthdata.nasa.gov
 login <username>
 password <password>
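
To check that the entry parses, Python's standard netrc module can read it back. The sketch below is an assumption about how one might verify the file, not something the notebooks do; the helper name earthdata_credentials and the demo credentials are made up, and the demo writes a throwaway file instead of touching your real ~/.netrc.

```python
import netrc
import os
import tempfile


def earthdata_credentials(netrc_path=None, host="urs.earthdata.nasa.gov"):
    """Return (login, password) for the host, or None if there is no entry.

    With netrc_path=None the standard library reads ~/.netrc.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


# Demonstrate with a temporary file that mirrors the entry above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as f:
        f.write(
            "machine urs.earthdata.nasa.gov\n"
            " login demo_user\n"
            " password demo_pass\n"
        )
    creds = earthdata_credentials(demo_path)
```

Calling earthdata_credentials() with no arguments reads your actual ~/.netrc; a None result means the machine entry is missing.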

Make sure you are able to sign in at https://search.asf.alaska.edu/; you should have previously accepted a license agreement.
Organize the Landsat mosaics as shown here.
We have to crop the mosaics to perform classification, since they are distributed in squares 40,000 pixels wide. A lot of GIS work and re-uploading is done in the subsequent notebook.
We demonstrate downloading from the Cal/Val database as here. The data can then be viewed in QGIS, for example, or used to double-check that the metadata or measurements are correct. Here are the metadata GeoJSONs: mosaics and cropped mosaics.
Create a "training data" GeoJSON (as in step 1) using "land" and "water" labels in QGIS. See the resulting example here.
Create a reference map using (a) the cropped Landsat mosaic, (b) the training data, and (c) a random forest classifier in this notebook.
Get the intersecting ALOS products, generate a water mask from a threshold, and compare the mask to our reference maps. This gets GIS intensive, as indicated in this notebook.
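
The last step can be sketched in a few lines, leaving out all of the GIS work (reprojection, cropping, resampling) that the notebook handles. This sketch assumes the backscatter is in dB on the same grid as the reference map and that water is the low-backscatter class; the -15 dB threshold, the function names, and the sample values are hypothetical and not taken from the notebook.

```python
def water_mask(backscatter_db, threshold_db):
    """Label a pixel as water (1) when its backscatter is below the threshold."""
    return [
        [1 if value < threshold_db else 0 for value in row]
        for row in backscatter_db
    ]


def agreement(mask, reference):
    """Fraction of pixels where the two masks carry the same label."""
    pairs = [
        (m, r)
        for mask_row, ref_row in zip(mask, reference)
        for m, r in zip(mask_row, ref_row)
    ]
    return sum(m == r for m, r in pairs) / len(pairs)


# Made-up 2x3 backscatter grid (dB) and reference map (1 = water).
backscatter_db = [[-22.0, -19.5, -9.0], [-21.0, -12.0, -8.5]]
reference = [[1, 1, 0], [1, 1, 0]]

mask = water_mask(backscatter_db, threshold_db=-15.0)  # hypothetical threshold
score = agreement(mask, reference)
print(mask)   # [[1, 1, 0], [1, 0, 0]]
```

In practice the grids would be raster arrays rather than nested lists, but the per-pixel logic is the same.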
