[DataPrep] WorldFloods 1.1 and WorldFloods 2.0 Full Pipeline
jejjohnson opened this issue
The full preprocessing pipeline for WorldFloods 1.1 and WorldFloods 2.0:
- Query Copernicus EMS
- Generate Floodmaps
- Query GEE with FloodMaps
- Generate GT with floodmaps and S2 images.
Visual Pipeline
Current Contributors
Demo
All of the steps below are based on the demo notebook found here:

No `.cog` file format considerations (that we know of...)...
Copernicus Query & Save (`ingest.py`)
- Download the Zip Files from Copernicus EMS
- Unzip the files into appropriate file directory structure
Copernicus Post-Processing (`hardutils.py`)
These steps are necessary to acquire the floodmaps; the floodmaps can then be used for visualization OR for the MLOps.
- Search through the unzipped files to get the shape files (3 shape files):
  - Area of Interest
  - Observed event
  - Hydrography (river); sub-categories l/a
- Build the Copernicus Meta with Filenames of the items
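The search through the unzipped files could be sketched as below. This is a minimal, dependency-free sketch: the `SHAPEFILE_KINDS` substrings are hypothetical placeholders, not the actual Copernicus EMS naming convention, which `hardutils.py` would need to match.

```python
from pathlib import Path

# Hypothetical filename substrings; the real Copernicus EMS products
# follow their own naming convention, so adjust these tokens accordingly.
SHAPEFILE_KINDS = {
    "area_of_interest": "areaOfInterest",
    "observed_event": "observedEvent",
    "hydrography": "hydrography",
}

def find_shapefiles(unzip_dir):
    """Search an unzipped EMS product for the three shapefiles of interest.

    Returns a dict mapping each kind to the first matching .shp path,
    or None when no file matches.
    """
    found = {}
    for kind, token in SHAPEFILE_KINDS.items():
        matches = [p for p in Path(unzip_dir).rglob("*.shp")
                   if token.lower() in p.name.lower()]
        found[kind] = matches[0] if matches else None
    return found
```

The returned filenames can then feed the Copernicus metadata build step.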
Build FloodMap (`softutils.py`)
- query the Copernicus metadata to get the names and shape files
- open with geopandas
- collapse polygons with labels (e.g. flood, hydro, ...)
- convert the new shape file to `geojson`
- store the `geojson` to the `ml4floods_data_lake_ETL` bucket
- store the floodmap metadata that was queried...? @gonzmg88
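The collapse step above could look roughly like the sketch below. In the actual pipeline this would be geopandas (`gpd.read_file(...).dissolve(by=...)` then `to_file(..., driver="GeoJSON")`); this version works on plain GeoJSON dicts so it is dependency-free, and the `label_key` property name is an assumption.

```python
from collections import defaultdict

def collapse_by_label(features, label_key="w_class"):
    """Collapse GeoJSON features sharing a label into one MultiPolygon each.

    `label_key` is a hypothetical attribute name; the real shapefiles may
    store the flood / hydrography class under a different property.
    """
    grouped = defaultdict(list)
    for feat in features:
        label = feat["properties"].get(label_key, "unknown")
        geom = feat["geometry"]
        if geom["type"] == "Polygon":
            grouped[label].append(geom["coordinates"])
        elif geom["type"] == "MultiPolygon":
            grouped[label].extend(geom["coordinates"])
    return {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "properties": {label_key: label},
             "geometry": {"type": "MultiPolygon", "coordinates": polys}}
            for label, polys in grouped.items()
        ],
    }
```

The resulting FeatureCollection can be serialized with `json.dump` and stored as the `geojson` artifact.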
Sentinel-2 (S2) (`ingest.py`)
We use the `geojson` files to get a bounding box to query GEE for S2 images:
- query the database (e.g., data, event, alert) for stored `geojson` files
- using the polygons, query the GEE platform
- download the S2 tiles that intersect, to a bucket
- save as `.cog`
- pipe to Viz Mart @Lkruitwagen

Note: `ee_download.py`
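Getting the bounding box from a stored `geojson` file is the step that bridges the floodmaps and the GEE query; a minimal sketch (plain dicts, no geopandas) could be:

```python
def bounds_from_geojson(fc):
    """Compute (minx, miny, maxx, maxy) over a GeoJSON FeatureCollection.

    The box could then be turned into e.g. an ee.Geometry.Rectangle to
    restrict the GEE Sentinel-2 query to the flood area (the GEE call
    itself lives in ee_download.py and is not reproduced here).
    """
    xs, ys = [], []
    for feat in fc["features"]:
        geom = feat["geometry"]
        polys = ([geom["coordinates"]] if geom["type"] == "Polygon"
                 else geom["coordinates"])
        for poly in polys:
            for ring in poly:
                for x, y in ring:
                    xs.append(x)
                    ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)
```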
Build Ground Truth (`softutils.py`)
Cloud or No Cloud
- Query the S2 database for images in the ROI
- Take the last band from the S2 image
- Save the `tiff` files to the `data_lake_mlmart` bucket
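Binarizing the last band into a cloud mask could be sketched as below. The band ordering (cloud probability last, in [0, 1]) and the 0.5 threshold are assumptions for illustration; `create_gt.py` holds the actual handling.

```python
import numpy as np

def cloud_mask(s2_image, threshold=0.5):
    """Turn the last band of an S2 array into a 0/1 cloud mask.

    Assumes s2_image has shape (H, W, bands) with the last band holding
    a cloud probability in [0, 1]; both are assumptions, not the
    confirmed layout used by the pipeline.
    """
    cloud_prob = s2_image[..., -1]
    return (cloud_prob >= threshold).astype(np.uint8)
```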
Water or No Water
- Query the S2 database for images in the ROI
- Magic...... See: `create_gt.py`
- Save the `tiff` files to the `data_lake_mlmart` bucket
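Once the water and cloud masks exist, combining them into a single ground-truth band might look like the sketch below. The class encoding (1 = land, 2 = water, 3 = cloud, clouds overriding water) is an assumption to be checked against `create_gt.py`.

```python
import numpy as np

def build_gt(water_mask, cloud_mask):
    """Combine binary water and cloud masks into one ground-truth band.

    Hypothetical encoding: 1 = land, 2 = water, 3 = cloud; cloudy pixels
    override water since the surface is not observable under cloud.
    """
    gt = np.ones_like(water_mask, dtype=np.uint8)  # start with land everywhere
    gt[water_mask == 1] = 2                        # mark water pixels
    gt[cloud_mask == 1] = 3                        # clouds win over water
    return gt
```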
Visualization Territory
- Query GEE given the floodmaps
- Save them in `.cog` format to the `data_lake_vizmart` bucket
@Lkruitwagen Any opinion about any intermediate steps?
Divided up the notebook here. It goes over the pipeline, but everything is saved locally and needs to be converted to save to the bucket instead. @jejjohnson @satyarth934
The export image from GEE already saves the S2 image as a COG GeoTIFF:
`ml4floods/src/data/ee_download.py`, line 232 in `17b4552`
This part of the code has the "smart" handling of the cloud or no cloud:
`ml4floods/src/data/create_gt.py`, line 173 in `17b4552`
Actually, I would save that tutorial as-is (i.e. saving locally for every step in the pipeline following the query to Copernicus EMS, like you already did @nadia-eecs) for the demos that we show people. An outside user running a notebook demo like that probably won't have write access to the buckets, so it's a nice tutorial for showing people how the pipeline works.
For the scripts that generate all of the data, we can change it to saving to the bucket. And then for MLOps, we can show them how to access the already-saved artifacts in the bucket at every point in the pipeline, e.g. floodmaps, S2 images, and ground truth(s), as those are the only parts they'll probably care about.
I agree, I'd say notebooks 1-4 that @nadia-eecs made are nice for internal use (to figure out how the ingestion pipeline works) and the previous notebook is nice as a tutorial for external users. In that tutorial we can even change the export part to export the S2 image to the user's Google Drive.
However, I'd say prioritize the ingestion pipeline and put the tutorial notebook in the backlog!