spaceml-org / ml4floods

An ecosystem of data, models and code pipelines to tackle flooding with ML

Home Page:https://spaceml-org.github.io/ml4floods/

[DataPrep] Create appropriate file structure for ML

jejjohnson opened this issue

Recreate the function files_train_test.py, which will take a list of files and partition them into train, test, and val directories.

  • Need a root folder with the S2 images and the ground truth (gt).
  • Divvy them into train, test, val.
  • Copy the files.
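The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the original files_train_test.py: the function name `copy_split`, the `S2/` and `gt/` subfolder names, and the `.tif` suffix are all assumptions made for the example. It takes explicit lists of names per split instead of splitting at random.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Sequence


def copy_split(
    root: Path,
    dest: Path,
    splits: Dict[str, List[str]],
    subfolders: Sequence[str] = ("S2", "gt"),  # assumed layout: root/S2, root/gt
    suffix: str = ".tif",                      # assumed file extension
) -> int:
    """Copy each named file into dest/<split>/<subfolder>/ and return the count."""
    copied = 0
    for split, names in splits.items():
        for sub in subfolders:
            out_dir = dest / split / sub
            out_dir.mkdir(parents=True, exist_ok=True)
            for name in names:
                src = root / sub / f"{name}{suffix}"
                if not src.exists():
                    # Fail loudly so an image never ends up without its gt.
                    raise FileNotFoundError(src)
                shutil.copy2(src, out_dir / src.name)
                copied += 1
    return copied
```

Called as `copy_split(Path("root"), Path("ml_ready"), {"train": [...], "val": [...], "test": [...]})`, it would produce `ml_ready/train/S2`, `ml_ready/train/gt`, and so on.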

Question @gonzmg88: should the tiling already be done?

Library that can help for Task 2: https://pypi.org/project/split-folders/

  • Divvy them into train, test, val.

Hi

Just to double check: in this problem it is SUPER IMPORTANT that the train/test/val split is not done randomly, because some floodmaps overlap others and/or come from the same flood event. Also, the quality of the images varies widely (some floodmaps are very well labeled, others have a lot of errors). We built the test set in the following manner:

  • We manually chose the images that have few errors and few clouds and that were originally labeled from Sentinel-2 by the people at Copernicus-EMS or UNOSAT (11 floodmaps in total).
  • We removed from the training set all floodmaps that overlap locations in the test set and all the floodmaps from the same flood event as those in the test set.

This is very important to be able to prove that the model generalises across different geographies.
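The exclusion rule described above could be sketched as below. The thread does not say how overlap or event membership was actually determined, so the `FloodMap` record, its `event_id` and `bbox` fields, and the bounding-box overlap test are all assumptions for illustration. A real pipeline might compare the actual footprints instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed convention: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class FloodMap:
    name: str       # hypothetical floodmap identifier
    event_id: str   # hypothetical flood-event identifier
    bbox: BBox      # hypothetical geographic extent


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def filter_training(candidates: List[FloodMap], test_maps: List[FloodMap]) -> List[FloodMap]:
    """Drop candidates that are in the test set, share an event with it, or overlap it."""
    test_names = {t.name for t in test_maps}
    test_events = {t.event_id for t in test_maps}
    kept = []
    for fm in candidates:
        if fm.name in test_names or fm.event_id in test_events:
            continue
        if any(bboxes_overlap(fm.bbox, t.bbox) for t in test_maps):
            continue
        kept.append(fm)
    return kept
```

Filtering by event as well as by location matters here: two floodmaps from the same event can be geographically disjoint and still share conditions that would leak into the evaluation.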

This is a good point. So for the safe goal (and to reproduce the results), we will just keep the proper lists (List[str]) of training/test/val names and pass them to the dataset/dataloader, which does the extra plumbing to make them ML-ready.
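Since the splits are fixed lists of names, a small sanity check before handing them to the dataset might look like this. The helper name `validate_splits` is hypothetical and not part of the repository. It only verifies that no name is repeated within a split or shared between splits.

```python
from itertools import combinations
from typing import Dict, List


def validate_splits(splits: Dict[str, List[str]]) -> None:
    """Raise ValueError if a name is duplicated within or across splits."""
    for split, names in splits.items():
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate names inside split {split!r}")
    # Compare every pair of splits (train/val, train/test, val/test, ...).
    for (a, names_a), (b, names_b) in combinations(splits.items(), 2):
        shared = set(names_a) & set(names_b)
        if shared:
            raise ValueError(f"splits {a!r} and {b!r} share names: {sorted(shared)}")
```

This catches accidental leakage by name only. It does not check the geographic overlap or same-event conditions discussed above.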