[DataPrep] Create appropriate file structure for ML

Question

jejjohnson opened this issue 3 years ago · comments

Juan Emmanuel Johnson commented 3 years ago

Recreate the files_train_test.py function, which will take a list of files and partition them into train, test, and val directories.

1. Need a root folder with the S2 images and gt (ground truth).
2. Divvy them into train, test, val.
3. Copy the files.
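
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not the original files_train_test.py: the function name `copy_split`, the `S2`/`gt` subfolder names, and the assumption that each image and its ground truth share a file name are all hypothetical. It also assumes the split is given as explicit lists of names rather than drawn randomly.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Sequence


def copy_split(
    root: Path,
    dest: Path,
    splits: Dict[str, List[str]],
    subdirs: Sequence[str] = ("S2", "gt"),  # assumed layout: root/S2 and root/gt
) -> None:
    """Copy every named file from root/<subdir>/ into dest/<split>/<subdir>/."""
    for split, names in splits.items():
        for sub in subdirs:
            out_dir = dest / split / sub
            # Create the target folder even if the split is empty.
            out_dir.mkdir(parents=True, exist_ok=True)
            for name in names:
                # copy2 keeps file metadata such as modification times.
                shutil.copy2(root / sub / name, out_dir / name)
```

A call such as `copy_split(Path("root"), Path("out"), {"train": [...], "test": [...], "val": [...]})` would then produce `out/train/S2`, `out/train/gt`, and so on.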

Question @gonzmg88: should the tiling already be done?

Satyarth Praveen · Answer 1 · Tue Feb 16 2021 22:21:17 GMT+0800 (China Standard Time)

A library that can help with Task 2: https://pypi.org/project/split-folders/

Divvy them into train, test, val.

Gonzalo Mateo García · Answer 2 · Tue Feb 16 2021 22:33:16 GMT+0800 (China Standard Time)

Hi

Just to double-check: in this problem it is SUPER IMPORTANT that the train/test/val split is not done randomly, because some flood maps overlap others and/or come from the same flood event. Also, the quality of the images varies widely (some flood maps are very well labeled, others have a lot of errors). We built the test set in the following manner:

1. We manually chose the images that have few errors and few clouds and that were originally labeled from Sentinel-2 by the people at Copernicus-EMS or UNOSAT (11 flood maps in total).
2. We removed from the training set all flood maps that overlap locations in the test set and all flood maps from the same flood event as those in the test set.

This is very important to be able to prove that the model generalises across different geographies.
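
The filtering step described above might look something like the sketch below. It is an assumption-heavy illustration, not the code that was actually used: the `FloodMap` record, its `event_id` field, and the use of axis-aligned bounding boxes to decide whether two flood maps overlap are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical footprint: (min_x, min_y, max_x, max_y) in a shared CRS.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class FloodMap:
    name: str
    event_id: str  # hypothetical identifier of the flood event
    bbox: BBox


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def filter_training(candidates: List[FloodMap], test_set: List[FloodMap]) -> List[FloodMap]:
    """Drop candidates that share an event or a location with any test flood map."""
    test_events = {t.event_id for t in test_set}
    return [
        c
        for c in candidates
        if c.event_id not in test_events
        and not any(boxes_overlap(c.bbox, t.bbox) for t in test_set)
    ]
```

Real footprints are usually polygons, so a geometry library would be a better fit than bounding boxes for anything beyond a first pass.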

Juan Emmanuel Johnson · Answer 3 · Tue Feb 16 2021 22:42:39 GMT+0800 (China Standard Time)

This is a good point. So for the safe goal (and to reproduce the results), we will just keep fixed lists (List[str]) of the train/test/val file names and pass them to the dataset/dataloader, which will do the extra plumbing to make them ML-ready.
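
As a rough sketch of that idea, a dataset object could simply be built from one of the fixed name lists. The class name `FloodMapFiles` and the `S2`/`gt` folder names are hypothetical, and the class only returns file paths; reading and tiling the images would still have to be added.

```python
from pathlib import Path
from typing import List, Tuple


class FloodMapFiles:
    """Map-style dataset over a fixed list of file names (one split)."""

    def __init__(self, root: Path, names: List[str],
                 image_dir: str = "S2", mask_dir: str = "gt") -> None:
        self.root = Path(root)
        self.names = list(names)  # copy so the split cannot change underneath us
        self.image_dir = image_dir
        self.mask_dir = mask_dir

    def __len__(self) -> int:
        return len(self.names)

    def __getitem__(self, idx: int) -> Tuple[Path, Path]:
        # Assumes the image and its ground truth share the same file name.
        name = self.names[idx]
        return self.root / self.image_dir / name, self.root / self.mask_dir / name
```

Because it only defines `__len__` and `__getitem__`, an object like this follows the map-style dataset protocol and could be wrapped by a data loader once the path pairs are turned into arrays.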