Contributors: Sykas D., Zografakis D., Sdraka M.
This repository contains the models and training scripts for reproducing the experiments presented in:
A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning.
Based on the Sen4AgriNet dataset, we produce two distinct sub-datasets of Sentinel-2 L1C images for experimentation:
- Patches Assembled Dataset (PAD): all Sentinel-2 images expanding over 2 years (2019, 2020) and 2 regions (France, Catalonia) with pixel-wise labels for crop classification (overall 168 classes).
- Object Aggregated Dataset (OAD): on top of PAD, the mean and std of each parcel is computed for each Sentinel-2 observation and used as input data. A single label for each parcel is used.
The above sub-datasets were downsized in order to be used for the experiments in this repository. Specifically, 5000 patches with 60-20-20 (train-val-test) split were sampled from each sub-dataset and the 11 most frequent classes were kept (wheat, maize, sorghum, barley, rye, oats, grapes, rapeseed, sunflower, potatoes, peas). Three different scenarios were explored:
Scenario | Train | Test |
---|---|---|
1 | Catalonia (2019, 2020), France (2019) | Catalonia (2019, 2020), France (2019) |
2 | Catalonia (2019, 2020) | France (2019) |
3 | France (2019) | Catalonia (2020) |
The input of the PAD models is the median of each month of observations from April through September. The OAD models take as input the aggregated statistics of these observations.
For a full reproduction of the experiments presented in the associated publication, please download the netCDF files from one of the following sources: Dropbox, Google Drive or HuggingFace Hub (experimental). In this Dropbox folder you can also find the COCO files and the weights of the trained models.
This repository was tested on:
- Python 3.8
- CUDA 11.4
- PyTorch 1.11
- PyTorch Lightning 1.6
Check requirements.txt
for other essential modules.
For PAD:
For OAD:
S4A-models\
L dataset\
L netcdf\
L oad\
L coco_files\
L logs\
L medians
L model\
L utils\
L settings\
L mappings\
In the coco_files/
folder are the COCO files required for training, validating and testing the models.
In the dataset/netcdf/
folder you should place the downloaded netCDF4 files.
In the dataset/oad/
folder you should place the exported files containing the OAD statistics.
- The main configuration file is
utils/settings/config.py
. - Custom Taxonomy Mapping is located at
utils/settings/mappings/mappings_{cat, fr}.py
.
Every script inherits settings from the aforementioned files.
coco_data_split.py
: Uses the nectCDF4 data to produce three COCO files for training, validation and testing.export_medians_multi.py
: Uses the netCDF4 data and the COCO files to compute the median image per month and export them to the disk.compute_class_weights.py
: Computes the class weights based on the exported medians, to account for class imbalance.object-based-csv.py
: Uses the netCDF4 data to compute the statistics required for OAD.pad_experiments.py
: The main script for training/testing the PAD models.oad_experiments.py
: The main script for training/testing the OAD models.visualize_predictions.py
: Produces a visualization of the ground truth and the prediction of a given model for a given image. Only relevant for PAD models.
Preparation
- (Optional) Run
export_medians_multi.py
to precompute the medians needed for training, validation and testing. - If you don't want to use the given COCO files, then export your own using the
coco_data_split.py
script. - Uncomment the precomputed class weights in the corresponding section of the configuration file depending on the scenario you are using (or compute your own).
- Especially for OAD, run
object-based-csv.py
to export the statistics needed for the experiments.
For PAD:
- Run
pad_experiments.py
with the appropriate arguments. Example:The above command is for training the ConvLSTM model using the weighted parcel loss described in the associated publication. Training will continue for 10 epochs with batch size 32, using the Sentinel-2 bands Blue (B02), Green (B03), Red (B04) and NIR (B08). The input image size is 61x61, the precomputed medians are used to speed up training and all input data are normalized. Finally, a fixed window is used containing months 4 (April) through 9 (September). Please use thepython pad_experiments.py --train --model convlstm --parcel_loss --weighted_loss --root_path_coco <coco_folder_path> --prefix_coco <coco_file_prefix> --netcdf_path <netcdf_folder_path> --prefix <run_prefix> --num_epochs 10 --batch_size 32 --bands B02 B03 B04 B08 --saved_medians --img_size 61 61 --requires_norm --num_workers 16 --num_gpus 1 --fixed_window
--help
argument to find information on all available parameters. - Optionally, after training run
visualize_predictions.py
to visualize the image, ground truth and prediction for a specific model and image.
For OAD:
- Run
oad_experiments.py
with the appropriate arguments. Example:The above command is for training the Transformer model. Training will continue for 10 epochs with batch size 32, using given file containing the OAD statistics. The hidden size is 1024 and three layers are used for the model. Please use thepython oad_experiments.py --train --model transformer --prefix <run_prefix> --file <oad_file_name> --num_epochs 10 --batch_size 32 --num_workers 16 --num_gpus 1 --hidden_size 1024 --num_layers 3
--help
argument to find information on all available parameters.
The results reported on the given COCO files are presented in the following tables.
Scenario | Model | Acc. W. (%) | F1 W. (%) | Precision W. (%) |
---|---|---|---|---|
1 | U-Net | 93.70 | 82.61 | 86.64 |
1 | ConvLSTM | 94.72 | 85.18 | 86.86 |
1 | ConvSTAR | 92.78 | 80.38 | 83.33 |
2 | U-Net | 83.12 | 57.85 | 61.57 |
2 | ConvLSTM | 82.53 | 56.56 | 60.57 |
2 | ConvSTAR | 79.52 | 52.15 | 58.98 |
3 | U-Net | 72.11 | 43.54 | 68.42 |
3 | ConvLSTM | 69.86 | 40.47 | 66.17 |
3 | ConvSTAR | 69.07 | 34.45 | 67.43 |
Scenario | Model | Acc. W. (%) | F1 W. (%) | Precision W. (%) |
---|---|---|---|---|
1 | LSTM | 88.52 | 88.03 | 87.85 |
1 | Transformer | 88.36 | 88.10 | 87.90 |
1 | TempCNN | 90.08 | 89.97 | 90.01 |
2 | LSTM | 91.55 | 91.34 | 91.31 |
2 | Transformer | 39.17 | 31.45 | 58.52 |
2 | TempCNN | 36.90 | 30.14 | 60.71 |
3 | LSTM | 60.60 | 63.96 | 70.55 |
3 | Transformer | 51.21 | 56.71 | 67.76 |
3 | TempCNN | 52.32 | 57.38 | 68.35 |
If you use our work, please cite:
@ARTICLE{
9749916,
author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
year={2022},
doi={10.1109/JSTARS.2022.3164771}
}