Code base for the course: Self-Supervised Learning for Earth Observation
Here are the logs for running the model with default settings for 50 and 100 epochs: wandb results pretraining
Conda or Mamba (preferred) is required for the setup. We assume that you have an NVIDIA GPU available.
- create the Python env:
mamba env create -f env.yml
- activate the env:
mamba activate ssl4eo
- download the MMEarth data (~45 GB):
curl -L https://sid.erda.dk/share_redirect/fnCZOGsWDC -o data_100k_v001.zip
- make a directory for the data:
mkdir <your path>
- unzip the folder:
unzip data_100k_v001.zip -d <your path>
- set the env variable to your MMEarth directory:
mamba env config vars set -n ssl4eo MMEARTH_DIR=<your path>
- reload the environment to ensure that the env variable is set:
mamba activate ssl4eo
- to download geobench data, run:
geobench-download
- (optional) get pretrained weights:
curl -L https://sid.erda.dk/share_redirect/DGCdXRPvNg -o weights.zip
- (optional) unzip somewhere:
unzip weights.zip -d <path to somewhere>
- (optional) run the tests (takes some time):
pytest
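A quick way to confirm that the environment variable from the steps above is picked up is a small check like the following. This is a minimal sketch, not part of the code base: the helper name `check_data_dir` is hypothetical, and it only verifies that `MMEARTH_DIR` is set and points to an existing directory, not that the data inside is complete.

```python
import os
from pathlib import Path


def check_data_dir(env=None, var="MMEARTH_DIR"):
    """Return the data directory as a Path, or raise with a readable message.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    if env is None:
        env = os.environ
    value = env.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; set it and re-activate the env")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{var}={value} is not an existing directory")
    return path


if __name__ == "__main__":
    try:
        print("MMEarth data directory:", check_data_dir())
    except RuntimeError as err:
        print("Setup problem:", err)
```

If the check reports that the variable is not set, re-activating the env as in the step above is usually the first thing to try.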
Follow the instructions in "SSL4EO Mini-Projects instructions - Compute access on DEIC" to get started with DeiC. Once you have access to "DeiC - course resources" and have started a container, run the following. It will install and prepare your conda env in ~10-25 minutes:
By default, pretraining uses all data and uses "biome" as the target for the online classifier. If no methods are specified, all methods are used.
Get an overview of commands:
python main.py --help
Pretraining with VICReg:
python main.py --methods vicreg
Evaluating on BigEarthNet at the end of pretraining with SimCLR:
python main.py --methods simclr --geobench-datasets=m-bigearthnet
Evaluating on BigEarthNet with a pretrained Barlow Twins model:
python main.py --methods barlowtwins --geobench-datasets=m-bigearthnet --epochs=0 --ckpt-path=/work/data/weights/barlowtwins/50epochs.ckpt
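When several of these runs are needed, the command lines can be assembled programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it uses only the flags that appear in the examples above (`--methods`, `--geobench-datasets`, `--epochs`, `--ckpt-path`). It prints the commands rather than running them.

```python
import shlex


def build_command(method, geobench_dataset=None, epochs=None, ckpt_path=None):
    """Assemble an argument list for main.py from the flags shown above.

    Hypothetical helper for illustration; it does not validate the values.
    """
    cmd = ["python", "main.py", "--methods", method]
    if geobench_dataset is not None:
        cmd.append(f"--geobench-datasets={geobench_dataset}")
    if epochs is not None:  # epochs=0 is a valid value (evaluation only)
        cmd.append(f"--epochs={epochs}")
    if ckpt_path is not None:
        cmd.append(f"--ckpt-path={ckpt_path}")
    return cmd


if __name__ == "__main__":
    # Print one pretraining + BigEarthNet evaluation command per method.
    for method in ("vicreg", "simclr", "barlowtwins"):
        print(shlex.join(build_command(method, geobench_dataset="m-bigearthnet")))
```

To actually launch a run, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root.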
When changing the main dataset, you will need to recreate the optimized data format.
Therefore, set your processed folder to a writable directory. Here is an example of pretraining with "eco_region" (instead of "biome") as the online linear probing target (all methods):
python main.py --target=eco_region --processed_dir=/work/project
Another example that needs newly processed data uses only 10% of the training data for BigEarthNet:
python main.py --methods barlowtwins --processed_dir=/work/project --geobench-datasets=m-bigearthnet --geobench-partition
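Since the processed folder has to be writable, it can be worth checking the directory before starting a long run. The following is a minimal sketch using only the Python standard library. `is_writable_dir` is a hypothetical helper, and `/work/project` is simply the path from the examples above.

```python
import os
from pathlib import Path


def is_writable_dir(path):
    """Return True if `path` is an existing directory we can create files in.

    Hypothetical helper; os.access reflects the current user's permissions
    and can be optimistic on some network file systems.
    """
    p = Path(path)
    return p.is_dir() and os.access(p, os.W_OK | os.X_OK)


if __name__ == "__main__":
    candidate = "/work/project"  # path taken from the examples above
    status = "writable" if is_writable_dir(candidate) else "not writable or missing"
    print(f"{candidate}: {status}")
```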