kostistsaprailis / Callisto-Dataset-Collection

A list of datasets aiming to enable Artificial Intelligence applications that use Copernicus data.

AI for Copernicus - a data repository by CALLISTO

Callisto Generated Datasets

  • Annotated Street Level Images from Mapillary (published in MMM22)
    Crop type labels from the freely available Land Parcel Identification System (LPIS) of the Netherlands are matched with all available Mapillary street-level images for the year 2017.
    Mapillary Annotated - Dataset sample

    | Data Source | Type | Area | Task | Paper | Code | Relevant implementations |
    | --- | --- | --- | --- | --- | --- | --- |
    | Street level images | Image | Netherlands | Crop Classification | (2022) | GitHub | Street2Sat, DenseASPP, Crop Phenology, Scene Segmentation |
  • Paddy Rice Maps South Korea (2017~2021)
    This dataset includes paddy rice maps of South Korea from 2017 to 2021 at 10 m resolution. The paddy rice maps are the product of deep learning model predictions and DO NOT represent ground truth information. The predictions were made by analyzing time series of Sentinel-1 images with a deep learning architecture that integrates U-Net and RNN layers, designed by the eGIS/RS lab, Korea University. The model was trained on the farm map produced by the Korean Ministry of Agriculture, Food and Rural Affairs (MAFRA). The validation accuracy and Cohen's kappa are 96.50% and 0.7857, respectively, calculated on 40% of the farm map. For more information, please contact the KU-eGIS/RS lab.
    Paddy Rice mapping (binary) with DL

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | GeoTIFF | South Korea | Paddy Rice Mapping | - | GitHub |
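The paddy rice map description reports both overall accuracy and Cohen's kappa. As a minimal sketch of how the two metrics are derived from a binary confusion matrix (the pixel counts below are hypothetical and are not taken from the dataset), one could write:

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (overall accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement: product of predicted and reference marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (paddy = positive class), for illustration only.
acc, kappa = accuracy_and_kappa(tp=80, fp=20, fn=15, tn=885)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

With a rare positive class, accuracy can be high while kappa stays noticeably lower, which is why both values are usually reported together.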
  • Paddy Rice Labeling Sites in South Korea (2018)
    Paddy rice was visually interpreted at 30 sites in South Korea. The sites were selected in each province by proportional stratified sampling according to the paddy rice area statistics (Statistics Korea), so the dataset can be used to validate model generalization over the entire country. The paddy rice areas were visually interpreted using Google Earth Pro and street view services (https://map.naver.com, https://map.kakao.com) and updated to the state of 2018.
    Paddy Rice Labelling Sites (Visual Interpretation)

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | GeoTIFF | South Korea | Paddy Rice Validation | - | - |
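The labeling sites above were distributed across provinces by proportional stratified sampling. The sketch below illustrates one common way to do such an allocation (largest-remainder rounding); the province names and area values are hypothetical placeholders, not Statistics Korea figures, and the authors' exact procedure is not documented here.

```python
def allocate_sites(area_by_stratum, n_sites):
    """Split n_sites across strata proportionally to area (largest-remainder rounding)."""
    total = sum(area_by_stratum.values())
    quotas = {k: n_sites * v / total for k, v in area_by_stratum.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_sites - sum(alloc.values())
    # Give the remaining sites to the strata with the largest fractional parts.
    by_fraction = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical paddy rice areas per province (arbitrary units).
areas = {"A": 520.0, "B": 290.0, "C": 140.0, "D": 50.0}
sites = allocate_sites(areas, 30)
print(sites)
```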

Existing Datasets

Agriculture

Analysis Ready Remote Sensing Data with labels

  • CropHarvest: a global satellite dataset for crop type classification
    The CropHarvest dataset is a crop dataset of geo-referenced labels with satellite data inputs, each consisting of latitude, longitude, the associated agricultural label, and a satellite pixel time series. It contains 90,480 datapoints from 20 datasets; some datasets come from existing public sources while some (e.g., Rwanda) are being made public with this publication. The datasets include 3 different types of labels: i) binary labels (crop/non-crop); ii) FAO’s indicative crop classification labels, which result in 9 crop type groupings: cereals, vegetables and melons, fruits and nuts, oilseed crops, root/tuber crops, beverage and spice crops, leguminous crops, sugar crops, and other crops; iii) crop-type labels, if available.
    These labels are also accompanied by remote sensing data. More specifically, for each point/polygon in the dataset there is also a 12-timestep signature of:

    • Sentinel-2 monthly aggregated values (all bands except B1 and B10 + NDVI)

    • Sentinel-1 monthly aggregated values (VV and VH)

    • Meteorological monthly aggregated data (total precipitation and ground temperature at 2 m height from the ERA5 dataset with a spatial resolution of 31 km/px)

    • Topographic data from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) at 30 m/px resolution.

      | Data Source | Type | Area | Task | Paper | Code |
      | --- | --- | --- | --- | --- | --- |
      | Sentinel 1-2/ERA5/DEM | Pixel | Global | Crop Classification | (2021) | GitHub |
  • BigEarthNet dataset

    • BigEarthNet is a benchmark archive, consisting of 590,326 pairs of Sentinel-1 and Sentinel-2 image patches.

    • To construct BigEarthNet with Sentinel-2 image patches (now called BigEarthNet-S2, previously BigEarthNet), 125 Sentinel-2 tiles acquired between June 2017 and May 2018 over 10 European countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland) were initially selected. All the tiles were atmospherically corrected with the Sentinel-2 Level 2A product generation and formatting tool (sen2cor). They were then divided into 590,326 non-overlapping image patches. Each image patch was annotated with multiple land-cover classes (i.e., multi-labels) taken from the CORINE Land Cover database of the year 2018 (CLC 2018). The labels in BigEarthNet belong to the initial release of the labels in 2018.

    • To construct BigEarthNet with Sentinel-1 image patches (called BigEarthNet-S1), 321 Sentinel-1 scenes acquired between June 2017 and May 2018 that jointly cover the area of all original 125 Sentinel-2 tiles with close temporal proximity were selected and processed. BigEarthNet-S1 consists of 590,326 preprocessed Sentinel-1 image patches, one for each Sentinel-2 patch. A more detailed explanation of the processing is given in its dataset description document.

      | Data Source | Type | Area | Task | Paper | Code | Relevant Datasets |
      | --- | --- | --- | --- | --- | --- | --- |
      | Sentinel 1/2 | Patch | Europe | Land Cover Classification | (2019) (2021) | GitHub | Belgium LPIS/GSAA, Luxembourg LPIS |
  • EuroSAT dataset
    27,000 labeled and geo-referenced Sentinel-2 satellite image patches (64 × 64 pixels each). Although the classification scheme is made up of 10 different classes, including land covers with distinctive temporal patterns (i.e., annual crops, permanent crops), the dataset is based on single-date images.

    | Data Source | Type | Area | Task | Paper | Code | Relevant implementations |
    | --- | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Patch | Europe | Land Cover Classification | (2018) (2019) | GitHub | EfficientNet, EfficientNetV2, Vision Transformers |
  • Sen12MS
    The SEN12MS dataset contains 180,662 patch triplets of corresponding Sentinel-1 dual-pol SAR data, Sentinel-2 multi-spectral images, and MODIS-derived land cover maps. The patches are distributed across the land masses of the Earth and spread over all four meteorological seasons. This is reflected by the dataset structure. All patches are provided in the form of 16-bit GeoTiffs containing the following specific information:

    • Sentinel-1 SAR: 2 channels corresponding to sigma nought backscatter values in dB scale for VV and VH polarization.

    • Sentinel-2 Multi-Spectral: 13 channels corresponding to the 13 spectral bands (B1, B2, B3, B4, B5, B6, B7, B8, B8a, B9, B10, B11, B12).

    • MODIS Land Cover: 4 channels corresponding to IGBP, LCCS Land Cover, LCCS Land Use, and LCCS Surface Hydrology layers.

      | Data Source | Type | Area | Task | Paper | Code | Relevant Implementations |
      | --- | --- | --- | --- | --- | --- | --- |
      | Sentinel 1/2 | Patch | Global | Land Cover Classification | (2019) (2021) | GitHub | Image Classification: EfficientNet, Transformer, Vision Transformers; Semantic Segmentation: U-Net, DeepLab, Transformer |
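The Sentinel-1 channels in SEN12MS are stored as sigma nought backscatter in dB. Because dB is a logarithmic scale, averaging or resampling is usually done on linear power values instead. The helper names below are illustrative and are not part of any SEN12MS tooling; this is only a sketch of the standard conversion.

```python
import math


def db_to_linear(sigma0_db):
    """Convert backscatter from dB to linear power."""
    return 10 ** (sigma0_db / 10)


def linear_to_db(sigma0_linear):
    """Convert backscatter from linear power to dB."""
    return 10 * math.log10(sigma0_linear)


def mean_backscatter_db(values_db):
    """Average dB values in the linear domain, then convert back to dB."""
    linear = [db_to_linear(v) for v in values_db]
    return linear_to_db(sum(linear) / len(linear))


# Hypothetical VV samples in dB.
vv_mean = mean_backscatter_db([-10.0, -20.0])
print(round(vv_mean, 2))
```

Note that the linear-domain mean differs from the arithmetic mean of the dB values (here -15 dB), which is the reason for the round trip.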
  • SAT-4 and SAT-6
    SAT-4: The images were originally extracted from the National Agriculture Imagery Program (NAIP) dataset. SAT-4 contains 500,000 image patches. Each sample image is 28x28 pixels (1 m spatial resolution) and consists of 4 bands: red, green, blue and near infrared. Each image is annotated with one of four classes that represent four broad land covers: barren land, trees, grassland and a class that consists of all land cover classes other than the above three.
    SAT-6: The images were originally extracted from the National Agriculture Imagery Program (NAIP) dataset. SAT-6 contains 405,000 image patches. Each sample image is 28x28 pixels (1 m spatial resolution) and consists of 4 bands: red, green, blue and near infrared. Each image is annotated with one of six classes that represent six broad land covers: barren land, trees, grassland, roads, buildings and water bodies.

    This dataset could potentially be used for super-resolution tasks, for example by matching it with corresponding Sentinel-2 images. In the table below, we indicatively propose a list of implementations for this task on the PROBA-V dataset, available on the Papers with Code website.

    | Data Source | Type | Area | Task | Paper | Code | Relevant Implementations |
    | --- | --- | --- | --- | --- | --- | --- |
    | Aerial (R,G,B,NIR) | Patch | California | Land Cover Classification | (2015) | - | Super-Resolution |
  • ZueriCrop
    The ZueriCrop dataset contains ground truth labels of 116,000 field instances. Each field instance consists of a polygon representing the borders of the field, and its dominant crop label in 2019. The ground truth labels of all 48 crop classes are provided by the Swiss Federal Office for Agriculture (FOAG) and correspond to the primary crop grown per field during the year. The input data is a time series of 71 multi-spectral Sentinel-2 Level-2A bottom-of-atmosphere reflectance images with a ground sampling distance (GSD) of 10 meters. All input images are atmospherically corrected using the Sen2Cor v2.8 software package. The dataset is collected over a 50 km × 48 km area in the Swiss Cantons of Zurich and Thurgau between January 2019 and December 2019. The entire scene is subdivided into smaller patches of 24 px×24 px. Patches without any ground-truth information are discarded. In the remaining patches the fraction of pixels without reference label is ≈48%. Only those four spectral channels available at the highest, 10 m resolution (Red, Green, Blue, and Near-Infrared) are used.

    | Data Source | Type | Area | Task | Paper | Code | Relevant Implementations |
    | --- | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Patch | Zurich (Switzerland) | Crop Classification | (2021) | GitHub | U-TAE |
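The ZueriCrop description gives a 50 km × 48 km scene at 10 m ground sampling distance, cut into 24 px × 24 px patches. The arithmetic below is a rough sketch of the resulting grid size before unlabeled patches are discarded; it assumes only whole patches are kept and does not reproduce the authors' tiling code.

```python
def count_full_patches(extent_m, gsd_m, patch_px):
    """Number of whole patches that fit along one axis of the scene."""
    pixels = extent_m // gsd_m
    return pixels // patch_px


patches_x = count_full_patches(50_000, 10, 24)
patches_y = count_full_patches(48_000, 10, 24)
total_patches = patches_x * patches_y
print(patches_x, patches_y, total_patches)
```

The published dataset contains fewer patches than this upper bound, since patches without any ground-truth information are removed.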
  • PASTIS
    PASTIS is a benchmark dataset for panoptic and semantic segmentation of agricultural parcels from satellite time series. It contains 2,433 patches within the French metropolitan territory with panoptic annotations (instance index + semantic label for each pixel). Each patch is a Sentinel-2 multispectral image time series of variable length.
    Since the initial publication, the PASTIS dataset has been extended with aligned Sentinel-1 radar observations for all 2,433 patches, in addition to the Sentinel-2 images. For each patch, approximately 70 Sentinel-1 observations in ascending orbit and 70 in descending orbit have been added. The extended dataset, PASTIS-R, can be used to evaluate optical-radar fusion methods for parcel-based classification, semantic segmentation, and panoptic segmentation.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Pixel | France | Semantic and Panoptic Crop Segmentation | (2021) (2022) | GitHub |
  • CV4A Kenya
    This dataset was produced as part of the Crop Type Detection competition at the Computer Vision for Agriculture (CV4A) Workshop at the ICLR 2020 conference. The ground reference data were collected by the PlantVillage team, and Radiant Earth Foundation curated the training dataset after inspecting and selecting more than 4,000 fields from the original ground reference data. The dataset has been split into training and test sets (3,286 fields in the training set and 1,402 in the test set). The dataset is cataloged in four tiles. These tiles are smaller than the original Sentinel-2 tiles, which have been clipped and chipped to the geographical area where labels were collected. Each tile has: a) 13 multi-band observations throughout the growing season, where each observation includes 12 bands from the Sentinel-2 L2A product and a cloud probability layer; b) a raster layer indicating the crop ID for the fields in the training set; and c) a raster layer indicating field IDs for the fields (both training and test sets). The twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12]. The cloud probability layer is a product of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) and provides an estimated cloud probability (0-100%) per pixel. All of the bands are mapped to a common 10 m spatial resolution grid. Fields with a crop ID of 0 are the test fields.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Sentinel tiles (Images) | Kenya | Crop Classification | (2020) | GitHub |
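In CV4A Kenya, the field ID raster covers both splits, while the crop ID raster marks test fields with 0. The sketch below shows how the two rasters could be combined to recover the split. It uses tiny nested lists instead of real rasters, and it assumes that a field ID of 0 means "no field" (background), which the description above does not state.

```python
def split_fields(field_ids, crop_ids, background=0):
    """Return ({train_field_id: crop_id}, {test_field_ids}) from two aligned rasters."""
    train, test = {}, set()
    for field_row, crop_row in zip(field_ids, crop_ids):
        for field_id, crop_id in zip(field_row, crop_row):
            if field_id == background:
                continue  # assumed background pixel, not part of any field
            if crop_id == 0:
                test.add(field_id)  # crop ID 0 marks a test field
            else:
                train[field_id] = crop_id
    return train, test


# Toy 3x3 rasters with hypothetical IDs.
field_raster = [[0, 1, 1], [2, 2, 0], [3, 3, 3]]
crop_raster = [[0, 4, 4], [0, 0, 0], [7, 7, 7]]
train_fields, test_fields = split_fields(field_raster, crop_raster)
print(train_fields, test_fields)
```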
  • TimeSen2Crop
    A pixel-based dataset made up of more than 1 million samples of Sentinel-2 time series (TSs) associated with 16 crop types. This dataset includes atmospherically corrected images and reports the snow, shadow and cloud information per labeled unit. The provided TSs represent an agronomic year ranging from September 2017 to August 2018 and are labeled using the publicly available Austrian crop type map, which is based on farmers' declarations. TimeSen2Crop also includes a TS of Sentinel-2 images acquired in the following agronomic year (i.e., from September 2018 to August 2019).

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Pixel | Austria | Crop Classification | (2020) | - |
  • Sen4AgriNet
    The Sen4AgriNet dataset is built from Sentinel-2 images acquired at different timestamps and includes all spectral bands, which have different spatial resolutions. On top of the dataset, a series of functions, such as spatio-temporal aggregations, has been developed to transform the original dataset according to the needs of different AI problems.

    • 5-year multitemporal Sentinel-2 patches

    • Sentinel-1/2 data

    • The initial version of Sen4AgriNet consists of approximately 225,000 patches, co-registered with open LPIS data for regions in Spain and France, with a total size of 10 TB

      | Data Source | Type | Area | Task | Paper | Code |
      | --- | --- | --- | --- | --- | --- |
      | Sentinel 2 | Patch | Europe | Crop Classification | (2021) | GitHub |
  • BreizhCrops
    BreizhCrops is a novel benchmark dataset for the supervised classification of field crops from satellite time series. It contains aggregated label data and Sentinel-2 top-of-atmosphere as well as bottom-of-atmosphere time series in the region of Brittany (Breizh in the local language), north-west France.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 2 | Object | Brittany (France) | Crop Classification | (2020) | GitHub |
  • Crop Type Mapping - Semantic Segmentation Datasets in Ghana and South Sudan
    The datasets include time series of satellite imagery from Sentinel-1, Sentinel-2, and PlanetScope satellites throughout 2016 and 2017. For each tile/chip in the dataset, there are time series of imagery from each of the satellites, as well as a corresponding label that defines the crop type at each pixel. The label has only one value at each pixel location, and assumes that the crop type remains the same across the full time span of the satellite image time series. In many cases where ground truth was not available, pixels have no label and are set to a value of 0.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 1/2 & PlanetScope | GeoTIFF | Ghana & South Sudan | Crop Classification | (2019) | GitHub |
  • CaneSat dataset
    This dataset contains 1,627 multispectral high-resolution image patches of 10 x 10 pixels, with a pixel size of 10 m x 10 m. These patches were generated from Sentinel-2 (A/B) satellite images acquired during the period from October 2018 to May 2019. It covers one life cycle (12 months) of the sugarcane crop in the region of Karnataka, India. Along with sugarcane crop field areas, other land covers are also included for classification purposes. The dataset provides two formats: jpg and tif. The former includes images with RGB channels and the latter includes six bands, namely Red, Green, Blue, Near Infrared, Red Edge and Short-wave Infrared. The dataset also provides separate .tif images for 3 vegetation indices: enhanced vegetation index (EVI), normalized difference vegetation index (NDVI) and green normalized difference vegetation index (GNDVI). All tif image patches are georeferenced and labeled. The focus of this dataset is to support further research in sugarcane crop classification, especially in India.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 1/2 | GeoTIFF, JPG | Karnataka, India | Sugarcane Classification | (2020) | - |
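CaneSat ships NDVI, GNDVI and EVI as precomputed images. For readers who want to derive the same indices from the band values, the sketch below uses the standard textbook formulas. It assumes the inputs are surface reflectances scaled to the range 0 to 1, and it is not taken from the dataset authors' processing chain.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green normalized difference vegetation index."""
    return (nir - green) / (nir + green)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical reflectances for a single vegetated pixel.
pixel = {"nir": 0.5, "red": 0.1, "green": 0.2, "blue": 0.05}
indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
}
print(indices)
```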
  • Spot the Crop Challenge
    The dataset contains a time-series of satellite imagery and labels for crop type that have been collected through aerial and ground survey. Labels are derived from the survey conducted by the Western Cape Department of Agriculture, for the period of 04-01-2017 to 11-31-2017 and the area of Western Cape, South Africa. Satellite data including multispectral Sentinel-2 are then matched with corresponding labels. The S2 time-series is provided every 5 days. Sentinel-1 data include VV and VH backscatter with a time window of 12 days. The label chips contain the mapping of pixel to crop type label. The following pixel values correspond to the following crop types.

    • 0 - No Data
    • 1 - Lucerne/Medics
    • 2 - Planted pastures (perennial)
    • 3 - Fallow
    • 4 - Wine grapes
    • 5 - Weeds
    • 6 - Small grain grazing
    • 7 - Wheat
    • 8 - Canola
    • 9 - Rooibos

      | Data Source | Type | Area | Task | Paper | Code |
      | --- | --- | --- | --- | --- | --- |
      | Sentinel 1/2 | GeoTIFF | South Africa | Crop Classification | - | GitHub |
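The pixel values listed for the Spot the Crop label chips can be kept in a lookup table when decoding predictions or summarizing a chip. The sketch below uses a toy nested list in place of a real label raster; the function name is illustrative.

```python
from collections import Counter

# Pixel value to crop type, as listed in the dataset description.
CROP_LABELS = {
    0: "No Data",
    1: "Lucerne/Medics",
    2: "Planted pastures (perennial)",
    3: "Fallow",
    4: "Wine grapes",
    5: "Weeds",
    6: "Small grain grazing",
    7: "Wheat",
    8: "Canola",
    9: "Rooibos",
}


def class_histogram(label_chip):
    """Count labeled pixels per crop type, skipping No Data (0)."""
    counts = Counter(value for row in label_chip for value in row)
    return {CROP_LABELS[v]: n for v, n in counts.items() if v != 0}


# Toy 2x3 label chip.
histogram = class_histogram([[0, 7, 7], [8, 4, 7]])
print(histogram)
```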
  • DENETHOR dataset (password: dailycrops)
    DENETHOR: The DynamicEarthNET dataset for Harmonized, inter-Operable, analysis-Ready, daily crop monitoring from space. The dataset contains daily, analysis-ready Planet Fusion data together with Sentinel-1 radar and Sentinel-2 optical time series for crop type classification in Northern Germany. The dataset includes: i) the Planet Fusion Monitoring product, which consists of clean (i.e. free from clouds and shadows), daily gap-filled, high resolution (3 m), temporally consistent, radiometrically robust, harmonized and sensor agnostic surface reflectance time series, featuring and synergizing inputs from both public and private sensor sources and directly interoperable with HLS (Harmonized Landsat Sentinel) surface reflectance products; ii) Sentinel-1 (S1) imagery, which contains 3 channels in total, [VV, VH, ANGLE], where V and H stand for vertical and horizontal orientations, respectively, and ANGLE stores the angle of observation to the earth surface as described here. The data is collected in Interferometric Wide (IW) swath mode and includes both ascending and descending orbit directions; and iii) Sentinel-2 (S2) imagery, which includes all L2A bands in the following order: [B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12]. The bands that have an original spatial resolution of 20 m and 60 m are interpolated with a nearest-neighbour method to a 10 m resolution.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Sentinel 1/2 & Planet Fusion | Patch | Northern Germany | Crop Classification | (2021) | GitHub |
  • Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture
    The dataset contains 21,061 aerial farmland images captured throughout 2019 across the US. Each image consists of four 512x512 color channels, which are RGB and Near Infra-red (NIR). Each image also has a boundary map and a mask. The boundary map indicates the region of the farmland, and the mask indicates valid pixels in the image. Regions outside of either the boundary map or the mask are not evaluated. This dataset contains six types of annotations: Cloud shadow, Double plant, Planter skip, Standing Water, Waterway and Weed cluster. These types of field anomalies have great impacts on the potential yield of farmlands, therefore it is extremely important to accurately locate them. In the Agriculture-Vision dataset, these six patterns are stored separately as binary masks due to potential overlaps between patterns. Users are free to decide how to use these annotations.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Aerial | Images (RGB + NIR) | USA | Scene Classification | (2020) | GitHub |
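Agriculture-Vision excludes pixels outside the boundary map or the mask from evaluation. A minimal sketch of that rule is shown below, using nested lists of 0/1 values in place of real image arrays. Intersection over union is used here only as an illustrative metric; the function names are hypothetical and this is not the challenge's official scoring code.

```python
def valid_region(boundary, mask):
    """A pixel is evaluated only if it is inside both the boundary map and the mask."""
    return [[bool(b) and bool(m) for b, m in zip(b_row, m_row)]
            for b_row, m_row in zip(boundary, mask)]


def masked_iou(pred, target, valid):
    """Intersection over union of two binary masks, restricted to valid pixels."""
    intersection = union = 0
    for p_row, t_row, v_row in zip(pred, target, valid):
        for p, t, v in zip(p_row, t_row, v_row):
            if not v:
                continue
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return intersection / union if union else float("nan")


# Toy 2x3 example for a single annotation type (e.g. weed cluster).
boundary = [[1, 1, 0], [1, 1, 1]]
mask = [[1, 1, 1], [0, 1, 1]]
valid = valid_region(boundary, mask)
iou = masked_iou([[1, 0, 1], [1, 1, 0]], [[1, 1, 1], [1, 0, 0]], valid)
print(iou)
```

Because the six annotation types are stored as separate binary masks, the same function can be applied per type without resolving overlaps first.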
  • UAV-based Multispectral & Thermal dataset for exploring the diurnal variability, radiometric & geometric accuracy for precision agriculture
    To explore the diurnal variations and the radiometric and geometric accuracy of UAV-based data for precision agriculture, a comprehensive dataset was created in a one-day field campaign (21 June 2017). The multi-sensor dataset covers wheat, barley and potato experimental fields, located at the Wageningen University and Research (WUR) farm maintained by Unifarm. UAV-based images were collected with several sensors over the experimental area, starting at 07:25 and ending at 20:00 local solar time. The dataset consists of images collected by 9 flights with the senseFly MSP4C, 9 with the Parrot Sequoia, 2 with the Slant Range P3, 5 with the DJI Zenmuse X3 NIR, 4 with the senseFly Thermo-map and 1 with the RGB Sony WX-220. Additionally, validation measurements at radiometric calibration plates and plant sample locations were taken with a Cropscan handheld spectrometer and a tec5 Handyspec spectrometer. The dataset consists of the validation measurements, the raw images and the processed orthomosaics (both with and without geometric correction).

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | UAV | Images (Green, Blue, Red, Red Edge, NIR, Thermal Infrared) | Wageningen, Netherlands | Crop Classification | (2020) | - |

Analysis Ready Remote Sensing Data without labels

In-situ & Ground-level datasets

  • PlantVillage Dataset - Healthy and Unhealthy leaf images
    This dataset provides 39 different classes of plant leaf and background images and contains 61,486 images. The authors used six different augmentation techniques to increase the dataset size: image flipping, gamma correction, noise injection, PCA color augmentation, rotation, and scaling.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Crowdsource | Grayscale/RGB Images | USA | Image Classification (healthy/unhealthy leaves) | (2015) | GitHub |
  • iCrop Dataset - Street-level Imagery for Crop Classification
    It is the first large, public, multiclass road view crop photo dataset, for the development of crop type detection with deep learning.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Street-level | RGB Images | China | Crop Classification | (2021) | - |
  • A Crop/Weed Field Image Dataset (CWFID)
    This dataset comprises field images, vegetation segmentation masks and crop/weed plant type annotations. The paper provides details, e.g. on the field setting, acquisition conditions, image and ground truth data format.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Field robot | RGB Images | Northern Germany | Crop / Weed Discrimination | (2015) | GitHub |

Geo-referenced labels

  • Hand Labelled Crop/No-Crop dataset
    This dataset provides the hand-labelled crop / non-crop points used for training, which were created by labelling high-resolution satellite imagery in QGIS and Google Earth Pro. Data is available for Ethiopia, Sudan, Togo and Kenya.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Photo-interpretation | Shapefiles | Africa | Crop Discrimination | (2021) | GitHub |
  • LEM+ dataset
    The dataset, in ESRI shapefile format (spatial reference system: WGS 84, EPSG: 4326), provides monthly land use information about 1,854 fields from October 2019 to September 2020 from Luís Eduardo Magalhães (LEM) and other municipalities in the west of Bahia state, Brazil. The majority of the 16 land use classes are related to crops. [Paper]

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Field visits | Shapefiles | Brazil | Crop Monitoring | (2020) | - |
  • Land Cover Map (Korean Ministry of Environment)
    The Korean Ministry of Environment provides three types of land cover map (level-1, level-2, level-3) according to scale. The level-3 land cover map, the most detailed product, provides approximately 1 m resolution by interpreting aerial photos (0.25 m) and Kompsat-2 (1 m) and Kompsat-3 (0.7 m) satellite images. It classifies 7 major land covers (Used area, Agricultural Area, Forest, Grassland, Wet land, Bareland, Water) and subdivides them into 41 classes. Until 2018, the level-3 product was produced for each province at intervals of several years; the most recent product was released in 2019 and covers the entire nation using imagery from 2018. The data is available only to registered domestic researchers, so access for research purposes requires cooperation with a Korean researcher.

    • Level-1 product: 30m resolution, raster format
    • Level-2 product: 5m resolution, shape format
    • Level-3 product: 1m resolution, shape format

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Farmer's Declarations | Shapefiles | South Korea | Crop Monitoring | - | - |
  • Open Labelled Data (Netherlands)
    The National Georegister focuses primarily on the professional user. This can be a Geo-ICT specialist looking for datasets, services or other geo-information elements, but also a policy officer who wants to consult a map, or a web developer or student who is building a website or application and is looking for geo-information for it.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Farmer's Declarations | GeoDatabase | The Netherlands | Crop Monitoring | - | - |
  • Open Labelled Data (Flanders, Belgium)
    Overview of the parcels in agricultural use on the final date of submission of the single application that year. The inventory also includes pools, wooded areas and agricultural production facilities (yards with stables and buildings).

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Farmer's Declarations | Shapefile, GML (2.1.2) | Flanders, Belgium | Crop Monitoring | - | - |
  • Open Labelled Data (Denmark)
    This data collection contains a plethora of map data that the Danish Agriculture Authority has made openly available. Specifically, under the Markblokke section you can find the Land Parcel Identification System (LPIS) data collection, and under the Marker section you can find the Geo-spatial Aid Application (GSAA) data collection, which contains parcel geometries accompanied by their crop type, from 2018 to today. More information is available about the GSAA files, including a description of crop names (CropDescription).

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Farmer's Declarations | Shapefile | Denmark | Crop Monitoring | - | - |
  • Land parcel Identification System (LPIS) - Luxembourg
    This dataset contains agricultural and wine-growing parcels used as a basis for declarations within the framework of the common agricultural policy.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Farmer's Declarations | GML | Luxembourg | Vineyard Mapping | - | - |
  • DWD_RECENT
    DWD Climate Data Center (CDC): Phenological observations of crops from sowing to harvest in Germany. The temporal coverage is rolling, with a window of 500 days (always ending yesterday), and the crops of interest are: meadows, winter wheat, winter rye, winter barley, winter oilseed rape, summer wheat, spring barley, oat, sunflower, maize, beet, sugar beet, fodder beet. For more information click here.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Field Observations | CSV files | Germany | Crop Phenology | - | - |
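The DWD_RECENT collection covers a rolling 500-day window that always ends yesterday. The sketch below computes the expected date range for a given day, which can help when deciding whether an observation should be looked up here or in the archive. It assumes the window is inclusive of both end points, which the description above does not specify.

```python
from datetime import date, timedelta


def rolling_window(today, days=500):
    """Return (start, end) of a window of `days` days that ends the day before `today`."""
    end = today - timedelta(days=1)
    start = end - timedelta(days=days - 1)  # inclusive window (assumption)
    return start, end


def in_window(observation_date, today, days=500):
    """True if the observation date falls inside the rolling window."""
    start, end = rolling_window(today, days)
    return start <= observation_date <= end


window_start, window_end = rolling_window(date(2024, 1, 1))
print(window_start, window_end)
```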
  • DWD_ARCHIVE
    DWD Climate Data Center (CDC): Historical phenological observations of crops from sowing to harvest in Germany. It contains data from 1951-01-01 until 2017-12-31 for dozens of crops (meadows, winter wheat, winter rye, winter barley, winter oilseed rape, summer wheat, spring barley, oat, sunflower, maize, potato, early potato (pregerminated), early potato (non pregerminated), late potato, green bean, green pea, tomato, white cabbage, alfalfa, red clover, beet, sugar beet, fodder beet). For more information click here.

    | Data Source | Type | Area | Task | Paper | Code |
    | --- | --- | --- | --- | --- | --- |
    | Field Observations | CSV files | Germany | Crop Phenology | - | - |

Land change

Analysis Ready Remote Sensing Data with labels

  • Onera Dataset
    The Onera Satellite Change Detection dataset addresses the issue of detecting changes between satellite images from different dates. It comprises 24 pairs of multispectral images taken from the Sentinel-2 satellites between 2015 and 2018. Locations are picked all over the world, in Brazil, USA, Europe, Middle-East and Asia. For each location, registered pairs of 13-band multispectral satellite images obtained by the Sentinel-2 satellites are provided. Images vary in spatial resolution between 10 m, 20 m and 60 m. Pixel-level change ground truth is provided for 14 of the image pairs. The annotated changes focus on urban changes, such as new buildings or new roads. These data can be used for training and setting parameters of change detection algorithms. [Paper] [GitHub] [GitHub2]

Analysis Ready Remote Sensing Data without labels

In-situ & Ground-level datasets

Geo-referenced labels

Water quality

Analysis Ready Remote Sensing Data with labels

  • AquaSat
    AquaSat contains more than 600,000 matchups, covering 1984–2019, of ground-based total suspended sediment, dissolved organic carbon, chlorophyll a, and Secchi disk depth (SDD) measurements paired with spectral reflectance from Landsat 5, 7, and 8 collected within ±1 day of each other. To build AquaSat, the authors developed open source tools in R and Python and applied them to existing public data sets covering the contiguous United States, including the Water Quality Portal, LAGOS-NE, and the Landsat archive. [Paper] [GitHub]
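AquaSat pairs in-situ samples with Landsat observations collected within ±1 day. The sketch below illustrates that matching rule on a few made-up records; the site names, dates and values are hypothetical, and the code is not the authors' R/Python tooling.

```python
from datetime import date


def find_matchups(samples, overpasses, max_days=1):
    """Pair each in-situ sample with same-site overpasses within max_days of it."""
    pairs = []
    for site, sample_date, value in samples:
        for overpass_site, overpass_date in overpasses:
            same_site = site == overpass_site
            close_in_time = abs((sample_date - overpass_date).days) <= max_days
            if same_site and close_in_time:
                pairs.append((site, sample_date, overpass_date, value))
    return pairs


# Hypothetical in-situ chlorophyll a samples: (site, date, value).
samples = [
    ("lake_a", date(2015, 7, 10), 4.2),
    ("lake_a", date(2015, 7, 20), 5.1),
    ("lake_b", date(2015, 7, 11), 2.0),
]
# Hypothetical satellite overpasses: (site, date).
overpasses = [("lake_a", date(2015, 7, 11)), ("lake_b", date(2015, 7, 14))]
matches = find_matchups(samples, overpasses)
print(matches)
```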

  • A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018
    This dataset provides significant information on spatial and temporal changes of water colour for global large lakes from 2000–2018 based on MODIS observations. It will be valuable to studies in search of the drivers of global and regional lake colour change, and the interaction mechanisms between water colour, hydrological factors, climate change, and anthropogenic activities. [Paper]

Analysis Ready Remote Sensing Data without labels

In-situ & Ground-level datasets

Geo-referenced labels

Air quality

Analysis Ready Remote Sensing Data with labels

Analysis Ready Remote Sensing Data without labels

In-situ & Ground-level datasets

  • Air Quality e-Reporting (AQ e-Reporting)
    European air quality information reported by EEA member countries, including all EU Member States, as well as EEA cooperating and other reporting countries. The EEA’s air quality database consists of a multi-annual time series of air quality measurement data and calculated statistics for a number of air pollutants. It also contains meta-information on the monitoring networks involved, their stations and measurements, air quality modelling techniques, as well as air quality zones, assessment regimes, compliance attainments and air quality plans and programmes reported by the EU Member States and European Economic Area countries.

Geo-referenced labels

  • NO2 Air Pollution Data
    With support from NASA, the Holloway Group at SAGE has developed a set of user-friendly datasets to support wider utilization of remote sensing data for air quality and health. This growing inventory of data includes:

    • Shapefiles of NO2 air pollution from satellite for use in GIS platforms, including the EPA’s EJSCREEN platform for environmental justice
    • 12 km x 12 km daily gridded data of NO2 air pollution from satellite for comparison with photochemical grid model output or other data sources

    Moreover, this dataset contains daily gridded DOMINO NO2 data, zipped into monthly files. These data were generated from Level-2 satellite data (on swaths) and gridded to a 12 km x 12 km horizontal resolution over the continental United States using the Wisconsin Horizontal Interpolation Program for Satellites (WHIPS) for ease of comparison with photochemical grid model output. [Paper]

Other

Analysis Ready Remote Sensing Data with labels

  • Sen1Floods11
    A surface water dataset including raw Sentinel-1 imagery and classified permanent water and flood water. This dataset consists of 4,831 512x512 chips covering 120,406 km² and spans all 14 biomes, 357 ecoregions, and 6 continents of the world across 11 flood events.
    [Paper] [GitHub]
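
For readers who want to prepare their own imagery in the same 512x512 chip layout, the sketch below cuts a raster array into non-overlapping chips. How the Sen1Floods11 authors produced their chips is not stated here, so this is only an illustration; `split_into_chips` is a hypothetical helper, and partial tiles at the right and bottom edges are simply dropped.

```python
import numpy as np

def split_into_chips(image, chip_size=512):
    """Yield (row, col, chip) for every full chip_size x chip_size tile.

    `image` is an array whose last two axes are (height, width), e.g.
    (bands, height, width) for a two-band SAR scene. Hypothetical helper.
    """
    height, width = image.shape[-2:]
    for row in range(0, height - chip_size + 1, chip_size):
        for col in range(0, width - chip_size + 1, chip_size):
            yield row, col, image[..., row:row + chip_size, col:col + chip_size]

# Usage: a synthetic two-band scene of 1100 x 1024 pixels.
scene = np.zeros((2, 1100, 1024), dtype=np.float32)
chips = list(split_into_chips(scene))
```

Each chip is a view into the original array, so call `.copy()` on it before modifying it in place.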

  • Labeled SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode (TenGeoP-SARwv)
    The TenGeoP-SARwv dataset is built from Sentinel-1A wave mode (WV) acquisitions in VV polarization. It consists of more than 37,000 SAR vignettes divided into ten defined geophysical categories, including both oceanic and meteorological features. The images cover the entire open ocean and were manually selected from Sentinel-1A WV acquisitions in 2016. For each image, only one prevalent geophysical phenomenon, with its prescribed signature and texture, is selected for labeling. The SAR images are processed into quick-look images, provided in PNG and GeoTIFF formats together with the associated labels. They are convenient for both visual inspection and machine-learning-based methods. [Paper]

  • VisDrone dataset
    From the description of the dataset repository: Drones, or general UAVs, equipped with cameras have been fast deployed to a wide range of applications, including agricultural, aerial photography, fast delivery, and surveillance. Consequently, automatic understanding of visual data collected from these platforms become highly demanding, which brings computer vision to drones more and more closely. We are excited to present a large-scale benchmark with carefully annotated ground-truth for various important computer vision tasks, named VisDrone, to make vision meet drones. The VisDrone2019 dataset is collected by the AISKYEYE team at Lab of Machine Learning and Data Mining, Tianjin University, China. The benchmark dataset consists of 288 video clips formed by 261,908 frames and 10,209 static images, captured by various drone-mounted cameras, covering a wide range of aspects including location (taken from 14 different cities separated by thousands of kilometers in China), environment (urban and country), objects (pedestrian, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that, the dataset was collected using various drone platforms (i.e., drones with different models), in different scenarios, and under various weather and lighting conditions. These frames are manually annotated with more than 2.6 million bounding boxes of targets of frequent interests, such as pedestrians, cars, bicycles, and tricycles. Some important attributes including scene visibility, object class and occlusion, are also provided for better data utilization. [Paper] [GitHub]

  • AU-AIR Dataset
    The AU-AIR dataset is the first multi-modal UAV dataset for object detection. It brings together vision and robotics for UAVs by providing multi-modal data from different on-board sensors, and it aims to advance computer vision and robotics algorithms for autonomous aerial surveillance. AU-AIR has several features:

    • Object detection in aerial images

    • More than 2 hours of raw video

    • 32,823 labelled frames

    • 132,034 object instances

    • 8 object categories related to traffic surveillance

    • Frames are also labelled with time, GPS, IMU, altitude, and linear velocities of the UAV
    [Paper] [GitHub]

Analysis Ready Remote Sensing Data without labels

In-situ & Ground-level datasets

Geo-referenced labels

Web Application / Websites with labelled data

  • Mapillary Street Level Images
    A web platform/application where crowdsourced map data and street level imagery are available to everyone. Computer vision is used to combine those images and create immersive street-level views. Among many other features, Mapillary offers:

    • A quite extended coverage for Europe
    • Integration with OpenStreetMap, ArcGIS tools, and HERE Map Creator
    • The ability to request imagery for areas that either have no images yet or need a more recent capture
    • Navigation in a Google Street View style for easy visual interpretation
    • Filter imagery by capture time
    • Filter imagery by the types of objects that appear in the images (the list of agriculture-specific objects is not yet extensive; the focus for now is mainly on city infrastructure and traffic lights/signs)
  • Eden Library
    Eden Library is a collection of high-value plant datasets, produced in an academic environment, that embed agricultural domain knowledge. Eden Library includes a wide range of agrifood datasets such as:

    • Plant pests
    • Plant diseases
    • Weeds
    • Healthy plants
      That were acquired using:
    • Various styles (Proximal, UAV upon request)
    • Various sensors (RGB, thermal, multispectral & hyperspectral upon request)
      [GitHub]
  • senseFly
    Explore how senseFly drone solutions are employed around the globe, from topographic mapping and site surveys to stockpile monitoring, crop scouting, earthworks, climate change research and much more. The main domains included in this dataset are:

    • Tactical Mapping
    • Surveying & Mapping
    • Mining, Quarries & Aggregates
    • Engineering & Construction
    • Agriculture
    • Environmental Monitoring
    • Humanitarian

European projects

Other Useful Data Collections

Contact

Acknowledgements

This work has been supported by the CALLISTO project, which has been funded by the EU's Horizon 2020 research and innovation programme under grant agreement No. 101004152.

Curated by the Beyond Center of EO Research and Satellite Remote Sensing, IAASARS, National Observatory of Athens
