Introduction

This document lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent classical machine learning (ML, e.g. random forests) is also discussed, as are classical image processing techniques.

Datasets

Warning: satellite image files can be LARGE; even a small dataset may comprise 50 GB of imagery.

WorldView

A commercial satellite owned by DigitalGlobe
https://en.wikipedia.org/wiki/WorldView-3
0.3 m PAN, 1.24 m MS, 3.7 m SWIR. Off-nadir (stereo) available.
Getting Started with SpaceNet
Dataset on AWS -> see this getting started notebook and this notebook on the off-Nadir dataset
Package of utilities to assist working with the SpaceNet dataset.
WorldView cloud optimized geotiffs used in the 3D modelling notebook here.
For more Worldview imagery see Kaggle DSTL competition.

Sentinel

As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see wikipedia.
Sentinel-2: 13 bands, spatial resolution of 10 m, 20 m and 60 m, 290 km swath, temporal resolution of 5 days
awesome-sentinel - a curated list of awesome tools, tutorials and APIs related to data from the Copernicus Sentinel Satellites.
Sentinel-2 Cloud-Optimized GeoTIFFs and Sentinel-2 L2A 120m Mosaic
Open access data on GCP
Paid access via sentinel-hub and python-api.
Example loading sentinel data in a notebook
so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Dataset and usage in EuroSAT: Land Use and Land Cover Classification with Sentinel-2, where a CNN achieves a classification accuracy of 98.57%.
bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
Jupyter Notebooks for working with Sentinel-5P Level 2 data stored on S3. The data can be browsed here
Sentinel NetCDF data
Analyzing Sentinel-2 satellite data in Python with Keras
Xarray backend to Copernicus Sentinel-1 satellite data products

Landsat

Long running US program -> see Wikipedia and read the official webpage
8 bands, spatial resolution of 15 to 60 m, 185 km swath, temporal resolution of 16 days
DECEMBER 2020: USGS publishes Landsat Collection 2 Dataset with 'significant geometric and radiometric improvements'. COG and STAC data format. Announcement and website. Beware data on Google and AWS (below) may be in different formats.
Landsat 4, 5, 7, and 8 imagery on Google, see the GCP bucket here, with Landsat 8 imagery in COG format analysed in this notebook
Landsat 8 imagery on AWS, with many tutorials and tools listed
https://github.com/kylebarron/landsat-mosaic-latest -> Auto-updating cloudless Landsat 8 mosaic from AWS SNS notifications
Visualise landsat imagery using Datashader
Landsat-mosaic-tiler -> The repo hosts all the code for the landsatlive.live website and APIs.

SpaceNet

SpaceNet is an online hub for data, challenges, algorithms, and tools.
spacenet.ai website covering the series of SpaceNet challenges, lots of useful resources (blog, video and papers)
The SpaceNet 7 Multi-Temporal Urban Development Challenge: Dataset Release
SpaceNet - WorldView-3 article here, and semantic segmentation using Raster Vision

Planet

Planet’s high-resolution, analysis-ready mosaics of the world’s tropics, supported through Norway’s International Climate & Forests Initiative. BBC coverage

UC Merced

Land use classification dataset with 21 classes and 100 RGB TIFF images for each class
Each image measures 256x256 pixels with a pixel resolution of 1 foot
http://weegee.vision.ucmerced.edu/datasets/landuse.html
Available as a Tensorflow dataset -> https://www.tensorflow.org/datasets/catalog/uc_merced
Also available as a multi-label dataset

PatternNet

Land use classification dataset with 38 classes and 800 RGB JPG images for each class
https://sites.google.com/view/zhouwx/dataset?authuser=0
Publication: PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval

Kaggle

Kaggle hosts over 60 satellite image datasets, search results here. The kaggle blog is an interesting read.

Kaggle - Amazon from space - classification challenge

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
3-5 meter resolution GeoTIFF images from the Planet Dove satellite constellation
12 classes including - cloudy, primary + waterway etc
1st place winner interview - used 11 custom CNNs
FastAI Multi-label image classification
Multi-Label Classification of Satellite Photos of the Amazon Rainforest

Kaggle - DSTL - segmentation challenge

https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
Rating - medium, many good examples (see the Discussion as well as kernels), but as this competition was run a couple of years ago many examples use python 2
WorldView 3 - 45 satellite images covering 1km x 1km in both 3 (i.e. RGB) and 16-band (400nm - SWIR) images
10 Labelled classes include - Buildings, Road, Trees, Crops, Waterway, Vehicles
Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for particular target (e.g. roads, trees)
Deepsense 4th place solution

Kaggle - Airbus Ship Detection Challenge

https://www.kaggle.com/c/airbus-ship-detection/overview
Rating - medium, most solutions using deep-learning, many kernels, good example kernel
I believe there was a problem with this dataset, which led to many complaints that the competition was ruined

Kaggle - Draper - place images in order of time

https://www.kaggle.com/c/draper-satellite-image-chronology/data
Rating - hard. Not many useful kernels.
Images are grouped into sets of five, and all images in a set share the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.
Kaggle interviews for entrants who used XGBOOST and a hybrid human/ML approach

Kaggle - Deepsat - classification challenge

Not satellite but airborne imagery. Each image patch is size-normalized to 28x28 pixels and consists of 4 bands: red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Data is in MATLAB .mat format.

Imagery source
Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three
Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.
Deep Gradient Boosted Learning article

Kaggle - Understanding Clouds from Satellite Images

In this challenge, you will build a model to classify cloud organization patterns from satellite images.

Kaggle - miscellaneous

https://www.kaggle.com/reubencpereira/spatial-data-repo -> Satellite + loan data
https://www.kaggle.com/towardsentropy/oil-storage-tanks -> Image data of industrial tanks with bounding box annotations, estimate tank fill % from shadows
https://www.kaggle.com/rhammell/ships-in-satellite-imagery -> Classify ships in San Francisco Bay using Planet satellite imagery
https://www.kaggle.com/rhammell/planesnet -> Detect aircraft in Planet satellite image chips

Tensorflow datasets

There are a number of remote sensing datasets
resisc45 - RESISC45 dataset is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class.
eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples.
bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.

AWS datasets

Earth on AWS is the AWS equivalent of Google Earth Engine
Currently 27 satellite datasets on the Registry of Open Data on AWS

Microsoft

USBuildingFootprints -> computer generated building footprints in all 50 US states, GeoJSON format, generated using semantic segmentation
Check out Microsoft's Planetary Computer project

Google Earth Engine (GEE)

https://developers.google.com/earth-engine/
Various imagery and climate datasets, including Landsat & Sentinel imagery
Since there is a whole community around GEE I will not reproduce it here but point to awesome-google-earth-engine and list very select references relevant to deep learning
How to Use Google Earth Engine and Python API to Export Images to Roboflow -> to acquire training data
Reduce Satellite Image Resolution with Google Earth Engine -> a crucial step before applying machine learning to satellite imagery
ee-fastapi is a simple FastAPI web application for performing flood detection using Google Earth Engine in the backend.
How to Download High-Resolution Satellite Data for Anywhere on Earth

Radiant Earth

https://www.radiant.earth/
Datasets and also models on https://mlhub.earth/

FAIR1M ‘world’s largest satellite image database’

DEM (digital elevation models)

Shuttle Radar Topography Mission: data - open access
Copernicus Digital Elevation Model (DEM) on S3, represents the surface of the Earth including buildings, infrastructure and vegetation. Data is provided as Cloud Optimized GeoTIFFs. link

Weather Datasets

UK Met Office -> https://www.metoffice.gov.uk/datapoint
NASA (make a request and you are emailed when the data is ready) -> https://search.earthdata.nasa.gov
NOAA (requires BigQuery) -> https://www.kaggle.com/noaa/goes16/home
Time series weather data for several US cities -> https://www.kaggle.com/selfishgene/historical-hourly-weather-data

Time series datasets

BreizhCrops -> A Time Series Dataset for Crop Type Mapping

UAV & Drone datasets

Many on https://www.visualdata.io
AU-AIR dataset -> a multi-modal UAV dataset for object detection.
ERA -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos.
Aerial Maritime Drone Dataset
Stanford Drone Dataset
RetinaNet for pedestrian detection
EmergencyNet -> identify fire and other emergencies from a drone
OpenDroneMap -> generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images.
Dataset of thermal and visible aerial images for multi-modal and multi-spectral image registration and fusion -> The dataset consists of 30 visible images and their metadata, 80 thermal images and their metadata, and a visible georeferenced orthoimage.
BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos -> TIR videos of humans and animals with several challenging scenarios like scale variations, background clutter due to thermal reflections, large camera rotations, and motion blur

Synthetic data

The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation
RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft.
Check out Microsoft AirSim, which is a simulator for drones, cars and more, built on Unreal Engine

Interesting deep learning projects

TorchSat

TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch

Raster Vision by Azavea

https://www.azavea.com/projects/raster-vision/
An open source Python framework for building computer vision models on aerial, satellite, and other large imagery sets.
Accessible through the Raster Foundry
Example use cases on open data

DeepNetsForEO (no activity since 2019)

https://github.com/nshaud/DeepNetsForEO
Uses SegNet for deep learning on remote sensing images.

Skynet-data (no activity since 2018)

https://github.com/developmentseed/skynet-data
Data pipeline for machine learning with OpenStreetMap

RoboSat (no longer maintained)

https://github.com/mapbox/robosat
Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
robosat-jupyter-notebook -> walks through all of the steps in an excellent blog post on the Robosat feature extraction and machine learning pipeline.
Note there is/was a fork of RoboSat, originally named RoboSat.pink and subsequently neat-EO.pink, although this appears to be dead/archived

DeepOSM (no activity since 2017)

https://github.com/trailbehind/DeepOSM
Train a deep learning net with OpenStreetMap features and satellite imagery.

Techniques

This section explores the different deep and machine learning techniques people are applying to common problems in satellite imagery analysis. Classification problems are the simplest to address via DL, object detection is harder, and cloud detection harder still (niche interest). Note that almost all imagery data on the internet is in RGB format, and common techniques designed for working with this 3-band imagery may fail or need significant adaptation to work with multiband data (e.g. 13-band Sentinel-2).

Land classification - RGB data

Assign a label to an image, e.g. this is an image of a forest. With RGB imagery 'off the shelf' neural net architectures can be applied

Land classification using a simple sklearn cluster algorithm or deep learning.
Land Use Classification using Convolutional Neural Network in Keras
Sea-Land segmentation using DL
A U-net based on Tensorflow for object detection (or segmentation) of satellite images - DSTL dataset but Python 2.7
What’s growing there? Using eo-learn and fastai to identify crops from multi-spectral remote sensing data (Sentinel 2)
FastAI Multi-label image classification
Land use classification using Keras and UC Merced dataset
Detecting Informal Settlements from Satellite Imagery using fine-tuning of ResNet-50 classifier with repo
Land use classification of UC Merced dataset using Keras or alternatively fastai
Water Detection in High Resolution Satellite Images using the waterdetect python package -> The main idea is to combine water indexes (NDWI, MNDWI, etc.) with reflectance bands (NIR, SWIR, etc.) into an automated clustering process
Contrastive Sensor Fusion -> Code implementing Contrastive Sensor Fusion, an approach for unsupervised learning of multi-sensor representations targeted at remote sensing imagery.
Codebase for land cover classification with U-Net
Multi-Label Classification of Satellite Photos of the Amazon Rainforest -> uses the Planet dataset & TF 2 & Keras
UrbanLandUse -> This repository contains a comprehensive set of instructions for creating and applying ML models that characterize land use / land cover (LULC) in urban areas.
Land cover classification of Sundarbans satellite imagery using K-Nearest Neighbor(K-NNC), Support Vector Machine (SVM), and Gradient Boosting classification algorithms with Python
Ground Truth Labeling of Satellite Imagery using K-Means Clustering with Python
Deep Learning for Land Cover Classification of Satellite Imagery Using Python
Multi-label Land Cover Classification with Deep Learning using the redesigned Multi-label UC Merced dataset with 17 land cover classes

Land classification - Hyperspectral data

Custom neural net architectures are required for this high dimensional imagery

hyperspectral_deeplearning_review -> Code of December 2019 paper "Deep Learning Classifiers for Hyperspectral Imaging: A Review"
Deep Learning-Based Classification of Hyperspectral Data
AutoEncoders for Land Cover Classification of Hyperspectral Images -> An autoencoder neural net is used to reduce 103 band data to 60 features (dimensionality reduction), keras
Tree species classification from airborne LiDAR and hyperspectral data using 3D convolutional neural networks
hyperspectral-autoencoders -> Tools for training and using unsupervised autoencoders and supervised deep learning classifiers for hyperspectral data, built on tensorflow. Autoencoders are unsupervised neural networks that are useful for a range of applications such as unsupervised feature learning and dimensionality reduction.
Applying Deep Learning on Satellite Imagery Classification -> using EuroSAT dataset of RGB and multi spectral covering 13 spectral bands, with repo
Land Cover Classification of Satellite Imagery using Convolutional Neural Networks using Keras and a dataset captured over Salinas Valley, California

Semantic segmentation

Whilst classification will assign a label to a whole image, semantic segmentation will assign a label to each pixel

Instance segmentation with keras - links to satellite examples
Semantic Segmentation on Aerial Images using fastai
https://github.com/Paulymorphous/Road-Segmentation
UNOSAT used fast.ai to train a U-Net to perform semantic segmentation on satellite imagery to detect water - paper + notebook, accuracy 0.97, precision 0.91, recall 0.92.
Identification of roads and highways using Sentinel-2 imagery (10m) super-resolved using the SENX4 model up to x4 the initial spatial resolution (2.5m)
find-unauthorized-constructions-using-aerial-photography -> U-Net & Keras
WildFireDetection -> Using U-Net Model to Detect Wildfire from Satellite Imagery, with streamlit UI
Pixel level segmentation on Azure
DigitalGlobe article - they use a combination of classical techniques (masks, erodes) to reduce the search space (identifying water via NDWI which requires SWIR), then apply a binary DL classifier on candidate regions of interest. They deploy the final algo as a task on their GBDX platform. They propose that in the future an R-CNN may be suitable for the whole process.
Crop field boundary detection: approaches overview and main challenges
A Practical Method for High-Resolution Burned Area Monitoring Using Sentinel-2 and VIIRS with code
instance-segmentation-maskrcnn -> Instance segmentation of center pivot irrigation system in Brazil using Landsat images and Convolutional Neural Network

Change detection

Monitor water levels, coast lines, size of urban areas, wildfire damage. Note, clouds change often too..!

awesome-remote-sensing-change-detection
Using PCA (python 2, requires updating) -> https://appliedmachinelearning.blog/2017/11/25/unsupervised-changed-detection-in-multi-temporal-satellite-images-using-pca-k-means-python-code/
Using CNN -> https://github.com/vbhavank/Unstructured-change-detection-using-CNN
Siamese neural network to detect changes in aerial images
https://www.spaceknow.com/
LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest
Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery
Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks
QGIS 2 plugin for applying change detection algorithms on high resolution satellite imagery
Change-Detection-Review -> A review of change detection methods, including codes and open data sets for deep learning.
Flood Detection and Monitoring using Satellite Imagery with Python
LamboiseNet -> Master thesis about change detection in satellite imagery using Deep Learning

Object detection

A good introduction to the challenge of performing object detection on aerial imagery is given in this paper. In summary, images are large and objects may comprise only a few pixels, easily confused with random features in background. An example task is detecting boats on the ocean, which should be simpler than land based detection owing to the relatively blank background in images, but is still challenging.

Intro articles here and here.
Super-Resolution and Object Detection -> Super-resolution is a relatively inexpensive enhancement that can improve object detection performance
Anomaly Detection on Mars using a GAN
Tackling the Small Object Problem in Object Detection
Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) -> combines some of the leading object detection algorithms into a unified framework designed to detect objects both large and small in overhead imagery
Several useful articles on awesome-tiny-object-detection
Challenges with SpaceNet 4 off-nadir satellite imagery: Look angle and target azimuth angle -> building prediction in images taken at nearly identical look angles — for example, 29 and 30 degrees — produced radically different performance scores.
YOLTv4 -> YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitrarily large images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Read Announcing YOLTv4: Improved Satellite Imagery Object Detection
Benchmarks for Object Detection in Aerial Images -> codebase created to build benchmarks for object detection in aerial images
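
The windowing idea behind tools such as SIMRDWN and YOLTv4 (running a detector over overlapping crops of an arbitrarily large image) can be sketched as below. This is a minimal illustration, not the code of either project: the function names, the 600 pixel window and the 100 pixel overlap are assumptions chosen for the example.

```python
import numpy as np

def window_origins(length, window, overlap):
    """Start offsets of windows covering `length` pixels.

    The last window is shifted back so that it ends exactly at the image edge.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than the window size")
    if length <= window:
        return [0]
    stride = window - overlap
    origins = list(range(0, length - window, stride))
    origins.append(length - window)
    return origins

def tile_image(image, window=600, overlap=100):
    """Yield ((row, col), crop) pairs of overlapping crops from an (H, W, C) array."""
    for r in window_origins(image.shape[0], window, overlap):
        for c in window_origins(image.shape[1], window, overlap):
            yield (r, c), image[r:r + window, c:c + window]

# Stand-in for a large scene; real imagery would be read from disk.
image = np.zeros((1000, 1600, 3), dtype="uint8")
tiles = list(tile_image(image))
```

Each crop can be passed to a detector, and the returned (row, col) origin is what you would add to predicted box coordinates to map them back into the full image. Detections duplicated in the overlap regions then need merging, for example with non-maximum suppression.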

Object detection - buildings

Machine Learning For Rooftop Detection and Solar Panel Installment discusses tiling large images and generating annotations from OSM data. Features of the roofs were calculated using a combination of contour detection and classification
Building footprint detection with fastai on the challenging SpaceNet7 dataset
DeepSolar is a deep learning framework that analyzes satellite imagery to identify the GPS locations and sizes of solar panels
Building Extraction with YOLT2 and SpaceNet Data
Segmentation of buildings on kaggle
Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”.
AIcrowd dataset of building outlines -> 300x300 pixel RGB images with annotations in MS-COCO format

Object detection - boats, planes & vehicles

Detecting Ships in Satellite Imagery using the Planet dataset and Keras
Truck Detection with Sentinel-2 during COVID-19 crisis -> moving objects in Sentinel-2 data cause a specific reflectance relationship in the RGB, which looks like a rainbow, and serves as a marker for trucks. Improve accuracy by only analysing roads.
Planet use the non-DL Felzenszwalb algorithm to detect ships
EESRGAN -> Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network

Object detection - trees & green areas

DeepForest is a python package for training and predicting individual tree crowns from airborne RGB imagery
Official repository for the "Identifying trees on satellite images" challenge from Omdena
Counting-Trees-using-Satellite-Images -> create an inventory of incoming and outgoing trees for annual tree inspections, uses keras
2020 Nature paper - An unexpectedly large count of trees in the West African Sahara and Sahel -> tree detection framework based on U-Net & tensorflow 2 with code here
Detecting solar panels from satellite imagery
Find sports fields using Mask R-CNN and overlay on open-street-map

Cloud detection

According to this article on sentinelhub, there are three popular classical algorithms that apply thresholds in multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but state that this requires too much compute resources.
This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and a Bayesian method.
Segmentation of Clouds in Satellite Images Using Deep Learning -> a U-Net is employed to interpret and extract the information embedded in the satellite images in a multi-channel fashion, and finally output a pixel-wise mask indicating the existence of cloud.
Cloud Detection in Satellite Imagery compares FPN and CheapLab architectures on Sentinel-2 L1C and L2A imagery

Wealth and economic activity measurement

The goal is to predict economic activity from satellite imagery rather than conducting labour intensive ground surveys

Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> Used CNN on Landsat imagery (night & day) to predict asset wealth of African villages
Combining Satellite Imagery and machine learning to predict poverty -> review article
Measuring Human and Economic Activity from Satellite Imagery to Support City-Scale Decision-Making during COVID-19 Pandemic
Predicting Food Security Outcomes Using CNNs for Satellite Tasking
Crop yield Prediction with Deep Learning -> The necessary code for the paper Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data, AAAI 2017 (Best Student Paper Award in Computational Sustainability Track).
https://github.com/taspinar/sidl/blob/master/notebooks/2_Detecting_road_and_roadtypes_in_sattelite_images.ipynb
Measuring the Impacts of Poverty Alleviation Programs with Satellite Imagery and Deep Learning
Traffic density estimation as a regression problem
Crop Yield Prediction Using Deep Neural Networks and LSTM and Building a Crop Yield Prediction App in Senegal Using Satellite Imagery and Jupyter
Advanced Deep Learning Techniques for Predicting Maize Crop Yield using Sentinel-2 Satellite Imagery

Super-resolution

Super-resolution imaging is a class of techniques that enhance the resolution of an imaging system, and can be applied as a pre-processing step to improve the detection of small objects. For an introduction to this topic read this excellent article

https://medium.com/the-downlinq/super-resolution-on-satellite-imagery-using-deep-learning-part-1-ec5c5cd3cd2 -> Nov 2016 blog post by CosmiQ Works with a nice introduction to the topic. Proposes and demonstrates a new architecture with perturbation layers with practical guidance on the methodology and code. Three part series
Super Resolution for Satellite Imagery - srcnn repo
TensorFlow implementation of "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" adapted for working with geospatial data
Random Forest Super-Resolution (RFSR repo) including sample data
Super-Resolution (python) Utilities for managing large satellite images
Enhancing Sentinel 2 images by combining Deep Image Prior and Decrappify. Repo for deep-image-prior and article on decrappify
The keras docs have a great tutorial - Image Super-Resolution using an Efficient Sub-Pixel CNN
HighRes-net -> Pytorch implementation of HighRes-net, a neural network for multi-frame super-resolution, trained and tested on the European Space Agency’s Kelvin competition
super-resolution-using-gan -> Super-Resolution of Sentinel-2 Using Generative Adversarial Networks
Super-resolution of Multispectral Satellite Images Using Convolutional Neural Networks with paper
Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network -> enhanced super-resolution GAN (ESRGAN)
pytorch-enhance -> Library of Image Super-Resolution Models, Datasets, and Metrics for Benchmarking or Pretrained Use

Image-to-image translation using GANs

Generative Adversarial Networks, or GANs, can be used to translate images, e.g. from SAR to RGB.

How to Develop a Pix2Pix GAN for Image-to-Image Translation -> how to develop a Pix2Pix model for translating satellite photographs to Google map images. A good intro to GANs
SAR to RGB Translation using CycleGAN -> uses a CycleGAN model in the ArcGIS API for Python
A growing problem of ‘deepfake geography’: How AI falsifies satellite images

SAR & Denoising

I group these together since I most often see denoising in the context of SAR imagery.

A convolutional autoencoder network can be employed for image denoising, read about this on the Keras blog
Removing speckle noise from Sentinel-1 SAR using a CNN
A dataset which is specifically made for deep learning on SAR and optical imagery is the SEN1-2 dataset, which contains corresponding patch pairs of Sentinel 1 (VV) and 2 (RGB) data. It is the largest manually curated dataset of S1 and S2 products, with corresponding labels for land use/land cover mapping, SAR-optical fusion, segmentation and classification tasks. Data: https://mediatum.ub.tum.de/1474000
so2sat on Tensorflow datasets -> So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
You do not need clean images for SAR despeckling with deep learning -> How Speckle2Void learned to stop worrying and love the noise
PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python
Synthetic Aperture Radar (SAR) Analysis With Clarifai
Labeled SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode consists of more than 37,000 SAR vignettes divided into ten defined geophysical categories

ML best practice & general techniques

4-ways-to-improve-class-imbalance discusses the pros and cons of several rebalancing techniques, applied to an aerial dataset. Reason to read: models can reach an accuracy ceiling where majority classes are easily predicted but minority classes poorly predicted. Overall model accuracy may not improve until steps are taken to account for class imbalance.
Seven steps towards a satellite imagery dataset
Implementing Transfer Learning from RGB to Multi-channel Imagery -> takes a resnet50 model pre-trained on an input of 224x224 pixels with 3 channels (RGB) and updates it for a new input of 480x400 pixels and 15 channels (12 new + RGB) using keras
How to implement augmentations for Multispectral Satellite Images Segmentation using Fastai-v2 and Albumentations
Principal Component Analysis: In-depth understanding through image visualization applied to Landsat TM images, with repo
Leveraging Geolocation Data for Machine Learning: Essential Techniques -> A Gentle Guide to Feature Engineering and Visualization with Geospatial data, in Plain English
3 Tips to Optimize Your Machine Learning Project for Data Labeling
Image Classification Labeling: Single Class versus Multiple Class Projects
Labeling Satellite Imagery for Machine Learning
Image Augmentations for Aerial Datasets
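
As a small illustration of the class imbalance point above, one common rebalancing step is to weight each class inversely to its frequency in the training labels. The sketch below uses the widely used heuristic weight = n_samples / (n_classes * class_count); it is not taken from the linked article, and the function name is hypothetical.

```python
import numpy as np

def balanced_class_weights(labels, num_classes):
    """Inverse-frequency class weights; classes absent from `labels` get weight 0."""
    counts = np.bincount(labels, minlength=num_classes).astype("float64")
    weights = np.zeros(num_classes)
    present = counts > 0
    # n_samples / (n_present_classes * count_per_class)
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights

# Example: 90 patches of a majority class and 10 of a minority class.
labels = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(labels, num_classes=2)
```

The resulting weights can typically be passed to a weighted loss function so that errors on minority classes contribute more to training.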

Miscellaneous (generally) non ML Techniques

Pansharpening

Image fusion of low res multispectral with high res pan band.

Several algorithms are described in the ArcGIS docs, the simplest being to take the mean of the pan and RGB pixel values.
Does not require DL, classical algos suffice, see this notebook and this kaggle kernel
https://github.com/mapbox/rio-pansharpen
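
The simple mean approach mentioned above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the RGB array has shape (H, W, 3), the pan array has shape (H * s, W * s) for an integer scale factor s, the RGB is upsampled by nearest-neighbour repetition, and the function name is hypothetical. It is not the implementation used by ArcGIS or rio-pansharpen.

```python
import numpy as np

def simple_mean_pansharpen(rgb, pan):
    """Upsample RGB to the pan grid and average each band with the pan band."""
    scale = pan.shape[0] // rgb.shape[0]
    if scale < 1 or pan.shape != (rgb.shape[0] * scale, rgb.shape[1] * scale):
        raise ValueError("pan shape must be an integer multiple of the RGB shape")
    # Nearest-neighbour upsampling: repeat each low-res pixel `scale` times per axis.
    upsampled = np.repeat(np.repeat(rgb, scale, axis=0), scale, axis=1)
    upsampled = upsampled.astype("float32")
    # Broadcast the single pan band across the three colour bands.
    return 0.5 * (upsampled + pan.astype("float32")[..., np.newaxis])

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [10, 20, 30]
pan = np.full((4, 4), 100.0)
sharpened = simple_mean_pansharpen(rgb, pan)
```

Averaging is easy to implement but alters the colour balance, which is why the more elaborate classical algorithms exist.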

NDVI - vegetation index

Simple band math, ndvi = np.true_divide((ir - r), (ir + r)), but challenging due to the size of the imagery.
Example notebook local
Landsat data in cloud optimised (COG) format analysed for NDVI with medium article here.
Visualise water loss with Holoviews
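
The NDVI band math above can be sketched as follows, with two additions that are assumptions of this example rather than part of the original formula: pixels where the denominator is zero are set to NaN, and the calculation is optionally run in row blocks so that only part of a large scene needs to be processed at a time. The function names and the block size are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red), with NaN where the denominator is zero."""
    nir = nir.astype("float32")
    red = red.astype("float32")
    denom = nir + red
    out = np.full(denom.shape, np.nan, dtype="float32")
    np.true_divide(nir - red, denom, out=out, where=denom != 0)
    return out

def ndvi_blockwise(nir, red, block_rows=1024):
    """Compute NDVI a block of rows at a time to limit the size of temporaries."""
    out = np.empty(nir.shape, dtype="float32")
    for start in range(0, nir.shape[0], block_rows):
        stop = start + block_rows
        out[start:stop] = ndvi(nir[start:stop], red[start:stop])
    return out

nir = np.array([[0.8, 0.0], [0.5, 0.2]])
red = np.array([[0.2, 0.0], [0.5, 0.6]])
result = ndvi(nir, red)
```

With imagery stored on disk, the same block loop could read each window from the file instead of slicing an in-memory array.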

Image registration

Image registration is the process of transforming different sets of data into one coordinate system. Typical use is overlapping images taken at different times or with different cameras.

Wikipedia article on registration -> register for change detection or image stitching
Traditional approach -> define control points, employ RANSAC algorithm
Phase correlation is used to estimate the translation between two images with sub-pixel accuracy. Can be used for accurate registration of low resolution imagery onto high resolution imagery, or to register a sub-image on a full image -> Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects.
cnn-registration -> An image registration method using convolutional neural network features written in Python2, Tensorflow 1.5
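
To make the phase correlation idea above concrete, here is a minimal numpy sketch that estimates a whole-pixel translation between two images of the same size. The function name is my own, it only handles pure translation, and it stops at integer precision. Sub-pixel accuracy needs an extra upsampling or peak-fitting step, which library implementations such as the one in scikit-image provide.

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Return the integer (row, col) shift that registers `moving` onto `reference`."""
    f_reference = np.fft.fft2(reference)
    f_moving = np.fft.fft2(moving)
    # Normalised cross-power spectrum: keep only the phase difference
    cross_power = f_reference * np.conj(f_moving)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    # The correlation peak location encodes the translation
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    shift = [p if p <= size // 2 else p - size for p, size in zip(peak, correlation.shape)]
    return tuple(int(s) for s in shift)
```

Applying np.roll(moving, shift, axis=(0, 1)) with the returned shift should line the moving image up with the reference, at least for images related by a circular translation.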

Terrain mapping, Lidar & DEMs

Measure surface contours.

Wikipedia DEM article and phase correlation article
Intro to depth from stereo
Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview or GeoEye.
Process of creating a DEM here and here.
ArcGIS can generate DEMs from stereo images
https://github.com/MISS3D/s2p -> produces elevation models from images taken by high resolution optical satellites -> demo code on https://gfacciol.github.io/IS18/
Automatic 3D Reconstruction from Multi-Date Satellite Images
Semi-global matching with neural networks
Predict the fate of glaciers
monodepth - Unsupervised single image depth prediction with CNNs
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package
Phase correlation in scikit-image
s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
The Mapbox API provides images and elevation maps, article here
Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN

Image formats, data management and catalogues

GeoServer -> an open source server for sharing geospatial data
Open Data Cube - serve up cubes of data https://www.opendatacube.org/
https://terria.io/ for pretty catalogues
Remote pixel
Sentinel-hub eo-browser
Large datasets may come in HDF5 format, can view with -> https://www.hdfgroup.org/downloads/hdfview/
Climate data is often in netcdf format, which can be opened using xarray
The xarray docs list a number of ways that data can be stored and loaded.
TileDB -> a 'Universal Data Engine' to store, analyze and share any data (beyond tables), with any API or tool (beyond SQL) at planet-scale (beyond clusters), open source and managed options. Recently hiring to work with xarray, dask, netCDF and cloud native storage
BigVector database -> A fully-managed, highly-scalable, and cost-effective database for vectors. Vectorize structured data or orbital imagery and discover new insights
Read about Serverless PostGIS on AWS Aurora
Hub -> The fastest way to store, access & manage datasets with version-control for PyTorch/TensorFlow. Works locally or on any cloud. Read Faster Machine Learning Using Hub by Activeloop: A code walkthrough of using the hub package for satellite imagery
A Comparison of Spatial Functions: PostGIS, Athena, PrestoDB, BigQuery vs RedShift

Cloud Optimised GeoTiff (COG)

A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file with an internal organization that enables more efficient workflows on the cloud. In particular they support HTTP range requests, enabling downloading of specific tiles rather than the full file. COGs work normally in GIS software such as QGIS, but are larger than regular GeoTIFFs

https://www.cogeo.org/
cog-best-practices
rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
aiocogeo -> Asynchronous cogeotiff reader (python asyncio)
Landsat data in cloud optimised (COG) format analysed for NDVI with medium article Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo.
Working with COGS and STAC in python using geemap
Load, Experiment, and Download Cloud Optimized Geotiffs (COG) using Python with Google Colab -> short read which covers finding COGS, opening with Rasterio and doing some basic manipulations, all in a Colab Notebook.
Exploring USGS Terrain Data in COG format using hvPlot -> local COG from public AWS bucket, open with rioxarray, visualise with hvplot. See the Jupyter notebook
aws-lambda-docker-rasterio -> AWS Lambda Container Image with Python Rasterio for querying Cloud Optimised GeoTiffs. See this presentation
cogbeam -> a python based Apache Beam pipeline, optimized for Google Cloud Dataflow, which aims to expedite the conversion of traditional GeoTIFFs into COGs
cogserver -> Expose a GDAL file as a HTTP accessible on-the-fly COG
Displaying a gridded dataset on a web-based map - Step by step guide for displaying large GeoTIFFs, using Holoviews, Bokeh, and Datashader
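
The point made at the top of this section about downloading specific tiles rather than the full file can be illustrated with a little arithmetic. The sketch below works out which internal tiles overlap a requested pixel window. The function names and the 512 pixel tile size are assumptions for illustration. A real reader such as GDAL or Rasterio would additionally look up each tile's byte offset and length in the TIFF header (the TileOffsets and TileByteCounts tags) and issue HTTP range requests for just those bytes.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return the (tile_row, tile_col) indices that overlap a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def total_tiles(image_width, image_height, tile_size=512):
    """Number of internal tiles in the full image (ceiling division per axis)."""
    tiles_across = -(-image_width // tile_size)
    tiles_down = -(-image_height // tile_size)
    return tiles_across * tiles_down


# A 100 x 100 pixel window that straddles a tile boundary needs only 2 tiles
needed = tiles_for_window(col_off=500, row_off=0, width=100, height=100)
```

For a 10980 x 10980 pixel image with 512 pixel tiles, that window touches 2 of the 484 tiles, which is why reading a small area of a remote COG can be so much cheaper than downloading the whole file.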

STAC - SpatioTemporal Asset Catalog specification

The STAC specification provides a common metadata specification, API, and catalog format to describe geospatial assets, so they can be more easily indexed and discovered. The aim is that the catalogue is crawlable so it can be indexed by a search engine and make imagery discoverable, without requiring yet another API interface. A good place to start is to view the Planet Disaster Data catalogue which has the catalogue source on Github and uses the stac-browser

Spec at https://github.com/radiantearth/stac-spec
Getting Started with STAC APIs intro article
SpatioTemporal Asset Catalog API specification -> an API to make geospatial assets openly searchable and crawlable
stacindex -> STAC Catalogs, Collections, APIs, Software and Tools
Several useful repos on https://github.com/sat-utils
Intake-STAC -> Intake-STAC provides an opinionated way for users to load Assets from STAC catalogs into the scientific Python ecosystem. It uses the intake-xarray plugin and supports several file formats including GeoTIFF, netCDF, GRIB, and OpenDAP.
sat-utils/sat-search -> Sat-search is a Python 3 library and a command line tool for discovering and downloading publicly available satellite imagery using STAC compliant API
franklin -> A STAC/OGC API Features Web Service focused on ease-of-use for end-users.
stacframes -> A Python library for working with STAC Catalogs via Pandas DataFrames
sat-api-pg -> A Postgres backed STAC API
stactools -> Command line utility and Python library for STAC
pystac -> Python library for working with any STAC Catalog
STAC Examples for Nightlights data -> minimal example STAC implementation for the Light Every Night dataset of all VIIRS DNB and DMSP-OLS nighttime satellite data
stackstac -> Turn a STAC catalog into a dask-based xarray
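
To give a feel for what the specification describes, below is a minimal sketch of a STAC Item written as a plain Python dictionary, with a naive bounding box filter over a list of items. The id, coordinates and asset URL are invented for illustration, and the filter is only a toy stand-in for what a real STAC API or a library such as pystac provides (it ignores, for example, boxes that cross the antimeridian).

```python
import json

# A STAC Item is a GeoJSON Feature with some extra required fields
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",  # hypothetical identifier
    "bbox": [-1.0, 50.0, 0.0, 51.0],  # [west, south, east, north]
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-1.0, 50.0], [0.0, 50.0], [0.0, 51.0], [-1.0, 51.0], [-1.0, 50.0]]],
    },
    "properties": {"datetime": "2021-04-01T10:30:00Z"},
    "assets": {
        "visual": {
            "href": "https://example.com/example-scene-001.tif",  # hypothetical URL
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_items(items, bbox):
    """Return the items whose bounding box overlaps the query box."""
    return [item for item in items if bbox_intersects(item["bbox"], bbox)]


# Items are plain JSON, which is what makes catalogues easy to crawl and index
serialised = json.dumps(example_item)
```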

State of the art

What are companies doing?

A serverless pipeline appears to be where companies are headed for routine compute tasks. Checkout process Satellite data using AWS Lambda functions. Just beware of runtime limits and cold starts
Traditional data formats aren't designed for processing, so new standards are developing such as COGS
Google provide training on how to use Apache Spark on Google Cloud Dataproc to distribute a computationally intensive (satellite) image processing task onto a cluster of machines -> https://google.qwiklabs.com/focuses/5834?parent=catalog
Read about Planet on Google and also how Airbus use Google as the backend for their OneAtlas data portal

Online platforms for Geo analysis

This article discusses some of the available platforms
Pangeo -> There is no single software package called “pangeo”; rather, the Pangeo project serves as a coordination point between scientists, software, and computing infrastructure. Includes open source resources for parallel processing using Dask and Xarray. Pangeo recently announced their 2.0 goals: pivoting away from directly operating cloud-based JupyterHubs, and towards education and research
Airbus Sandbox -> will provide access to imagery
Descartes Labs -> access to EO imagery from a variety of providers via python API
DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL.
Planet have a Jupyter notebook platform which can be deployed locally.
jupyteo.com -> hosted Jupyter environment with many features for working with EO data
eurodatacube.com -> data & platform for EO analytics in Jupyter env, paid
Unfolded Studio -> next generation geospatial analytics and visualization platform building on open source geospatial technologies including kepler.gl, deck.gl and H3. Processing is done browser side, enabling excellent performance. Raster support added April 2021
up42 is a developer platform and marketplace, offering all the building blocks for powerful, scalable geospatial products
Microsoft Planetary Computer -> direct Google Earth Engine competitor in the making?

Free online computing resources

Generally a GPU is required for DL, and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter development environments on the fast.ai site. I personally use Colab with data hosted on Google Drive

Google Colab

Colaboratory notebooks with a free GPU backend, for up to 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
Also a pro tier for $10 a month -> https://colab.research.google.com/signup
Tensorflow, pytorch & fast.ai available but you may need to update them

Kaggle - also Google!

Free to use
GPU Kernels - may run for 1 hour
Tensorflow, pytorch & fast.ai available but you may need to update them
Advantage that many datasets are already available

Production

For an overview on serving deep learning models checkout Practical-Deep-Learning-on-the-Cloud.

Rest API on dedicated server

A conceptually simple approach to serving up deep learning model inference code is to wrap it in a rest API. That can be implemented in python (flask or FastAPI), and hosted on a dedicated server e.g. EC2 instance

Basic API: https://blog.keras.io/building-a-simple-keras-deep-learning-rest-api.html with code here
Advanced API with request queuing: https://www.pyimagesearch.com/2018/01/29/scalable-keras-deep-learning-rest-api/
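
The overall shape of such an API can be sketched with nothing but the standard library. Note that this deliberately swaps in http.server in place of flask or FastAPI so that the example has no dependencies, and the 'model' is a made-up placeholder that simply thresholds the mean pixel value. The endpoint path, payload format and function names are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(pixels):
    """Placeholder for real model inference: label by mean pixel brightness."""
    mean = sum(pixels) / len(pixels)
    return {"label": "bright" if mean > 127 else "dark", "mean": mean}


def handle_predict(body):
    """Validate a JSON request body and return (http_status, response_dict)."""
    try:
        payload = json.loads(body)
        pixels = [float(p) for p in payload["pixels"]]
        if not pixels:
            raise ValueError("pixels must not be empty")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": str(exc)}
    return 200, predict(pixels)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(host="127.0.0.1", port=8000):
    """Serve until interrupted. Not called here, so importing has no side effects."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the request handling in a plain function (handle_predict) separate from the web framework makes it easy to unit test, and to move to flask or FastAPI later without touching the inference code.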

Framework specific serving options

Tensorflow and pytorch specific serving

AWS

Host your data on S3 and metadata in a relational db such as postgres
For batch processing use Batch to run python scripts. Break out units of processing into Lambda functions. Note that lambda may not be a particularly quick solution for deep learning applications, since you do not have the option to batch inference on a GPU. There is also a hard runtime limit of 15 minutes, and creating a container with all the required dependencies can be a challenge. To get started read Using container images to run PyTorch models in AWS Lambda
Use Step functions to orchestrate data pipelines on batch and lambda. If this is too limited or you want to write pipelines in python (rather than json used by step functions) checkout Prefect
Sagemaker is a hosted Jupyter environment for training and deployment of ML models.
Deep learning AMIs are EC2 instances with deep learning frameworks preinstalled. They do require more setup from the user than Sagemaker but in return allow access to the underlying hardware, which makes debugging issues more straightforward. There is a good guide to setting up your AMI instance on the Keras blog
Rekognition custom labels is a 'no code' annotation, training and inferencing service. Read Training models using Satellite (Sentinel-2) imagery on Amazon Rekognition Custom Labels. For a comparison with Azure and Google alternatives read this article
When developing you will definitely want to use boto3 and probably aws-data-wrangler
For managing infrastructure use Terraform. Alternatively if you wish to use TypeScript, JavaScript, Python, Java, or C# checkout AWS CDK, although I found relatively few examples to get going using python
AWS Ground Station now supports data delivery to Amazon S3

chip-n-scale-queue-arranger by developmentseed

https://github.com/developmentseed/chip-n-scale-queue-arranger
an orchestration pipeline for running machine learning inference at scale
Supports fast.ai models

Useful paid software

ArcGIS -> mapping and analytics software, with both local and cloud hosted options. Checkout Geospatial deep learning with arcgis.learn. It appears ArcGIS are using fastai for their deep learning backend. ArcGIS Jupyter Notebooks in ArcGIS Enterprise are built to run big data analysis, deep learning models, and dynamic visualization tools.

Useful open source software

A note on licensing: The two general types of licenses for open source are copyleft and permissive. Copyleft requires that subsequent derived software products also carry the license forward, e.g. the GNU General Public License (GPLv3). Permissive licenses place fewer restrictions on modifying and using the code as one pleases, e.g. MIT & Apache 2. Checkout choosealicense.com/

QGIS - Create, edit, visualise, analyse and publish geospatial information. Python scripting and plugins. Open source alternative to ArcGIS.
Orfeo toolbox - remote sensing toolbox with python API (just a wrapper to the C code). Do activities such as pansharpening, ortho-rectification, image registration, image segmentation & classification. Not much documentation.
QUICK TERRAIN READER - view DEMS, Windows
dl-satellite-docker -> docker files for geospatial analysis, including tensorflow, pytorch, gdal, xgboost...
AIDE V2 - Tools for detecting wildlife in aerial images using active learning
Land Cover Mapping web app from Microsoft
Solaris -> An open source ML pipeline for overhead imagery by CosmiQ Works, similar to Rastervision but with some unique very cool features
openSAR -> Synthetic Aperture Radar (SAR) Tools and Documents from Earth Big Data LLC (http://earthbigdata.com/)
qhub -> QHub enables teams to build and maintain a cost effective and scalable compute/data science platform in the cloud.
imagej -> a very versatile image viewer and processing program
Geo Data Viewer extension for VSCode which enables opening and viewing various geo data formats with nice visualisations
Datasette is a tool for exploring and publishing data as an interactive website and accompanying API, with SQLite backend. Various plugins extend its functionality, for example to allow displaying geospatial info, render images (useful for thumbnails), and add user authentication.
Photoprism is a privately hosted app for browsing, organizing, and sharing your photo collection, with support for tiffs
dbeaver is a free universal database tool and SQL client with geospatial features
Grafana can be used to make interactive dashboards, checkout this example showing Point data. Note there is an AWS managed service for Grafana
litestream -> Continuously stream SQLite changes to S3-compatible storage

GDAL & Rasterio

So important that this pair gets its own section. GDAL is THE command line tool for reading and writing raster and vector geospatial data formats. If you are using python you will probably want to use Rasterio which provides a pythonic wrapper for GDAL
GDAL and on twitter
GDAL is a dependency of Rasterio and can be difficult to build and install. I recommend using conda, brew (on OSX) or docker in these situations
GDAL docker quickstart: docker pull osgeo/gdal then docker run --rm -v $(pwd):/data/ osgeo/gdal gdalinfo /data/cog.tiff
Even Rouault maintains GDAL, please consider sponsoring him
Rasterio -> reads and writes GeoTIFF and other raster formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON. There are a variety of plugins that extend Rasterio functionality.
rio-cogeo -> Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio.
rioxarray -> geospatial xarray extension powered by rasterio
aws-lambda-docker-rasterio -> AWS Lambda Container Image with Python Rasterio for querying Cloud Optimised GeoTiffs. See this presentation
godal -> golang wrapper for GDAL
Write rasterio to xarray
Loam: A Client-Side GDAL Wrapper for Javascript

Python general utilities

PyShp -> The Python Shapefile Library (PyShp) reads and writes ESRI Shapefiles in pure Python
s2p -> a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos
EarthPy -> A set of helper functions to make working with spatial data in open source tools easier. Read Exploratory Data Analysis (EDA) on Satellite Imagery Using EarthPy
pygeometa -> provides a lightweight and Pythonic approach for users to easily create geospatial metadata in standards-based formats using simple configuration files
pesto -> PESTO is designed to ease the process of packaging a Python algorithm as a processing web service into a docker image. It contains shell tools to generate all the boiler plate to build an OpenAPI processing web service compliant with the Geoprocessing-API. By Airbus Defence And Space
GEOS -> Google Earth Overlay Server (GEOS) is a python-based server for creating Google Earth overlays of tiled maps. You can also display maps in the web browser, measure distances and print maps as high-quality PDFs.
GeoDjango intends to be a world-class geographic Web framework. Its goal is to make it as easy as possible to build GIS Web applications and harness the power of spatially enabled data. Some features of GDAL are supported.
rasterstats -> summarize geospatial raster datasets based on vector geometries
turfpy -> a Python library for performing geospatial data analysis which reimplements turf.js
image-similarity-measures -> Implementation of eight evaluation metrics to assess the similarity between two images. Blog post here

Python low level numerical & data formats

xarray -> N-D labeled arrays and datasets. Read Handling multi-temporal satellite images with Xarray. Checkout xarray_leaflet for tiled map plotting
xarray-spatial -> Fast, Accurate Python library for Raster Operations. Implements algorithms using Numba and Dask, free of GDAL
Geowombat -> geo-utilities applied to air- and space-borne imagery, uses Rasterio, Xarray and Dask for I/O and distributed computing with named coordinates
NumpyTiles -> a specification for providing multiband full-bit depth raster data in the browser
Zarr -> Zarr is a format for the storage of chunked, compressed, N-dimensional arrays. Zarr depends on NumPy

Python image handling and manipulation

Pillow is the Python Imaging Library -> this will be your go-to package for image manipulation in python
opencv-python is pre-built CPU-only OpenCV packages for Python
kornia is a differentiable computer vision library for PyTorch, like openCV but on the GPU. Perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors.
tifffile -> Read and write TIFF files
xtiff -> A small Python 3 library for writing multi-channel TIFF stacks
geotiff -> A noGDAL tool for reading and writing geotiff files
image_slicer -> Split images into tiles. Join the tiles back together.
tiler -> split images into tiles and merge tiles into a large image
felicette -> Satellite imagery for dummies. Generate JPEG earth imagery from coordinates/location name with publicly available satellite data.
imagehash -> Image hashes tell whether two images look nearly identical.
xbatcher -> Xbatcher is a small library for iterating xarray DataArrays in batches. The goal is to make it easy to feed xarray datasets to machine learning libraries such as Keras.
fake-geo-images -> A module to programmatically create geotiff images which can be used for unit tests
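
Splitting a large image into tiles and joining them back together, as done by packages such as image_slicer and tiler in the list above, is a very common preprocessing step. The following numpy sketch shows the basic idea for non-overlapping tiles. The function names are my own and this is not the API of either package. Real tools typically also handle overlap and padding of edge tiles.

```python
import numpy as np


def split_into_tiles(image, tile_size):
    """Split an array of shape (rows, cols, ...) into non-overlapping tiles.

    Returns a dict mapping the (row, col) offset of each tile to a view of the
    image. Tiles on the bottom and right edges may be smaller than tile_size.
    """
    n_rows, n_cols = image.shape[:2]
    tiles = {}
    for row in range(0, n_rows, tile_size):
        for col in range(0, n_cols, tile_size):
            tiles[(row, col)] = image[row:row + tile_size, col:col + tile_size]
    return tiles


def merge_tiles(tiles, shape, dtype):
    """Rebuild the full image from the dict produced by split_into_tiles."""
    merged = np.zeros(shape, dtype=dtype)
    for (row, col), tile in tiles.items():
        merged[row:row + tile.shape[0], col:col + tile.shape[1]] = tile
    return merged
```

Keeping the pixel offset as the dictionary key means model predictions made per tile can be written straight back into the correct place in a full-size output array.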

Python deep learning toolsets

torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF file
DeepHyperX -> A Python/pytorch tool to perform deep learning experiments on various hyperspectral datasets.
image-super-resolution -> Super-scale your images and run experiments with Residual Dense and Adversarial Networks.

Python data discovery and ingestion

landsat_ingestor -> Scripts and other artifacts for landsat data ingestion into Amazon public hosting
satpy - a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats

Python graphing and visualisation

hvplot -> A high-level plotting API for the PyData ecosystem built on HoloViews. Allows overlaying data on map tiles, see Exploring USGS Terrain Data in COG format using hvPlot
Pyviz examples include several interesting geospatial visualisations
napari -> napari is a fast, interactive, multi-dimensional image viewer for Python. It’s designed for browsing, annotating, and analyzing large multi-dimensional images. By integrating closely with the Python ecosystem, napari can be easily coupled to leading machine learning and image analysis tools. Note that to view a 3GB COG I had to install the napari-tifffile-reader plugin.
pixel-adjust -> Interactively select and adjust specific pixels or regions within a single-band raster. Built with rasterio, matplotlib, and panel.
Plotly Dash can be used for making interactive dashboards
folium -> a python wrapper to the excellent leaflet.js which makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. Also checkout the streamlit-folium component for adding folium maps to your streamlit apps
ipyearth -> An IPython Widget for Earth Maps
geopandas-view -> Interactive exploration of GeoPandas GeoDataFrames
geogif -> Turn xarray timestacks into GIFs

Python cluster computing with Dask

Get started by reading Democratizing Satellite Imagery Analysis with Dask
Dask works with your favorite PyData libraries to provide performance at scale for the tools you love -> checkout Read and manipulate tiled GeoTIFF datasets and accelerating-science-dask
Coiled is a managed Dask service.
Dask with PyTorch for large scale image analysis
stackstac -> Turn a STAC catalog into a dask-based xarray
dask-geopandas -> Parallel GeoPandas with Dask
dask-image -> many SciPy ndimage functions implemented

Algorithms in python

WaterDetect -> an end-to-end algorithm to generate open water cover mask, specially conceived for L2A Sentinel 2 imagery. It can also be used for Landsat 8 images and for other multispectral clustering/segmentation tasks.
GatorSense Hyperspectral Image Analysis Toolkit -> This repo contains algorithms for Anomaly Detectors, Classifiers, Dimensionality Reduction, Endmember Extraction, Signature Detectors, Spectral Indices
arosics -> Perform automatic subpixel co-registration of two satellite image datasets based on an image matching approach
detectree -> Tree detection from aerial imagery
pylandstats -> compute landscape metrics

Tools for image annotation

If you are performing object detection you will need to annotate images with bounding boxes. Check that your annotation tool of choice supports large image (likely geotiff) files, as not all will. Note that GeoJSON is widely used by remote sensing researchers but this annotation format is not commonly supported in general computer vision frameworks, and in practice you may have to convert the annotation format to use the data with your chosen framework. There are both closed and open source tools for creating and converting annotation formats.
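
As an example of the kind of format conversion mentioned above, the sketch below turns a GeoJSON Polygon into a COCO-style bounding box of [x, y, width, height]. It assumes the polygon coordinates are already in pixel units. Geographic coordinates would first need transforming to pixel space using the geotransform of the image, which is not shown. The function name and the example geometry are made up for illustration.

```python
def polygon_to_coco_bbox(geometry):
    """Convert a GeoJSON Polygon (in pixel coordinates) to a COCO [x, y, w, h] box."""
    if geometry["type"] != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")
    # The first ring of a GeoJSON polygon is its exterior, later rings are holes
    exterior = geometry["coordinates"][0]
    xs = [point[0] for point in exterior]
    ys = [point[1] for point in exterior]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Hypothetical annotation: a building footprint in pixel coordinates
example_geometry = {
    "type": "Polygon",
    "coordinates": [[[10, 20], [50, 20], [50, 80], [10, 80], [10, 20]]],
}
example_bbox = polygon_to_coco_bbox(example_geometry)
```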

A long list of tools is here
GroundWork is designed for annotating and labeling geospatial data like satellite imagery, from Azavea
Labelme Image Annotation for Geotiffs -> uses Labelme
Label Maker -> downloads OpenStreetMap QA Tile information and satellite imagery tiles and saves them as an .npz file for use in machine learning training.
CVAT is worth investigating, and has an open issue to support large TIFF files. This article on Roboflow gives a good intro to CVAT.
Deep Block is a general purpose AI platform that includes a tool for COCOJSON export for aerial imagery. Checkout this video
AWS supports image annotation via the Rekognition Custom Labels console
Roboflow can be used to convert between annotation formats
Other annotation tools include supervise.ly (web UI), rectlabel (OSX desktop app) and VoTT
Label Studio is a multi-type data labeling and annotation tool with standardized output format, webpage at labelstud.io
Deeplabel is a cross-platform tool for annotating images with labelled bounding boxes. Deeplabel also supports running inference using state-of-the-art object detection models like Faster-RCNN and YOLOv4. With support out-of-the-box for CUDA, you can quickly label an entire dataset using an existing model.
Alturos.ImageAnnotation is a collaborative tool for labeling image data on S3 for yolo
rectlabel is a desktop app for MacOS to label images for bounding box object detection and segmentation
pigeonXT can be used to create custom image classification annotators within Jupyter notebooks
ipyannotations -> Image annotations in python using jupyter notebooks
diffgram supports cloud backends
Label-Detect -> a graphical image annotation tool which can also be used to train and test models on large satellite images, fork of the popular labelImg tool

Movers and shakers on Github

Adam Van Etten is doing interesting things in object detection and segmentation
Andrew Cutts cohosts the Scene From Above podcast and has many interesting repos
Ankit Kariryaa published a recent nature paper on tree detection
Chris Holmes is doing great things at Planet
Christoph Rieke maintains a very popular imagery repo and has published his thesis on segmentation
Even Rouault maintains several of the most critical tools in this domain such as GDAL, please consider sponsoring him
Jake Shermeyer has many interesting repos
Mort Canty is an expert in change detection
Nicholas Murray is an Australia-based scientist with a focus on delivering the science necessary to inform large scale environmental management and conservation
Qiusheng Wu is an Assistant Professor in the Department of Geography at the University of Tennessee
Robin Wilson is a former academic who is very active in the satellite imagery space

Companies on Github

For a full list of companies, on and off Github, checkout awesome-geospatial-companies. The following lists companies with interesting Github profiles.

Airbus Defence And Space
Azavea -> lots of interesting repos around STAC
Development Seed
Descartes Labs
Digital Globe
Mapbox -> thanks for Rasterio!
Planet Labs -> thanks for COGS!
up42 -> Airbus spinout providing 'The easiest way to build geospatial solutions'

Courses

Introduction to Geospatial Raster and Vector Data with Python -> an intro course on a single page
Manning: Monitoring Changes in Surface Water Using Satellite Image Data
Automating GIS processes includes a lesson on automating raster data processing
For deep learning checkout the fastai course which uses the fast.ai library & pytorch
pyimagesearch.com hosts courses and plenty of material using opencv and keras
Official opencv courses
TensorFlow Developer Professional Certificate

Books

Image Analysis, Classification and Change Detection in Remote Sensing With Algorithms for Python, Fourth Edition, By Morton John Canty -> code here
I highly recommend Deep Learning with Python by François Chollet

Online communities

Jobs

Pangeo discourse lists multiple jobs, global
Current Job Openings at Planet

Neural nets in space

Processing on the satellite allows less data to be downlinked. E.g. a super-resolution image might take 4-8 images to generate, but only the single resulting image is downlinked.

Lockheed Martin and USC to Launch Jetson-Based Nanosatellite for Scientific Research Into Orbit - Aug 2020 - One app that will run on the GPU-accelerated satellite is SuperRes, an AI-based application developed by Lockheed Martin, that can automatically enhance the quality of an image.
Intel to place movidius in orbit to filter images of clouds at source - Oct 2020 - Getting rid of these images before they’re even transmitted means that the satellite can actually realize a bandwidth savings of up to 30%
Whilst not involving neural nets the PyCubed project gets a mention here as it is putting python on space hardware such as the V-R3x

About the author

My background is in optical physics, and I hold a PhD from Cambridge on the topic of localised surface Plasmons. Since academia I have held a variety of roles, including doing research at Sharp Labs Europe, developing optical systems at Surrey Satellites (SSTL), and working at an IOT startup. It was whilst at SSTL that I started this repository as a personal resource. Over time I have steadily gravitated towards data analytics and software engineering with python, and I now work as a senior data scientist at Satellite Vu. Please feel free to connect with me on Twitter & LinkedIn, and please do let me know if this repository is useful to your work.