DevOpsKev / model-training-and-deployment

Training and deployment of deep learning models for satellite & aerial imagery


Training and deployment of deep learning models applied to satellite and aerial imagery.

Model training best practice

Processing large images

Models are typically trained and inferenced on relatively small images. To run inference on a large image it is necessary to slide a window over the image, run inference on each window, and then combine the results. However, lower-confidence predictions will be made at the edges of each window, where objects may be partially cropped. In segmentation it is typical to crop the edge regions of each prediction and stitch the predictions together into a mosaic. For object detection a framework called sahi has been developed, which intelligently merges bounding box predictions.
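As an illustration, below is a minimal numpy sketch of sliding-window segmentation inference: `predict` is a hypothetical callable returning a per-pixel class map for a tile, and the window and overlap sizes are arbitrary.

```python
import numpy as np

def sliding_window_inference(image, predict, window=512, overlap=64):
    """Tile a large image, run `predict` on each tile, and stitch the results.

    `predict` takes a (window, window, bands) array and returns a
    (window, window) class map. Only the central region of each tile is kept,
    discarding the lower-confidence borders. Assumes the image is larger than
    the window; the outer `overlap` ring of the mosaic is left as zeros.
    """
    h, w = image.shape[:2]
    stride = window - 2 * overlap
    output = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h - 2 * overlap, stride):
        for x in range(0, w - 2 * overlap, stride):
            y0, x0 = min(y, h - window), min(x, w - window)
            tile_pred = predict(image[y0:y0 + window, x0:x0 + window])
            output[y0 + overlap:y0 + window - overlap,
                   x0 + overlap:x0 + window - overlap] = \
                tile_pred[overlap:-overlap, overlap:-overlap]
    return output

# e.g. with a dummy "model" that predicts class 1 everywhere
mask = sliding_window_inference(np.zeros((2048, 2048, 3)),
                                lambda tile: np.ones(tile.shape[:2], dtype=np.int64))
```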

Metrics

A number of metrics are common to all model types (but can have slightly different meanings in contexts such as object detection), whilst other metrics are very specific to particular classes of model. The correct choice of metric is particularly critical for imbalanced-dataset problems, e.g. object detection.

  • TP = true positive, FP = false positive, TN = true negative, FN = false negative
  • Precision is the % of correct positive predictions, calculated as precision = TP/(TP+FP)
  • Recall, or true positive rate (TPR), is the % of true positives captured by the model, calculated as recall = TP/(TP+FN)
  • The F1 score (also called the F-score or the F-measure) is the harmonic mean of precision and recall, calculated as F1 = 2*(precision * recall)/(precision + recall). It conveys the balance between precision and recall (see the worked example after this list)
  • The false positive rate (FPR), calculated as FPR = FP/(FP+TN), is often plotted against recall/TPR in an ROC curve, which shows how the TPR/FPR tradeoff varies with the classification threshold. Lowering the classification threshold returns more true positives, but also more false positives. Note that TN is not meaningful in object detection (there is no count of correctly predicted background boxes), so FPR and ROC curves are not appropriate; precision-recall curves are used instead
  • Precision-vs-recall curves visualise the tradeoff between making false positives and false negatives
  • Accuracy is the most commonly used metric in 'real life' but can be a highly misleading metric for imbalanced data sets.
  • IoU is an object detection specific metric, being the average intersection over union of prediction and ground truth bounding boxes for a given confidence threshold
  • mAP@0.5 is another object detection specific metric, being the mean value of the average precision for each class. @0.5 sets a threshold for how much of the predicted bounding box overlaps the ground truth bounding box, i.e. "minimum 50% overlap"
  • For more comprehensive definitions check out Object-Detection-Metrics
  • Metrics to Evaluate your Semantic Segmentation Model
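As referenced in the F1 entry above, here is a worked example of these formulas in Python (the counts are made up, purely to illustrate the arithmetic):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw counts, per the definitions above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# 80 detections matched ground truth, 20 were spurious, 10 objects were missed
print(precision_recall_f1(tp=80, fp=20, fn=10))  # ≈ (0.8, 0.889, 0.842)
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143
```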

Cloud GPUs

A GPU is required for training deep learning models (but not necessarily for inference), and this section lists a couple of free Jupyter environments with GPUs available. There is a good overview of online Jupyter development environments on the fastai site. For personal projects I have historically used Google Colab with data hosted on Google Drive. The landscape of GPU providers is constantly changing; I currently recommend lightning.ai or AWS.

Google Colab

  • Collaboratory notebooks with GPU as a backend for free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
  • Also a pro tier for $10 a month -> https://colab.research.google.com/signup
  • Tensorflow, pytorch & fastai available but you may need to update them
  • Colab Alive is a Chrome extension that keeps Colab notebooks alive.
  • colab-ssh -> lets you ssh to a colab instance like it’s an EC2 machine and install packages that require full linux functionality

Kaggle

  • Free to use
  • GPU Kernels - may run for 1 hour
  • Tensorflow, pytorch & fastai available but you may need to update them
  • An advantage is that many datasets are already available

Model deployment

This section discusses how to get a trained machine learning (and specifically deep learning) model into production. For an overview of serving deep learning models, check out Practical-Deep-Learning-on-the-Cloud. There are many options if you are happy to dedicate a server, although you may want a GPU for batch processing. For serverless, use AWS Lambda.

REST API on a dedicated server

A common approach to serving up deep learning model inference code is to wrap it in a REST API. The API can be implemented in Python (Flask or FastAPI) and hosted on a dedicated server, e.g. an EC2 instance. Note that making this a scalable solution will require significant experience.
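For illustration, a minimal sketch of such an endpoint with FastAPI (the prediction logic is a placeholder to swap for a real trained model, and the route name is arbitrary):

```python
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()

def run_model(image: Image.Image) -> dict:
    # Placeholder: load your trained model once at module import and call it here
    return {"width": image.width, "height": image.height, "detections": []}

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    return run_model(image)
```

Serve it with e.g. `uvicorn app:app --host 0.0.0.0 --port 8000` (assuming the file is called app.py) and POST an image to /predict.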

Model serving with gRPC

gRPC is a framework for implementing Remote Procedure Calls (RPC) via HTTP/2. Developed and maintained mainly by Google, it is widely used in industry. It allows two machines to communicate, similar to HTTP but with better syntax and performance.

Framework specific model serving

If you are happy to live with some lock-in, these are good options:

Framework agnostic model serving

Using lambda functions - i.e. serverless

Using lambda functions allows inference without having to configure or manage the underlying infrastructure
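For example, a minimal handler might look like the sketch below, assuming the image arrives base64-encoded in the request body (e.g. via API Gateway or a Lambda function URL); packaging the model and its dependencies, for instance as a container image, is handled separately.

```python
import base64
import json

def handler(event, context):
    """Hypothetical AWS Lambda entry point for inference."""
    image_bytes = base64.b64decode(event["body"])  # raw image sent by the client
    # Placeholder: decode image_bytes and run your model here
    prediction = {"num_objects": 0}
    return {"statusCode": 200, "body": json.dumps(prediction)}
```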

Models in the browser

The model is run in the browser itself on live images, ensuring processing always uses the latest available model and removing the requirement for dedicated server-side inference.

Model optimisation for deployment

The general approaches are outlined in this article from NVIDIA which discusses fine tuning a model pre-trained on synthetic data (Rareplanes) with 10% real data, then pruning the model to reduce its size, before quantizing the model to improve inference speed. There are also toolkits for optimisation, in particular ONNX which is framework agnostic.
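As a hedged sketch of those last two steps, exporting a PyTorch model to ONNX and then applying dynamic quantization with onnxruntime (the model and input size are purely illustrative):

```python
import torch
import torchvision
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export a (placeholder) PyTorch model to the framework-agnostic ONNX format
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 512, 512)  # match your tile size
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["image"], output_names=["logits"])

# Quantize weights to int8 to shrink the model and speed up CPU inference
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
```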

MLOps

MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.

Model monitoring

Once your model is deployed you will want to monitor for data errors, broken pipelines, and model performance degradation/drift.
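One simple flavour of drift check is to compare the distribution of some model output (e.g. predicted building coverage per tile) between a historical reference window and recent production data, for example with a two-sample Kolmogorov-Smirnov test; the sketch below uses synthetic data and an arbitrary threshold.

```python
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.default_rng(0).beta(2, 5, size=1000)  # stand-in for historical predictions
recent = np.random.default_rng(1).beta(2, 3, size=1000)     # stand-in for last week's predictions

statistic, p_value = ks_2samp(reference, recent)
if p_value < 0.01:
    print(f"Possible drift: KS statistic={statistic:.3f}, p={p_value:.2g}")
```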

Model tracking, versioning, specification & compilation

  • dvc -> a git extension to keep track of changes in data, source code, and ML models together
  • Weights and Biases -> keep track of your ML projects. Log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues (a minimal logging sketch follows this list)
  • geo-ml-model-catalog -> provides a common metadata definition for ML models that operate on geospatial data
  • hummingbird -> a library for compiling trained traditional ML models into tensor computations, e.g. scikit learn model to pytorch for fast inference on a GPU
  • deepchecks -> Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort
  • pachyderm -> Data Versioning and Pipelines for MLOps. Read Pachyderm + Label Studio which discusses versioning and lineage of data annotations
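As mentioned in the Weights and Biases entry above, a minimal logging sketch (assumes the wandb package is installed and you are logged in; the project name and metric values are illustrative):

```python
import wandb

run = wandb.init(project="satellite-segmentation", config={"lr": 1e-3, "epochs": 10})
for epoch in range(run.config.epochs):
    # placeholder values standing in for real training / validation metrics
    train_loss, val_iou = 0.1 / (epoch + 1), 0.5 + 0.03 * epoch
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_iou": val_iou})
run.finish()
```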

AWS

Google Cloud

  • For storage use Cloud Storage (AWS S3 equivalent)
  • For data warehousing use BigQuery (AWS Redshift equivalent). Visualize massive spatial datasets directly in BigQuery using CARTO
  • For model training use Vertex (AWS Sagemaker equivalent)
  • For containerised apps use Cloud Run (AWS App Runner equivalent but can scale to zero)

Microsoft Azure

State of the art engineering

  • Compute and data storage are on the cloud. Read how Planet and Airbus use the cloud
  • Traditional data formats aren't designed for processing on the cloud, so new standards are evolving such as COG and STAC
  • Google Earth Engine and Microsoft Planetary Computer are democratising access to 'planetary scale' compute
  • Google Colab and others are providing free access to GPU compute to enable training deep learning models
  • No-code platforms and auto-ml are making ML techniques more accessible than ever
  • Serverless compute (e.g. AWS Lambda) means that managing servers may become a thing of the past
  • Custom hardware is being developed for rapid training and inferencing with deep learning models, both in the datacenter and at the edge
  • Supervised ML methods typically require large annotated datasets, but approaches such as self-supervised and active learning require less or even no annotation
  • Computer vision traditionally delivered high performance image processing on a CPU by using compiled languages like C++, as used by OpenCV for example. The advent of GPUs is changing the paradigm, with alternatives optimised for the GPU being created, such as Kornia
  • Whilst the combo of Python and keras/tensorflow/pytorch is currently preeminent, new Python libraries such as Jax and alternative languages such as Julia are showing serious promise

Web apps

Flask is often used to serve up a simple web app that can expose an ML model.
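A minimal sketch, mirroring the FastAPI endpoint earlier but as a standalone Flask app (the prediction is again a placeholder):

```python
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    image = Image.open(io.BytesIO(request.files["file"].read())).convert("RGB")
    # Placeholder: run your trained model on `image` here
    return jsonify({"width": image.width, "height": image.height, "detections": []})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```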

Neural nets in space

Processing on board a satellite allows less data to be downlinked, e.g. a super-resolution image might be generated from 8 raw images, with only the single enhanced image downlinked. Other applications include cloud detection and collision avoidance.

About

License: Apache License 2.0