abhie7 / terrain-recognition-vision-transformer

A custom-built Vision Transformer (ViT) for Terrain Recognition.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Deep Learning for Terrain Recognition

Team Triumph (Team No. 8)
Tinkerthon 2.0 | 5th April 2024 | First Runner's Up

Table of Contents

  1. Introduction
  2. Problem Statement
  3. Objective
  4. Dependencies
  5. Dataset
  6. Model Architecture
  7. Training and Validation
  8. Evaluation Metrics
  9. Use Cases
  10. Future Work
  11. Team Members
  12. Requirements
  13. Installation
  14. Conclusion

1. Introduction

This project aims to develop a digital explorer capable of recognizing various terrains like mountains, forests, and deserts using advanced computer learning techniques. This technology utilizes deep learning to analyze images and categorize them into different terrain types, allowing for use cases in drone surveillance, military operations, and beyond.

2. Problem Statement

"Imagine a digital explorer that learns to identify different types of landscapes, like mountains, forests, or deserts, using advanced computer learning."

3. Objective

The main goal of this project is to develop and deploy a deep learning model capable of accurately recognizing different terrain types in images. The model will undergo training using a custom dataset, comprising manually collected images and those obtained through automated web scraping from diverse sources.

4. Dependencies

Technology Description
PyTorch A popular open-source machine learning library for Python, used for building and training the deep learning model.
OpenCV A library of programming functions mainly aimed at real-time computer vision. It is used for image processing tasks.
TorchVision A library of pre-trained models and datasets for computer vision, used for data augmentation and potentially for transfer learning.
TQDM A fast, extensible progress bar for loops in Python, used to display the progress of data processing and model training.
PIL Used for opening, manipulating, and saving many different image file formats.

5. Dataset

  • The dataset for this project is a custom collection, comprised of both manually scraped images and those obtained through automated web scraping from multiple sources such as Google Images, Yahoo Images, Bing Images, Pexels, and Unsplash.
  • Additionally, satellite images from EmbeddingStudio/merged_remote_landscapes_v1 are utilized for training the model.
  • This dataset encompasses a diverse range of terrains, including mountains, forests, deserts, and more.

5.1 Data Augmentation

To enhance the model's ability to generalize and improve its performance, several data augmentation techniques are applied to the dataset:

  • Blur: Applies a Gaussian blur to the image.
  • Five Crop: Crops the image into five parts, allowing the model to learn from different perspectives of the same scene.
  • Color Jitter: Randomly changes the brightness, contrast, and saturation of an image.
  • Resize: Resizes the cropped images to a fixed size, ensuring consistency in the input data for the model.

6. Model Architecture

  • The model architecture for this project is based on leveraging Vision Transformers (ViT), a novel approach to image classification that has shown significant promise in recent years.
  • Unlike traditional Convolutional Neural Networks (CNNs), Vision Transformers treat images as a sequence of patches and apply transformer models to these patches.
  • This allows the model to capture long-range dependencies between pixels, potentially leading to better performance in recognizing complex patterns in landscapes.

7. Training and Validation

  • The model will undergo training using the augmented dataset, with a portion of the data set aside for validation to monitor the model's performance during training.
  • The training process will involve fine-tuning the model's parameters to minimize the loss function, which quantifies the disparity between the model's predictions and the actual labels.

8. Evaluation Metrics

For this project, the primary evaluation metric will be accuracy. Accuracy is a straightforward measure of how often the model's predictions match the actual labels. It provides a clear indication of the model's performance in terms of correctly identifying the landscape type from the images.

9. Use Cases

Use Case Description
Drone Surveillance Enhancing the capabilities of drones by enabling them to autonomously identify and classify landscapes.
Military Assisting in military operations by providing real-time terrain recognition.
Navigation and Mapping Assisting in generating detailed maps for autonomous vehicles or drones by identifying and categorizing terrains.
Educational Tools Serving as an educational resource, our technology can benefit students studying geography, environmental science, or computer vision.
Disaster Response Assessing damage extent and pinpointing areas requiring urgent attention. This facilitates disaster response and recovery endeavors, ensuring efficient resource allocation.

10. Future Work

Future enhancements to the project could include:

  • Expanding the dataset with more diverse and representative images.
  • Exploring more advanced deep learning architectures for improved performance.
  • Integrating the model with real-time data streams for dynamic terrain recognition.
  • Incorporating Histogram Analysis for Color will enhance terrain identification, providing richer classification features.
  • Handling Time Series Data will help in monitoring terrain changes over time for trend analysis, aiding in environmental monitoring and disaster impact assessment.

11. Team Members

  • Jugal Gajjar (Team Lead)
  • Sanjana Nathani
  • Abhiraj Chaudhuri
  • Rohan Agarkar

12. Requirements

  • Python 3.x
  • PyTorch
  • OpenCV
  • TorchVision
  • TQDM
  • PIL (Python Imaging Library)

13. Installation

Clone the repository to your local machine:

git clone https://github.com/abhie7/terrain-recognition-vision-transformer.git

Install the required dependencies:

pip install -r requirements.txt

14. Conclusion

This project represents a significant step towards developing a digital explorer capable of autonomously recognizing and classifying terrains from images. By leveraging advanced computer vision techniques and a custom-built dataset, the model has promising potential to enhance our understanding and interaction with landscapes.

About

A custom-built Vision Transformer (ViT) for Terrain Recognition.

License:MIT License


Languages

Language:Python 100.0%