LHBuilder / SA-Segment-Anything

Vision-oriented multimodal AI

SA2: Vision-Oriented MultiModal AI

SA2 integrates state-of-the-art (SOTA) models into a vision-oriented multi-modal framework. It is not a single large language model (LLM); rather, it comprises multiple large-scale models, some of which are built on top of cutting-edge foundation models.

Purposes

The surging momentum of generative AI (GAI) heralds the dawn of a new era in Artificial General Intelligence (AGI). LLMs and CV multi-modal large-scale models are the two dominant trends of the GAI era. ChatGPT and GPT-4 have set a high bar for LLMs, while CV multi-modal large-scale models are still emerging.

Seeking AI is an AI company focused on AI for Industry. Over the past five years we have built a solid foundation for AI innovation and standardized data development. We are releasing SA2 to support the community working on CV multi-modal large-scale models. The SA2 project has the following purposes:

  1. Provide a unified multi-modal framework for different applications based on multi-modal foundation models.
  2. Integrate SOTA vision models into a complete multi-modal platform, leveraging the genuinely state-of-the-art strengths of each model.
  3. Focus on vision-oriented AI to accelerate CV development, which still lags behind the current maturity of LLMs.

Installation

The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
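
As a quick sanity check, the short snippet below (an illustrative sketch, not part of the repository) prints the installed versions and reports whether CUDA is available:

# Environment check (illustrative): confirms Python >= 3.8, PyTorch >= 1.7,
# TorchVision >= 0.8, and whether a CUDA device is visible.
import sys
import torch
import torchvision

print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("cuda available:", torch.cuda.is_available())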

SA2 Unified MultiModal Framework (UMMF)

git clone git@github.com:LHBuilder/SA-Segment-Anything.git

Meta SAM

Install Segment Anything:

Please follow the instructions here to install Meta SAM.

Or

pip install segment_anything

The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.

pip install opencv-python pycocotools matplotlib onnxruntime onnx
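
Once installed, Meta SAM can be driven with point or box prompts. The sketch below is illustrative only and assumes a downloaded ViT-H checkpoint (sam_vit_h_4b8939.pth from the official SAM release) and a local image named example.jpg; adjust both paths for your setup.

# Illustrative sketch: segment one object with Meta SAM using a single
# foreground point prompt. File names are assumptions, not part of this repo.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")  # remove this line to run on CPU

predictor = SamPredictor(sam)
predictor.set_image(image)

# One foreground point (label 1) placed at the image center.
point = np.array([[image.shape[1] // 2, image.shape[0] // 2]])
masks, scores, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
print(masks.shape, scores)  # boolean masks of shape (num_masks, H, W) with quality scores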

YOLO-NAS

Please follow the instructions here to install YOLO-NAS.

Or

pip install super-gradients
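
For reference, a minimal detection sketch with super-gradients, assuming COCO-pretrained weights and a local image dog.jpg (both names are placeholders, not from this repository):

# Illustrative sketch: run YOLO-NAS-L object detection via super-gradients.
from super_gradients.training import models

# Downloads the COCO-pretrained checkpoint on first use.
model = models.get("yolo_nas_l", pretrained_weights="coco")

# Run inference on a local image (placeholder path) and display the result.
model.predict("dog.jpg").show()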

About

License: Apache License 2.0


Languages

Jupyter Notebook 87.8%, Python 10.9%, C++ 0.9%, Shell 0.2%, Cuda 0.1%, CMake 0.0%, TypeScript 0.0%, Makefile 0.0%, C 0.0%, JavaScript 0.0%, Dockerfile 0.0%, HTML 0.0%, SCSS 0.0%