AI tools

This page contains a collection of opensource AI models and tools available for various use cases

Vision

Generative modeling

1. Stable Diffusion (2022 Aug 10)

Summary:
Resources
Projects
1. apple/ml-stable-diffusion - Port for Apple Silicon + CoreML
2. fast-stable-diffusion - fast-stable-diffusion, +25-50% speed increase + memory efficient + DreamBooth
3. Lsmith - StableDiffusionWebUI accelerated using TensorRT
4. ControlNet - copy compositions or human poses from a reference image
  - ControlNet v1.1 - A Complete Guide
5. imaginAIry - Github - AI imagined images. Pythonic generation of stable diffusion images.

2. Grounded-SAM ()

Summary: Marrying Grounding DINO with Segment Anything & Stable Diffusion & BLIP - Automatically Detect , Segment and Generate Anything with Image and Text Inputs
Resources:
- Grounded-Segment-Anything - Github
Projects
- Semantic-SAM - Segment and Recognize Anything at Any Granularity

3. AnimateDiff

Summary: Combine static images with motion dynamics
Resources:
- AnimateDiff - Project Site

4. PhotoMaker (2024 Jan)

Summary: Create photos/paintings/avatars of anyone in any style within seconds
Resources:

5. DragGAN (2023 May)

Image Inpainting

1. lama-cleaner (2022 Nov)

Summary: Image inpainting tool powered by SOTA AI Model. Remove any unwanted object, defect, people from your pictures or erase and replace(powered by stable diffusion) any thing on your pictures.
Resources
- lama-cleaner - Github

Object detection

1. YOLOv8

Summary: YOLOv8 in PyTorch > ONNX > CoreML > TFLite. Can do detection, segmentation and much more.
Resources
- ultralytics - Github

2. Face Recognition

Summary: 2D and 3D Face alignment library build using pytorch
Resources
- 1adrianb/face-alignment - Github

Image Segmentation

1. SAM (2023 Apr 5) (License: Apache 2.0)

Summary: high quality object masks from input prompts such as points or boxes
Resources:
Projects
1. sam-hq - Segment Anything in High Quality
2. Fast-SAM - Fast Segment Anything
3. sam.cpp - Inference of Meta's Segment Anything Model in pure C/C++

2. Detic (2021 Jan)

Summary: A Detector with image classes that can use image-level labels to easily train detectors, detects any given class names
Resources:

Image embeddings

1. DINO

Summary: high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks
Resources:

Video

Object tracking

1. TrackHQ (2023 Jul)

Summary: Tracking Anything in High Quality
Resources
- HQTrack - Github
- Technical Report

Feature matching

1. LightGlue (2023 June 26)

Summary: a lightweight feature matcher with high accuracy and blazing fast inference
Resources:
- Paper: LightGlue: Local Feature Matching at Light Speed
- LightGlue - Github

Speech

Speech recognition

1. OpenAI Whisper (2022 Sept 21)

Summary: Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
Resources
- Introducing Whisper - Blog
- Robust Speech Recognition via Large-Scale Weak Supervision - Paper
  - arxiv
- whisper - Github (License: MIT)
Projects:
1. distil-whisper - 6x faster, 50% smaller, within 1% word error rate.
2. Talk to your multi-lingual AI assistant - Uses Whispher, GPT-3 and Coqui-TTS
3. Transcribe Youtube Video to text with OpenAI Whispher - YouTube - Using pytube and whispher
4. whisper.cpp - Port in C/C++, runs in CPU including mobile and rpi.
5. Whisper - High-performance GPGPU inference for Windows
6. whispherX - Timestamp-Accurate Automatic Speech Recognition using Force Alignment
7. faster-whispher - Faster Whisper transcription with CTranslate2
8. whispher-jax - optimised JAX code Whisper

Text

Text generation

1. BLOOM (2022 July)

Summary: BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text in 46 languages and 13 programming languages that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn't been explicitly trained for, by casting them as text generation tasks.
Resources
- Introducing The World’s Largest Open Multilingual Language Model: BLOOM - Blog
- BLOOM Model Card - Huggingface (License: Responsible AI License)
- tr11-176B-ml - Github
Projects
1. bloomz.cpp - C++ implementation for BLOOM Inference

2. GALACTICA (2022 Nov)

Summary: A general-purpose scientific language model. It is trained on a large corpus of scientific text and data. It can perform scientific NLP tasks at a high level, as well as tasks such as citation prediction, mathematical reasoning, molecular property prediction and protein annotation.
Resources
- Galactica online demo
- Galactica: A Large Language Model for Science - Paper
- galai - Github (License: Code - Apache 2.0, Model - CCA-NC4.0-PIL)

3. GPT-GJT (Dec 2022)

Summary: a variant forked off GPT-J (6B), and performs exceptionally well on text classification and other tasks
Resources
- GPT-JT-6B-v1 - HuggingFace
- Releasing v1 of GPT-JT powered by opensource AI - Blog

4. PubMed GPT 2.7B (2022 Dec)

Summary: A language model trained on biomedical literature which delivers an improved state of the art for medical question answering.

Resources

5. nanoGPT (2022 Dec)

Summary: The simplest, fastest repository for training/finetuning medium-sized GPTs
Resources
- nanoGPT - Github (License: MIT)

7. Petals (2022 Dec)

Summary: Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Resources:
- https://petals.ml/
- https://github.com/bigscience-workshop/petals

8. Chat-RWKV (Jan 2023)

Summary: ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and open source.
Resources:

9. LLaMA (Feb 24, 2023)

Summary: Large Language Model Meta AI
Resources:
- Introducing LLaMA: A foundational, 65-billion-parameter large language model - Blog
- Paper]
Projects
1. open_llama - a permissively licensed open source reproduction
2. LLaMa - facebookresearch - Minimal project for inference
3. llama.cpp - Inference with C/C++
4. dalai - The simplest way to run LLaMA on your local machineml
5. llama-rs - Run LLaMA inference on CPU, with Rust
6. alpaca-lora - Instruct-tune LLaMA on consumer hardware
7. vicuna - an open-source chatbot trained by fine-tuning LLaMA
8. FastChat - Github
9. ChatDoctor
10. lit-LLaMA - Implementation of the LLaMA language model based on nanoGPT (Commercial Use)
11. Open-Llama - Train Llama model
12. open_llama - OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
13. llama2.c - Inference Llama 2 in one file of pure C
14. llama-dfdx - LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!
15. llama2.mojo - Inference Llama 2 in one file of pure

11. Falcon

Summary: LLM for research and commercial purposes. Allows commercial use upto $1M revenue.
Resources:
- Falcon LLM - Home
- Huggingface models

12. FinGPT (2023 Jun)

Summary: Data-Centric FinGPT. Open-source for open finance!
Resources
- FinGPT - Github
- FinNLP - Website

13. Llama2

Summary: Open-source LLM free for research and commercial\
Resources
Projects
1. Llama2-Onnx - an optimized version of the Llama 2 model
2. llama-recipes - Examples and recipes for Llama 2 model

14. Mistral 7B

Summary: 7B model with Apache license, commercial use
Resources
- Announcing Mistral 7B - Blog post
- mistral-src - Inference code
- Huggingface - Model in hub

15. Gemma (2024 Feb 21)

Summary: Family of (4) SOTA LLMs (2B/7B x Base/Instruction) by Google
Resources
- Welcome Gemma - Google’s new open LLM - Announcement
- Models - HuggingFace
Projects
- gemma.cpp - lightweight, standalone C++ inference engine for Google's Gemma models

Embeddings

1. StarSpace (2017)

Summary: Learning embeddings for classification, retrieval and ranking.
Resources:
1. Paper
2. Github

2. Jina Embeddings-v2

Summary: Model that can support upto 8K context length
Resources
- HuggingFace
- Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Image - Language

1. OpenCLIP ()

Summary: An open source implementation of CLIP.
Resources:
- Reproducible scaling laws for contrastive language-image learning - Paper
- https://github.com/mlfoundations/open_clip

2. IF ()

Summary: a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding
Resources:
- https://github.com/deep-floyd/IF
- Running IF with diffusers on a Free Tier Google Colab - Blog post

3. TinyGPT-V

Summary: Efficient Multimodal Large Language Model via Small Backbones. Requires a 24G GPU for training and an 8G GPU or CPU for inference.
Resources:
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones - Research paper
- TinyGPT-V - Github - Code

4. LLaVa (2023 Apr)

Summary:
Resources:
- LLaVA: Large Language and Vision Assistant - Project page
- Demo
- Research papers
  - Visual Instruction Tuning (NeurIPS 2023 Oral)
  - Improved Baselines with Visual Instruction Tuning (LLaVa 1.5)
- LLaVa - Github

5. moondream (2024 Jan)

Summary: a tiny (1.6B) vision language model that kicks ass and runs anywhere
Resources:
- moondream - Github
- moondream - Huggingface Spaces

6. Large World Model (2024 Feb)

Summary: 1M context length open model for long video and audio understanding
Resources:

Speech - Language

Text to Speech

1. Coqui-TTS

Summary: A deep learning toolkit for Text-to-Speech, battle-tested in research and production.
Resources

2. TorToiSe

Summary: A multi-voice TTS system trained with an emphasis on quality
Resources
- tortise - Github

3. AudioGPT

Summary: Understanding and Generating Speech, Music, Sound, and Talking Head
Resources:
- AudioGPT - Github

4. suno-ai/bark

Summary: Text-Prompted Generative Audio Model
Resources
- bark - Github

5. EmotiVoice

Summary: a powerful and modern open-source text-to-speech engine. EmotiVoice speaks both English and Chinese, and with over 2000 different voices. The most prominent feature is emotional synthesis, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others.
Resources
- EmotiVoice - Github

6. MeloTTS

Summary: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Resources:
- MeloTTS - Github

Speech to Text

1. Coqui - STT

Summary: An open-source deep-learning toolkit for training and deploying speech-to-text models.
Resources:
- Documentation
- STT - Github

Tabular data

Transformers

1. Tab Transformers

Summary: Attention network for tabular data
Resources
- Paper - TabTransformer: Tabular Data Modeling Using Contextual Embeddings
- lucidrains/tab-transfor-pytorch

3D rendering

NeRF

1. NVIDIA Instant-NGP

Summary: Instant neural graphics primitives: lightning fast NeRF and more
Resources
- instant-ngp (License: NVIDIA Custom License)
- Getting started with NVIDIA Instant NeRFs

2. Shap-E (2023, May 3)

Summary: Generate 3D objects conditioned on text or images
Resources
- shap-e - Github
- Shap-E: Generating Conditional 3D Implicit Functions - Paper

3. Neuralangelo (Jun 2023)

Summary:
Resources:
- Neuralangelo: High-Fidelity Neural Surface Reconstruction

AI Tools

Language

1. langchain

Summary: Building applications with LLMs through composability
Resources:
- langchain - Github
- Getting started with LangChain - Towards Datascience
Projects
1. langflow - LangFlow is a UI for LangChain
2. flowise - Drag & drop UI to build your customized LLM flow using LangchainJS
3. awesome-langchain - Awesome list of tools and projects with the awesome LangChain framework

2. xturing

Summary: Build and control your own LLMs
Resources:
- xturing - Github

3. LocalAI

Summary: Self-hosted, community-driven simple local OpenAI-compatible API written in go
Resources:
- LocalAI - Github

4. Lamini

Summary: The LLM engine for rapidly customizing models. Allows commercial use!
Resources:
- Introducing Lamini - Blog
- lamini - Github

5. CodeTF

Summary: One-stop Transformer Library for State-of-the-art Code LLM
Resources
- CodeTF

6. MLC-LLM (2023 Mar)

Summary: Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Resources

7. GPT4All

Summary: Open-source large language models that run locally on your CPU and nearly any GPU
Resources:
- Technical Report
- gpt4all - Github

8. OpenChatKit (2023 Mar 10)

Summary: OpenChatKit provides a powerful, open-source base to create both specialized and general purpose models for various applications. The kit includes an instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories
Resources:
- Announcing OpenChatKit
- OpenChatKit - Github

9. FreedomGPT

Summary: A React and Electron-based app that executes the FreedomGPT LLM locally (offline and private) on Mac and Windows using a chat-based interface (based on Alpaca Lora)
Resources
- https://github.com/ohmplatform/FreedomGPT

10. Open-Assistant

Summary: Open Assistant is a project meant to give everyone access to a great chat based large language model.
Resources:
- https://projects.laion.ai/Open-Assistant/
- https://github.com/LAION-AI/Open-Assistant

11. SuperAGI

Summary: A dev-first open source autonomous AI agent framework. Enabling developers to build, manage & run useful autonomous agents quickly and reliably.
Resources
- SuperAGI - Github

12. exllamav2

Summary: A fast inference library for running LLMs locally on modern consumer-class GPUs
Resources
- exllamav2 - Github

13. QAnything

Summary: a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use
Resources:
- QAnything - Github

14. llmware

Summary: Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.
Resources:
- llmware - Github

Vision

1. PixelLib

Summary: a library for performing segmentation of objects in images and videos
Resources

2. StreamDiffusion

Summary: A Pipeline-Level Solution for Real-Time Interactive Generation
Resources
- StreamDiffusion - Github

3. Supervision

Summary: We write your reusable computer vision tools.
Resources:
- supervision - Github

4. InvokeAI

Summary: InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media
Resources:
- InvokeAI - Github

Video

1. Roop (2023 Jun)

Summary: one-click face swap
Resources:
- roop - Github

2. ShortGPT (2023 Jul)

Summary: ShortGPT is a powerful framework for automating content creation. It simplifies video creation, footage sourcing, voiceover synthesis, and editing tasks.
Resources:
- ShortGPT - Github

Audio

1. JARVIS

Summary: a voice assistant made as an experiment using neural networks with Rust
Resources:
- jarvis - Github

Multi-modal

1. TaskMatrix

Summary - TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
Resources
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - Paper
- TaskMatrix - Github

2. Transformer Agents

Summary: Multi modal AI agent
Resources
- transformer-agents - HuggingFace

3. LibreChat

Summary: Enhanced ChatGPT Clone: Features OpenAI, GPT-4 Vision, Bing, Anthropic, OpenRouter, Google Gemini, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. More features in development
Resources:
- LibreChat - Github
- Documentation

4. open-interpreter

Summary: A natural language interface for computers
Resources:
- open-interpreter - Github

Dataset management

1. fiftyone

Summary: The open-source tool for building high-quality datasets and computer vision models
Resources:
- fiftyone - Github
- docs

AI Libraries

General

ColossalAI - Making large AI models cheaper, faster and more accessible

Vision

monai - medical imaging with deep learning
supervision - We write your reusable computer vision tools

Audio

SpeechBrain - An Open-Source Conversational AI Toolkit

Language

OpenNMT - An open source neural machine translation system
outlines - Neuro Symbolic Text Generation
llm-foundry - LLM training code for MosaicML foundation models
chainlit - Build Python LLM apps in minutes!
languagemodels - Explore large language models on any computer with 512MB of RAM
lit-gpt - Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Multi-modal

rasa - Open source machine learning framework to automate text- and voice-based conversations
1. RasaGPT - headless LLM chatbot platform

Miscellaneous

Model Zoo

modelzoo.co - Discover open source deep learning code and pretrained models.
OpenVINO Model Zoo - Model zoo from multiple sources
replicate - easy to use setup for popular models
modelscope - bring the notion of Model-as-a-Service to life
https://civitai.com/
open-llms - A list of open LLMs available for commercial use.

AI in the wild

AI Product Index - A curated index to track AI-powered products.
awesome-generative-ai - A curated list of modern Generative Artificial Intelligence projects and services
LinkedIn Post - Commercial use LLMs - List of commercially usable LLMs
ai-collection - A Collection of Awesome Generative AI Applications
tuning-playbook - A playbook for systematically maximizing the performance of deep learning models.
ollama - Get up and running with large language models, locally.
inference - Replace OpenAI GPT with another LLM in your app by changing a single line of code
llama-embeddings-fastapi-service - designed to facilitate and optimize the process of obtaining text embeddings using different LLMs