multimodal

There are 30 repositories under the multimodal topic.

jina-ai / jina
☁️ Build multimodal AI applications with cloud-native stack
neural-search cloud-native deep-learning machine-learning framework grpc kubernetes multimodal mlops pipeline fastapi generative-ai docker jaeger llmops opentelemetry cncf microservice orchestration prometheus
Language:Python 20151
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e
Language:Python 18557
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
gpt-4 chatbot chatgpt llama multimodal llava foundation-models instruction-tuning multi-modality visual-language-learning llama-2 llama2 vision-language-model
Language:Python 16717
NVIDIA / NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
asr deeplearning generative-ai large-language-models machine-translation multimodal neural-networks speaker-diariazation speaker-recognition speech-synthesis speech-translation tts
Language:Python 10206
bentoml / BentoML
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
model-serving mlops llmops generative-ai llm-inference model-inference-service inference-platform deep-learning llm-serving machine-learning python multimodal ml-engineering llm
Language:Python 6600
facebookresearch / mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
pytorch vqa pretrained-models multimodal deep-learning captioning dialog textvqa hateful-memes multi-tasking
Language:Python 5419
rerun-io / rerun
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
visualization computer-vision python robotics rust multimodal cpp
Language:Rust 5304
swyxio / ai-notes
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
ai gpt gpt-3 multimodal openai prompt-engineering stable-diffusion
Language:HTML 4678
SkalskiP / courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
computer-vision deep-learning deep-neural-networks machine-learning mlops multimodal transformers tutorial natural-language-processing nlp generative-model stable-diffusion
Language:Python 4581
enricoros / big-AGI
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
agi anthropic beam chatgpt chatgpt-ui generative-ai gpt gpt-4 gpt-5 groq large-language-models mistral multimodal openai openai-api stable-diffusion ui
Language:TypeScript 4409
kyegomez / tree-of-thoughts
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
artificial-intelligence chatgpt gpt4 multimodal prompt-engineering deep-learning prompt prompt-learning prompt-tuning
Language:Python 4067
IDEA-CCNL / Fengshenbang-LM
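Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.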
chinese-nlp pretrained-models pytorch distributed-training transformers aigc multimodal
Language:Python 3913
jina-ai / discoart
🪩 Create Disco Diffusion artworks in one line
creative-ai disco-diffusion cross-modal dalle generative-art multimodal diffusion prompts midjourney imgen discodiffusion creative-art clip-guided-diffusion latent-diffusion stable-diffusion
Language:Python 3836
luban-agi / Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
aigc llm ai midjourney stable-diffusion deep-learning tutorials courses-resource prompt-engineering nlp awesome chatgpt multimodal
3474
rom1504 / img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
big-data dataset deep-learning download-images image image-dataset multimodal
Language:Python 3302
open-mmlab / mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
image-classification resnet mobilenet pytorch deep-learning swin-transformer beit clip constrastive-learning convnext mae masked-image-modeling moco pretrained-models self-supervised-learning vision-transformer multimodal
Language:Python 3198
OpenGVLab / InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
chatgpt foundation-model gpt gpt-4 gradio husky image-captioning langchain llm multimodal vqa internimage llama vicuna video-generation sam segment-anything click imagebind draggan
Language:Python 3138
microsoft / torchscale
Foundation Architecture for (M)LLMs
computer-vision machine-learning multimodal natural-language-processing pretrained-language-model speech-processing transformer translation
Language:Python 2934
NExT-GPT / NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
chatgpt foundation-models gpt-4 instruction-tuning large-language-models llm multi-modal-chatgpt multimodal visual-language-learning
Language:Python 2900
docarray / docarray
Represent, send, store and search multimodal data
docarray data-structures multimodal cross-modal neural-search deep-learning nested-data qdrant weaviate nearest-neighbor-search protobuf elasticsearch dataclass multi-modal semantic-search machine-learning pytorch fastapi pydantic
Language:Python 2779
Stability-AI / stability-sdk
SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
stable-diffusion ai-art generative-art latent-diffusion multimodal
Language:Jupyter Notebook 2404
OFA-Sys / OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
multimodal pretraining image-captioning text-to-image-synthesis visual-question-answering referring-expression-comprehension vision-language pretrained-models prompt prompt-tuning chinese
Language:Python 2337
rom1504 / clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
semantic-search deep-learning multimodal ai clip knn
Language:Jupyter Notebook 2163
X-PLUG / mPLUG-Owl
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
chatbot chatgpt large-language-models llama multimodal damo mplug instruction-tuning pretraining mplug-owl huggingface pytorch transformer alpaca visual-recognition gpt gpt4 gpt4-api dialogue video
Language:Python 1964
Yutong-Zhou-cv / Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
generative-adversarial-network text-to-image image-synthesis image-generation survey awseome-list image-manipulation text-to-face multimodal multimodal-deep-learning
1900
X-PLUG / MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
agent gpt4v mllm mobile-agents multimodal multimodal-large-language-models multimodal-agent android app gui mobile automation copilot harmony ios
Language:Python 1876
alan-ai / alan-sdk-android
Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
alan-sdk android voice voice-assistant alan-voice alan-studio sdk alan-ai voice-commands voice-control conversational-ai speech-recognition text-to-speech machine-learning voice-interface vui multimodal
1855
alan-ai / alan-sdk-flutter
Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
alan-sdk alan-studio chatbot voice voice-assistant voice-ai alan-voice flutter sdk voice-commands voice-control conversational-ai speech-recognition text-to-speech machine-learning voice-interface vui multimodal
Language:Ruby 1816
InternLM / InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
chatgpt visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model large-language-model large-vision-language-model llm vision-transformer gpt supervised-finetuning
Language:Python 1717
alan-ai / alan-sdk-ionic
Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
alan-ionic-sdk alan-studio chatbot voice voice-assistant voice-ai ionic sdk voice-commands voice-control conversational-ai speech-recognition text-to-speech machine-learning voice-interface vui multimodal
Language:TypeScript 1712
autodistill / autodistill
Images to inference with no labeling (use foundation models to train supervised models).
computer-vision model-distillation auto-labeling deep-learning foundation-models grounding-dino image-annotation image-classification instance-segmentation labeling-tool machine-learning multimodal object-detection pytorch segment-anything yolov5 yolov8
Language:Python 1557
invictus717 / MetaTransformer
Meta-Transformer for Unified Multimodal Learning
artificial-intelligence computer-vision machine-learning multimedia multimodal transformers foundationmodel
Language:Python 1445
modelscope / swift
ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
agent awq deploy dpo finetune galore grok-1 lisa llama llama3 llava llm lora modelscope multimodal pre-training qwen qwen-110b sft unsloth
Language:Python 1442
open-mmlab / Multimodal-GPT
Multimodal-GPT
flamingo gpt gpt-4 llama multimodal transformer vision-and-language
Language:Python 1418
kyegomez / BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning
Language:Python 1347
Eurus-Holmes / Awesome-Multimodal-Research
A curated list of multimodal-related research.
awesome multimodal multimodal-learning multimodal-research
Language:Python 1270