multi-modal

There are 21 repositories under the multi-modal topic.

OpenBMB / MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
minicpm minicpm-v multi-modal
Language:Python 8028
modelscope / modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
cv deep-learning machine-learning multi-modal nlp python science speech
Language:Python 6537
THUDM / CogVLM
A state-of-the-art-level open visual language model | multimodal pretrained model
cross-modality language-model multi-modal pretrained-models visual-language-models
Language:Python 5685
lucidrains / DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
artificial-intelligence attention-mechanism deep-learning multi-modal text-to-image transformers
Language:Python 5537
marqo-ai / marqo
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
chatgpt clip deep-learning gpt hacktoberfest hnsw information-retrieval knn large-language-models machine-learning machinelearning multi-modal natural-language-processing search-engine semantic-search tensor-search transformers vector-search vision-language visual-search
Language:Python 4353
valhalla / valhalla
Open Source Routing Engine for OpenStreetMap
astar dijkstra directions isochrones multi-modal openstreetmap routing routing-engine tiled traveling-salesman
Language:C++ 4336
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal conversational model approaching GPT-4o performance
gpt gpt-4o gpt-4v image-classification image-text-retrieval llm mme multi-modal semantic-segmentation video-classification vision-language-model vit-22b vit-6b
Language:Python 4285
THUDM / VisualGLM-6B
Chinese and English multimodal conversational language model | multimodal Chinese-English bilingual conversational language model
chatglm-6b gpt multi-modal
Language:Python 4046
OFA-Sys / Chinese-CLIP
A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.
chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language
Language:Python 4021
modelscope / agentscope
Start building LLM-empowered multi-agent applications in an easier way.
agent chatbot gpt-4 large-language-models llm llm-agent multi-agent distributed-agents multi-modal llama3 gpt-4o drag-and-drop
Language:Python 3239
zjunlp / DeepKE
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
attribute-extraction chinese deep-learning deepke document-level few-shot information-extraction instructie kg knowledge-graph knowprompt lightner low-resource multi-modal named-entity-recognition ner nlp prompt pytorch relation-extraction
Language:Python 3238
docarray / docarray
Represent, send, store and search multimodal data
cross-modal data-structures dataclass deep-learning docarray elasticsearch fastapi machine-learning multi-modal multimodal nearest-neighbor-search nested-data neural-search protobuf pydantic pytorch qdrant semantic-search weaviate
Language:Python 2863
PKU-YuanGroup / Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
instruction-tuning large-vision-language-model multi-modal
Language:Python 2718
SciSharp / LLamaSharp
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) efficiently on your local device.
chatbot gpt llama llamacpp llm semantic-kernel llava multi-modal llama2 llama3 llama-cpp
Language:C# 2315
PKU-YuanGroup / MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
large-vision-language-model mixture-of-experts moe multi-modal
Language:Python 1856
Kav-K / GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
artificial-intelligence asyncio chatbot code-interpreter collaborate dalle2 digitalocean discord embeddings extractive-question-answering github gpt3 hacktoberfest help-wanted moderator-bot multi-modal openai openai-api pinecone python
Language:Python 1805
modelscope / data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit
Language:Python 1786
dvlab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
large-language-model llm multi-modal segmentation
Language:Python 1660
THUDM / CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
cogvlm language-model multi-modal pretrained-models
Language:Python 1617
QIN2DIM / hcaptcha-challenger
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
yolov5 hcaptcha opencv-python onnx-models hcaptcha-solver solver onnx yolo onnxruntime playwright clip multi-modal zero-shot-classification multi-modal-learning computer-vision image-segmentation object-detection
Language:Python 1464
OpenMotionLab / MotionGPT
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
3d-generation chatgpt gpt language-model motion motion-generation motiongpt multi-modal text-driven text-to-motion
Language:Python 1385
DirtyHarryLYL / Transformer-in-Vision
Recent Transformer-based CV and related works.
computer-vision deep-learning multi-modal paper self-attention transformer vision-transformers visual-language
1307
tangxyw / RecSysPapers
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
calibration causal-inference cold-start contrastive-learning debias distillation diverse fairness match papers rank re-rank recommendation-system feedback-delay pre-rank look-alike multi-scenario multi-task reinforcement-learning multi-modal
Language:Python 1154
IntelLabs / fastRAG
Efficient Retrieval Augmentation and Generation Framework
benchmark colbert diffusion generative-ai information-retrieval knowledge-graph llm multi-modal nlp question-answering semantic-search sentence-transformers summarization transformers
Language:Python 1144
vercel / modelfusion
The TypeScript library for building AI applications.
chatbot gpt-3 javascript js llm openai ts typescript whisper ai embedding huggingface dall-e stable-diffusion llamacpp artificial-intelligence claude multi-modal ollama mistral
Language:TypeScript 1052
MedMNIST / MedMNIST
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
dataset benchmark automl mnist medical medical-image-analysis medmnist multi-modal decathlon medical-imaging medical-image-computing deep-learning machine-learning 3d 2d classification few-shot-learning pytorch federated-learning
Language:Python 1016
bytedance / SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
audio audio-processing bytedance iclr2024 large-language-models multi-modal music speech speech-recognition tsinghua-university
Language:Python 907
open-compass / VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa
Language:Python 721
microsoft / farmvibes-ai
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
agriculture ai geospatial geospatial-analytics stac sustainability multi-modal remote-sensing weather
Language:Jupyter Notebook 657
jokieleung / awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
attention-networks awesome-list multi-modal multi-modal-learning vqa
650
PKU-YuanGroup / LanguageBind
[ICLR 2024🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
language-central multi-modal pretraining zero-shot
Language:Python 629
salesforce / UniControl
Unified Controllable Visual Generation Model
aigc generation multi-modal
Language:Python 593
dvlab-research / LLMGA
This project is the official implementation of "LLMGA: Multimodal Large Language Model based Generation Assistant" (ECCV 2024).
aigc image-design-assistant image-editing image-generation large-language-model llm mllm multi-modal
Language:Python 406
Tebmer / Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
data-augmentation instruction-following kd knowledge-distillation large-language-model llm self-training survey compression data-synthesis feedback multi-modal self-distillation alignment supervised-finetuning
403
v-iashin / SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
transformer vqvae gan pytorch audio-generation video-features melgan multi-modal video-understanding vggsound vas bmvc evaluation-metrics audio video
Language:Jupyter Notebook 334
kyegomez / zeta
Build high-performance AI models with modular building blocks
artificial-intelligence multi-modal transformers deep-learning gpt4 llama2 multi-agent-systems multi-modal-learning multi-platform pytorch speech-recognition transformer longnet
Language:Python 325