Repositories under the llava topic:
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
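For a quick local test of a LLaVA-style model, a minimal sketch using Hugging Face transformers and the community `llava-hf/llava-1.5-7b-hf` checkpoint (the model ID, prompt template, and image path are assumptions based on the llava-hf release, not this repo's own API):

```python
# Minimal sketch: querying a LLaVA checkpoint via Hugging Face transformers.
# Assumes a recent transformers version (>= 4.36) and the community
# "llava-hf/llava-1.5-7b-hf" weights; the image path is a placeholder.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")
# LLaVA-1.5 chat template: the <image> token marks where visual features are spliced in.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```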
SUPIR aims to develop Practical Algorithms for Photo-Realistic Image Restoration In the Wild
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
ChatGPT's explosive popularity marked a key step on the road to AGI. This project curates open-source alternatives to ChatGPT, including large text models and multimodal models, as a convenient reference.
ms-swift: fine-tune 250+ LLMs or 30+ MLLMs with PEFT or full-parameter training
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️➡️ 🍸 🍹 🍷
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
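For context, multimodal prompting in the OpenAI-style chat API interleaves text and image parts within a single user message. A minimal sketch of that message format (the model name and image URL are placeholders; this illustrates the general schema, not this repo's own interface):

```python
# Minimal sketch of a multimodal prompt in the OpenAI chat-completions format.
# Model name and image URL are placeholders; many LMM servers accept this schema.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the defects visible in this product photo."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```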
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ Hugging Face models, and 20+ benchmarks
A framework for small-scale large multimodal models
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion-model toolbox, with high performance and flexibility.
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
llmcord.py • Talk to LLMs with your friends!
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Famous Vision Language Models and Their Architectures
From scratch implementation of a vision language model in pure PyTorch
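The core idea behind such from-scratch VLMs (and LLaVA itself) is a small projector that maps vision-encoder features into the language model's embedding space, so image patches become extra input "tokens". A minimal sketch with illustrative dimensions (all module sizes are made up, and a tiny Transformer stands in for the LLM):

```python
# Minimal sketch of the LLaVA-style recipe: project vision features into the
# LLM embedding space and prepend them to the text token embeddings.
# All module sizes here are illustrative, not taken from any specific repo.
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=2048, vocab_size=32000):
        super().__init__()
        self.projector = nn.Sequential(  # the "connector" / adapter
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )
        self.token_embed = nn.Embedding(vocab_size, llm_dim)
        layer = nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a real LLM
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_feats, input_ids):
        # image_feats: (B, num_patches, vision_dim) from a frozen vision encoder
        visual_tokens = self.projector(image_feats)           # (B, P, llm_dim)
        text_tokens = self.token_embed(input_ids)             # (B, T, llm_dim)
        seq = torch.cat([visual_tokens, text_tokens], dim=1)  # image tokens first
        return self.lm_head(self.llm(seq))                    # (B, P+T, vocab)

model = TinyVLM()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 32000])
```

In the actual LLaVA recipe, only the projector (and later the LLM) is trained while the vision encoder stays frozen; the stand-in encoder-only Transformer above omits the causal mask a real decoder LLM would use.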
A basic Google reCAPTCHA solver using llava-v1.6-7b
🧘🏻♂️ KarmaVLM (相生): A family of high-efficiency, powerful vision-language models.
Docker image for LLaVA: Large Language and Vision Assistant