Repositories under the multimodal-llm topic:
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Research code from the Multimodal-Cognition Team at Ant Group
[NeurIPS 2024 Workshop] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Official repository of the paper: Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
[ACL 2024] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
Streamlit app to chat with images using multimodal LLMs (see the illustrative sketch after this list).
Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.
LLaVA base model for use with Autodistill.
[NAACL 2025 Findings] Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?
Brain Tumor Classification project leveraging neural networks to classify MRI scans with high accuracy. Features include a Streamlit-based app for predictions, Gemini 1.5 Flash for interpretability, and advanced visualizations. It also includes model comparison, multimodal LLM integration, and real-time interactions.
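To illustrate the image-chat pattern behind the Streamlit entry above, here is a minimal sketch. It assumes an OpenAI-compatible vision endpoint; the `gpt-4o-mini` model name, the base64 data-URL encoding, and the single-turn prompt handling are assumptions for illustration, not code from the listed repository.

```python
# Minimal sketch of a Streamlit "chat with an image" app.
# Assumes an OpenAI-compatible vision model; not the listed repo's implementation.
import base64

import streamlit as st
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

st.title("Chat with an image")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
question = st.chat_input("Ask something about the image")

if uploaded and question:
    st.image(uploaded)
    # Encode the uploaded image as a base64 data URL for the vision model.
    b64 = base64.b64encode(uploaded.getvalue()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    with st.chat_message("assistant"):
        st.write(response.choices[0].message.content)
```

Run with `streamlit run app.py`; multi-turn history and support for other backends (e.g. LLaVA) would follow the same upload-encode-prompt loop.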