There are 30 repositories under the multimodal topic.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
Plug-and-Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
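A minimal sketch of the beam-search loop at the heart of Tree of Thoughts: propose candidate next "thoughts", score them, and keep the best few per level. Here `propose` and `score` are hypothetical stand-ins for LLM calls, not part of the actual library.

```python
def propose(thought, k=2):
    # Stand-in for an LLM proposing k candidate next thoughts.
    return [f"{thought}->{i}" for i in range(k)]

def score(thought):
    # Stand-in for an LLM rating a partial solution (higher is better).
    return len(thought)

def tree_of_thoughts(root, depth=3, beam=2):
    """Breadth-first search over thoughts, keeping the `beam` best per level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for thought in frontier for t in propose(thought)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = tree_of_thoughts("start")
```

In the real implementation the propose and evaluate steps are prompts to an LLM; the search skeleton (expand, score, prune) is the same.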
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at IDEA Research, serving as foundational infrastructure for Chinese AIGC and cognitive intelligence.
Curated tutorials and resources for Large Language Models, AI Painting, and more.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
OpenMMLab Pre-training Toolbox and Benchmark
Foundation Architecture for (M)LLMs
SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
Easily compute CLIP embeddings and build a CLIP retrieval system with them
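The core of a CLIP retrieval system is nearest-neighbor search over normalized embeddings. A minimal sketch, assuming the embeddings have already been computed by a CLIP model (random vectors stand in for them here):

```python
import numpy as np

# Hypothetical pre-computed CLIP image embeddings; in practice these come
# from a CLIP model, here random vectors serve as stand-ins.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(1000, 512)).astype(np.float32)

def normalize(x):
    # L2-normalize so that a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

index = normalize(image_embeddings)

def search(query_embedding, k=5):
    """Return indices of the k nearest images by cosine similarity."""
    sims = index @ normalize(query_embedding)
    return np.argsort(-sims)[:k]

query = rng.normal(size=512).astype(np.float32)
top5 = search(query)
```

Production systems replace the brute-force matrix product with an approximate-nearest-neighbor index (e.g. Faiss) to scale to billions of images.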
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Actionable AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Actionable AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
Actionable AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Images to inference with no labeling (use foundation models to train supervised models).
Meta-Transformer for Unified Multimodal Learning
Multimodal-GPT
A curated list of Multimodal Related Research.
Actionable AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.