There are 71 repositories under the multimodal topic.
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Janus-Series: Unified Multimodal Understanding and Generation Models
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Mobile-Agent: The Powerful GUI Agent Family
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
A visual playground for agentic workflows: Iterate over your agents 10x faster
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Open-source framework for building AI-powered apps in JavaScript, Go, and Python, built and used in production by Google
Distributed query engine providing simple and reliable data processing for any modality and scale
Align Anything: Training All-modality Model with Feedback
Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
Curated tutorials and resources for Large Language Models, AI Painting, and more.
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
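Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.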
OpenMMLab Pre-training Toolbox and Benchmark
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
The most accurate document search and store for building AI apps
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Foundation Architecture for (M)LLMs