Echo~ (Echo0125)

Company: KwaiVGI

Location: Nanjing

Home Page: https://echo0125.github.io/

Echo~'s starred repositories

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 10589 · Issues: 0

mar

PyTorch implementation of MAR + DiffLoss (https://arxiv.org/abs/2406.11838)

Language: Python · License: MIT · Stargazers: 539 · Issues: 0

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎🍋🌽 ➡️ 🍸🍹🍷

Language: Python · License: Apache-2.0 · Stargazers: 2250 · Issues: 0

free-books

Free books from the internet

Stargazers: 14772 · Issues: 0

DIVA

Diffusion Feedback Helps CLIP See Better

Language: Python · License: MIT · Stargazers: 167 · Issues: 0

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML · Stargazers: 243 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 9691 · Issues: 0

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1575 · Issues: 0

MetaCLIP

ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Language: Python · License: NOASSERTION · Stargazers: 1147 · Issues: 0

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python · License: NOASSERTION · Stargazers: 201 · Issues: 0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 1876 · Issues: 0

LivePortrait

Bring portraits to life!

Language: Python · License: NOASSERTION · Stargazers: 10093 · Issues: 0

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python · License: Apache-2.0 · Stargazers: 370 · Issues: 0

AntGPT

Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"

Language: Python · License: MIT · Stargazers: 17 · Issues: 0

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python · License: Apache-2.0 · Stargazers: 1507 · Issues: 0

chatgpt-web-midjourney-proxy

A unified web UI for ChatGPT, Midjourney, GPTs, Suno v3, Luma, and Runway; supports Web, PWA, Linux, Windows, and macOS

Language: JavaScript · License: MIT · Stargazers: 4553 · Issues: 0

bsq-vit

[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization

Language: Python · License: MIT · Stargazers: 70 · Issues: 0

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Language: Python · Stargazers: 779 · Issues: 0

OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

Language: Python · License: MIT · Stargazers: 211 · Issues: 0

VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Language: Python · License: CC-BY-4.0 · Stargazers: 175 · Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stargazers: 1138 · Issues: 0

books

A list of e-books collected by 编程随想 (Program Think): multiple subjects, with download links

License: CC0-1.0 · Stargazers: 18009 · Issues: 0

OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Language: Python · License: Apache-2.0 · Stargazers: 126 · Issues: 0

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell · Stargazers: 6907 · Issues: 0

CogVLM2

GPT-4V-level open-source multimodal model based on Llama3-8B

Language: Python · License: Apache-2.0 · Stargazers: 1769 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 24874 · Issues: 0

GLM-4

GLM-4 series: open multilingual, multimodal chat LMs

Language: Python · License: Apache-2.0 · Stargazers: 4208 · Issues: 0

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stargazers: 341 · Issues: 0

RepKPU

Point Cloud Upsampling with Kernel Point Representation and Deformation

Language: Python · License: MIT · Stargazers: 10 · Issues: 0