Echo~ (Echo0125)

Company: KwaiVGI

Location: Nanjing

Home Page: https://echo0125.github.io/

Echo~'s starred repositories

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 10589 · Issues: 0

mar

PyTorch implementation of MAR + DiffLoss (https://arxiv.org/abs/2406.11838)

Language: Python · License: MIT · Stargazers: 539 · Issues: 0

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎🍋🌽 ➡️ 🍸🍹🍷

Language: Python · License: Apache-2.0 · Stargazers: 2250 · Issues: 0

free-books

Free books from the internet

Stargazers: 14772 · Issues: 0

DIVA

Diffusion Feedback Helps CLIP See Better

Language: Python · License: MIT · Stargazers: 167 · Issues: 0

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML · Stargazers: 243 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 9691 · Issues: 0

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1575 · Issues: 0

MetaCLIP

ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Language: Python · License: NOASSERTION · Stargazers: 1147 · Issues: 0

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python · License: NOASSERTION · Stargazers: 201 · Issues: 0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 1876 · Issues: 0

LivePortrait

Bring portraits to life!

Language: Python · License: NOASSERTION · Stargazers: 10093 · Issues: 0

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python · License: Apache-2.0 · Stargazers: 370 · Issues: 0

AntGPT

Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"

Language: Python · License: MIT · Stargazers: 17 · Issues: 0

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python · License: Apache-2.0 · Stargazers: 1507 · Issues: 0

chatgpt-web-midjourney-proxy

A unified web UI for ChatGPT, Midjourney, GPTs, Suno v3, Luma, and Runway; supports Web, PWA, Linux, Windows, and macOS

Language: JavaScript · License: MIT · Stargazers: 4553 · Issues: 0

bsq-vit

[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization

Language: Python · License: MIT · Stargazers: 70 · Issues: 0

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Language: Python · Stargazers: 779 · Issues: 0

OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

Language: Python · License: MIT · Stargazers: 211 · Issues: 0

VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Language: Python · License: CC-BY-4.0 · Stargazers: 175 · Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stargazers: 1138 · Issues: 0

books

A list of e-books collected by 编程随想 (Program Think): multiple subjects, with download links

License: CC0-1.0 · Stargazers: 18009 · Issues: 0

OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Language: Python · License: Apache-2.0 · Stargazers: 126 · Issues: 0

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell · Stargazers: 6907 · Issues: 0

CogVLM2

GPT-4V-level open-source multimodal model based on Llama3-8B

Language: Python · License: Apache-2.0 · Stargazers: 1769 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 24874 · Issues: 0

GLM-4

GLM-4 series: open multilingual, multimodal chat LMs

Language: Python · License: Apache-2.0 · Stargazers: 4208 · Issues: 0

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stargazers: 341 · Issues: 0

RepKPU

Point Cloud Upsampling with Kernel Point Representation and Deformation

Language: Python · License: MIT · Stargazers: 10 · Issues: 0