Yihao Feng's starred repositories
self-operating-computer
A framework to enable multimodal models to operate a computer.
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
consistencydecoder
Consistency Distilled Diff VAE
GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
EvaluationPapers4ChatGPT
Resource, Evaluation and Detection Papers for ChatGPT
Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
GPT-V-on-Web
👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent
MM-Navigator
LMMs as Smartphone Agents