Yuezeyi's starred repositories
SenseVoice
Multilingual Voice Understanding Model
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
Qwen2-Audio
The official repo of Qwen2-Audio, the chat and pretrained large audio language model proposed by Alibaba Cloud.
SpeechTokenizer
This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
scaling_on_scales
When do we not need larger vision models?
animate-your-word
Official implementation of the paper "Dynamic Typography: Bringing Text to Life via Video Diffusion Prior"
TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
MultiBooth
[arXiv 2024] MultiBooth: This repo is the official implementation of "MultiBooth: Towards Generating All Your Concepts in an Image from Text"
VL-InterpreT
Visual Language Transformer Interpreter - An interactive visualization tool for interpreting vision-language transformers