Qingsong Liu's repositories
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
catvision
A large multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the open-source Qwen-VL-7B-Chat.
ChatTTS
ChatTTS is a generative speech model for daily dialogue.
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
docker_image_pusher
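Uses GitHub Actions to mirror Docker images hosted outside China to an Alibaba Cloud private registry so that servers in China can pull them; free and easy to use.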
E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FiT
FiT: Flexible Vision Transformer for Diffusion Model
FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
generative-models
Generative Models by Stability AI
genmusic_demo_list
A list of demo websites for automatic music generation research
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)
MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Open-AnimateAnyone
Unofficial Implementation of Animate Anyone
PhotoMaker
PhotoMaker
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis