Qingsong Liu's repositories
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
bark
🔊 Text-Prompted Generative Audio Model
catvision
A large-scale multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly outperforms the open-source Qwen-VL-7B-Chat.
ChatTTS
ChatTTS is a generative speech model for daily dialogue.
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FiT
FiT: Flexible Vision Transformer for Diffusion Model
generative-models
Generative Models by Stability AI
genmusic_demo_list
A list of demo websites for automatic music generation research
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)
MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Open-AnimateAnyone
Unofficial Implementation of Animate Anyone
PhotoMaker
PhotoMaker
Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis