Qingsong Liu's repositories
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
ChatTTS
ChatTTS is a generative speech model for daily dialogue.
CosyVoice
LLM-based TTS model providing full-stack inference, training, and deployment capabilities.
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
docker_image_pusher
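Uses GitHub Actions to copy Docker images from overseas registries to an Alibaba Cloud private registry for use by servers in mainland China. Free and easy to use.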
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
Glyph-ByT5
[ECCV2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)
Medical-SAM2
Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2
mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Open-AnimateAnyone
Unofficial Implementation of Animate Anyone
PhotoMaker
PhotoMaker
Qwen2-VL-Finetune
An open-source implementation for fine-tuning Qwen2-VL-2B and Qwen2-VL-7B.
RecordRTC
RecordRTC is a WebRTC JavaScript library for audio/video as well as screen activity recording. It supports Chrome, Firefox, Opera, Android, and Microsoft Edge. Platforms: Linux, Mac and Windows.
SenseVoice
Multilingual Voice Understanding Model
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
xtts-api-server
A simple FastAPI Server to run XTTSv2