Qingsong Liu's starred repositories
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
VoiceStreamAI
Near-real-time audio transcription using self-hosted Whisper and WebSockets in Python/JS
selfservicekiosk-audio-streaming
A best-practice example of streaming audio from a browser microphone to Dialogflow or Google Cloud STT using WebSockets.
Awesome-Speaker-Diarization
A collection of comprehensive papers on speaker diarization
Languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
SpeechTokenizer
Code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Awesome-Chart-Understanding
A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection