Shuchang Zhou's starred repositories
SenseVoice
Multilingual Voice Understanding Model
optimize-and-reduce
A Top-Down Approach for Image Vectorization
mateo-demo
MAchine Translation Evaluation Online (MATEO)
ChartFormer
ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs
SketchVideo
[EG 2023] Sketch Video Synthesis
Leaderboard
SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech Recognition.
image2svg-awesome
All about image tracing and vectorization—the conversion of a raster image (jpg/png) to a vector image (svg).
Resemblyzer
A Python package to analyze and compare voices with deep learning
whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Awesome-Speaker-Diarization
Some comprehensive papers about speaker diarization
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Whisper-WebUI
A Web UI for easy subtitle generation using the Whisper model.
audiomentations
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
PyTorch-SVGRender
SVG Differentiable Rendering: Generating vector graphics using neural networks. Supports text-to-SVG, image-to-SVG, and SVG editing.
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
SpeechTokenizer
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
bark-voice-cloning-HuBERT-quantizer
The code for the bark-voicecloning model. Training and inference.
IMS-Toucan
Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
English-to-IPA
Converts English text to IPA notation
python-pinyin
Convert Chinese characters to pinyin (pypinyin)
pinyin-to-ipa
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
WhisperLive
A nearly-live implementation of OpenAI's Whisper.
WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.