zsc

Beijing

https://zsc.github.io/

Organizations

megvii-research

Shuchang Zhou's starred repositories

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python | License: MIT | 143100

TS-Whisper

Language: Python | 1500

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | 171600

NKF-AEC

Acoustic Echo Cancellation with Neural Kalman Filtering

Language: HTML | 20100

optimize-and-reduce

A Top-Down Approach for Image Vectorization

Language: Jupyter Notebook | License: MIT | 500

mateo-demo

MAchine Translation Evaluation Online (MATEO)

Language: Python | License: GPL-3.0 | 1500

ChartFormer

ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs

Language: Python | 300

SketchVideo

[EG 2023] Sketch Video Synthesis

Language: Jupyter Notebook | 19500

Leaderboard

SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

Language: Python | 41800

image2svg-awesome

All about image tracing and vectorization—the conversion of a raster image (jpg/png) to a vector image (svg).

License: MIT | 16800

Resemblyzer

A python package to analyze and compare voices with deep learning

Language: Python | License: Apache-2.0 | 268900

whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Language: Jupyter Notebook | License: BSD-2-Clause | 252900

Awesome-Speaker-Diarization

Some comprehensive papers about speaker diarization

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python | License: MIT | 349700

Whisper-WebUI

A Web UI for easy subtitling using the Whisper model.

Language: Python | License: Apache-2.0 | 83000

audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Language: Python | License: MIT | 177600

speech2text-server

Language: Python | 100

PyTorch-SVGRender

SVG Differentiable Rendering: Generating vector graphics using neural networks. Support: text-to-SVG, Image-to-SVG, SVG Editing.

Language: Python | License: MPL-2.0 | 9600

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 1059200

SpeechTokenizer

This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on

Language: Python | License: Apache-2.0 | 38000

bark-voice-cloning-HuBERT-quantizer

The code for the bark-voicecloning model. Training and inference.

Language: Python | License: MIT | 61900

IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.

Language: Python | License: Apache-2.0 | 129400

English-to-IPA

Converts English text to IPA notation

Language: Python | License: MIT | 35500

python-pinyin

Convert Chinese characters to pinyin (pypinyin)

Language: Python | License: MIT | 477600

BigCiDian

Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.

Language: Python | 25200

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | License: MIT | 881400

pinyin-to-ipa

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.

Language: Python | License: MIT | 2500

WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Language: Python | License: MIT | 160700

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook | License: MIT | 362500

OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

Language: Python | License: Apache-2.0 | 183900