AlexandaJerry

Alexanda's starred repositories

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell | License: NOASSERTION | 13967 698 1637

leedl-tutorial

Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

Language: Jupyter Notebook | License: NOASSERTION | 11297 264 80

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | 11009 200 2161

pyvideotrans

Translate a video from one language to another and add dubbing.

Language: Python | License: GPL-3.0 | 8269 51 439

PyQt-Fluent-Widgets

A fluent design widgets library based on C++ Qt/PyQt/PySide. Make Qt Great Again.

Language: Python | License: GPL-3.0 | 5052 34 722

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Language: Python | License: GPL-3.0 | 3190 30 510

parler-tts

Inference and training library for high-quality TTS models.

Language: Python | License: Apache-2.0 | 2891 48 60

Parselmouth

Praat in Python, the Pythonic way

Language: C++ | License: GPL-3.0 | 1028 21 72

Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model.

Language: Python | License: Apache-2.0 | 817 7 92

Chenyme-AAVT

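A fully automatic (audio) video translation project. It uses Whisper to recognize speech, an AI large language model to translate the subtitles, and then merges the subtitles into the video to produce the translated video.

Language: Python | License: MIT | 786 9 41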

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | 545 86 4

RapidASR

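A commercial-grade open-source automatic speech recognition library that works out of the box, supports all platforms, and recognizes mixed Chinese-English speech. A cross-platform implementation of ASR inference, based on ONNXRuntime and FunASR. We provide a set of easier APIs to call ASR models.

Language: C++ | License: MIT | 466 17 26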

praatIO

A Python library for working with Praat, TextGrids, time-aligned audio transcripts, and audio files. It is primarily used for extracting features from, and making manipulations on, audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc.).

Language: Python | License: MIT | 301 9 37

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python | License: MIT | 289 26 12

pympi

A Python module for processing ELAN and Praat annotation files

Language: Python | License: MIT | 92 16 40

ReFlow-VAE-SVC

Language: Python | License: MIT | 78 8 2

zh_recogn

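Recognizes Chinese speech in audio or video and exports it as SRT subtitles, based on the Paraformer model from the ModelScope community.

Language: Python | License: GPL-3.0 | 69 1 3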

create_pictures

A Praat script for creating pictures (waveform, spectrogram, and pitch contour, aligned with a TextGrid). It creates figures in PNG, PDF, WMF, EPS, and PraatPic formats for all the Sound and TextGrid files it finds in a folder. The pictures contain a waveform (optional), a spectrogram (optional), the F0 track (optional), and the content of the tiers of the TextGrid associated with the sound file (optional).

Language: Praat | 20 4 1

Speech-and-Language-Processing-3rd-Edition-Solutions

Solutions for the book "Speech and Language Processing" (3rd ed. draft) by Dan Jurafsky and James H. Martin

Language: Python | 14 10

forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level text alignments of audio, with each word or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.

Language: Python | License: MIT | 800
