Alexanda's starred repositories

cs-self-learning

A self-study guide to computer science

Language: HTML · License: MIT · Stargazers: 53232 · Issues: 316 · Issues: 176

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell · License: NOASSERTION · Stargazers: 13967 · Issues: 698 · Issues: 1637

leedl-tutorial

"Hung-yi Lee's Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee 👍). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 11297 · Issues: 264 · Issues: 80

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11009 · Issues: 200 · Issues: 2161

pyvideotrans

Translates videos from one language to another and adds dubbing.

Language: Python · License: GPL-3.0 · Stargazers: 8269 · Issues: 51 · Issues: 439

PyQt-Fluent-Widgets

A Fluent Design widget library based on C++ Qt/PyQt/PySide. Make Qt Great Again.

Language: Python · License: GPL-3.0 · Stargazers: 5052 · Issues: 34 · Issues: 722

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Language: Python · License: GPL-3.0 · Stargazers: 3190 · Issues: 30 · Issues: 510

parler-tts

Inference and training library for high-quality TTS models.

Language: Python · License: Apache-2.0 · Stargazers: 2891 · Issues: 48 · Issues: 60

Parselmouth

Praat in Python, the Pythonic way

Language: C++ · License: GPL-3.0 · Stargazers: 1028 · Issues: 21 · Issues: 72

Whisper-WebUI

A web UI for easy subtitle generation using the Whisper model.

Language: Python · License: Apache-2.0 · Stargazers: 817 · Issues: 7 · Issues: 92

Chenyme-AAVT

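A fully automated (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.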

Language: Python · License: MIT · Stargazers: 786 · Issues: 9 · Issues: 41

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

RapidASR

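A commercial-grade, open-source automatic speech recognition library: ready to use out of the box, supported on all platforms, with mixed Chinese-English recognition. It is a cross-platform implementation of ASR inference based on ONNXRuntime and FunASR, providing a set of easier APIs for calling ASR models.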

Language: C++ · License: MIT · Stargazers: 466 · Issues: 17 · Issues: 26

praatIO

A Python library for working with Praat, TextGrids, time-aligned audio transcripts, and audio files. It is primarily used for extracting features from, and making manipulations on, audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc.).

Language: Python · License: MIT · Stargazers: 301 · Issues: 9 · Issues: 37
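The "hierarchical time-aligned transcriptions" in praatIO's description mean that each interval on a higher tier (say, a word) contains a run of intervals on a lower tier (its phones). A minimal pure-Python sketch of that nesting, assuming each tier is a list of (start, end, label) intervals in seconds; the `Interval` type, the `children` helper, and the sample data are hypothetical illustrations, not praatIO's API:

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """One labelled span of a tier, with times in seconds."""
    start: float
    end: float
    label: str


def children(parent: Interval, lower_tier: List[Interval]) -> List[Interval]:
    """Return the intervals of a lower tier that lie inside `parent`.

    A child belongs to a parent when it starts at or after the parent's
    start and ends at or before the parent's end (nested, aligned tiers).
    """
    return [iv for iv in lower_tier
            if iv.start >= parent.start and iv.end <= parent.end]


# Hypothetical two-level transcription: a word tier above a phone tier.
words = [Interval(0.00, 0.42, "hello"), Interval(0.42, 0.90, "world")]
phones = [
    Interval(0.00, 0.08, "h"), Interval(0.08, 0.20, "ə"),
    Interval(0.20, 0.30, "l"), Interval(0.30, 0.42, "oʊ"),
    Interval(0.42, 0.55, "w"), Interval(0.55, 0.70, "ɝ"),
    Interval(0.70, 0.80, "l"), Interval(0.80, 0.90, "d"),
]

for word in words:
    # Prints each word followed by the labels of the phones it contains.
    print(word.label, [p.label for p in children(word, phones)])
```

The same containment test works at every level of the hierarchy (utterance to word, word to syllable, syllable to phone), which is what makes nested tiers convenient for extracting per-unit features.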

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python · License: MIT · Stargazers: 289 · Issues: 26 · Issues: 12

pympi

A Python module for processing ELAN and Praat annotation files

Language: Python · License: MIT · Stargazers: 92 · Issues: 16 · Issues: 40

zh_recogn

Recognizes Chinese speech in audio or video and exports it as SRT subtitles, based on the Paraformer model from the ModelScope community

Language: Python · License: GPL-3.0 · Stargazers: 69 · Issues: 1 · Issues: 3
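Several entries here (zh_recogn, Whisper-WebUI) end by writing SRT subtitles. A minimal sketch of the SRT layout, which is numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line and the cue text, separated by blank lines. It assumes the recognizer yields (start, end, text) segments in seconds; `srt_timestamp` and `to_srt` are hypothetical helpers, not code from either project:

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT cues numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Consecutive cues are separated by one blank line.
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Hello"), (2.5, 61.04, "world")]))
```

Rounding to whole milliseconds before splitting into fields avoids float artifacts such as a timestamp ending in `,999` for a value that should be a round second.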

create_pictures

A Praat script for creating pictures (waveform, spectrogram, and pitch contour, aligned with a TextGrid). It creates figures in PNG, PDF, WMF, EPS, and PraatPic formats for all the Sound and TextGrid files it finds in a folder. Each picture can contain a waveform, a spectrogram, the F0 track, and the contents of the tiers of the TextGrid associated with the sound file, each of which is optional.

Speech-and-Language-Processing-3rd-Edition-Solutions

Solutions for the book "Speech and Language Processing" (3rd ed. draft) by Dan Jurafsky and James H. Martin

Language: Python · Stargazers: 14 · Issues: 1 · Issues: 0

forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level text alignments of audio, with each word or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.

Language: Python · License: MIT · Stargazers: 8 · Issues: 0 · Issues: 0
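ForceAlign reports each word's or phoneme's start and end time within the audio. The general idea behind forced alignment can be sketched as a Viterbi dynamic program: given per-frame log-probabilities for each token of a known transcript, find the left-to-right assignment of frames to tokens (each token covering one or more consecutive frames) with the highest total score. This is a generic sketch under those assumptions, not ForceAlign's implementation; it omits the acoustic model that would produce the scores, and `force_align` and the sample matrix are hypothetical:

```python
import math
from typing import List, Tuple


def force_align(log_probs: List[List[float]]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment of a known token sequence to frames.

    log_probs[t][j] is the log-probability that frame t belongs to token j.
    Tokens keep transcript order and each covers >= 1 consecutive frames.
    Returns one inclusive (first_frame, last_frame) pair per token.
    """
    num_frames, num_tokens = len(log_probs), len(log_probs[0])
    if num_frames < num_tokens:
        raise ValueError("need at least one frame per token")
    neg_inf = -math.inf
    # score[t][j]: best total score for frames 0..t with frame t on token j.
    score = [[neg_inf] * num_tokens for _ in range(num_frames)]
    # advanced[t][j]: True if the best path entered token j at frame t.
    advanced = [[False] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_probs[0][0]  # the first frame must belong to token 0
    for t in range(1, num_frames):
        for j in range(num_tokens):
            stay = score[t - 1][j]
            enter = score[t - 1][j - 1] if j > 0 else neg_inf
            if enter > stay:
                score[t][j] = enter + log_probs[t][j]
                advanced[t][j] = True
            else:
                score[t][j] = stay + log_probs[t][j]
    # Backtrack from the last frame, which must belong to the last token.
    spans = []
    j, end = num_tokens - 1, num_frames - 1
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][j]:
            spans.append((t, end))
            j, end = j - 1, t - 1
    spans.append((0, end))
    return spans[::-1]


# Hypothetical scores: 6 frames, 3 tokens (0 = likely, -5 = unlikely).
example = [
    [0, -5, -5], [0, -5, -5],
    [-5, 0, -5], [-5, 0, -5], [-5, 0, -5],
    [-5, -5, 0],
]
print(force_align(example))
```

Frame indices become times by multiplying by the model's frame hop. With an assumed 20 ms hop, a span of (2, 4) would run from 0.04 s to 0.10 s.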

dsp_tutorials

I wanted guided tutorials on digital signal processing, so I decided to create them. The result is this ebook: "Digital Signal Processing for Speech, Language, and Hearing Scientists"

Language: Jupyter Notebook · Stargazers: 7 · Issues: 2 · Issues: 0

vlabeler-textgrid

A set of vLabeler plugins for Praat TextGrid files

Language: JavaScript · Stargazers: 7 · Issues: 2 · Issues: 1

HermeSpeechRecorder

Web application for speech recording

Language: JavaScript · License: MIT · Stargazers: 4 · Issues: 0 · Issues: 0

Anchor-annotator

Anchor annotator is a program for inspecting corpora for the Montreal Forced Aligner and correcting transcriptions and pronunciations

Language: Python · License: MIT · Stargazers: 3 · Issues: 3 · Issues: 0

Phoneme-Forced-Alignment

Comparison of methods to perform forced-alignment of phonemes in English

Language: Roff · Stargazers: 2 · Issues: 0 · Issues: 0

jason2textgrid

Python script to convert WhisperX JSON time-stamps to Praat TextGrid files

Language: Python · License: GPL-3.0 · Stargazers: 2 · Issues: 1 · Issues: 0
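jason2textgrid converts WhisperX JSON time-stamps into Praat TextGrid files. A minimal sketch of such a conversion, assuming the JSON has a top-level `segments` list whose items carry `start`, `end`, and `text` fields (verify this against your WhisperX output) and that segments do not overlap. It writes one IntervalTier in Praat's long text format and fills gaps between segments with empty intervals so the tier stays continuous. `whisperx_to_textgrid` is a hypothetical name, not the repository's code:

```python
import json
from typing import List, Tuple


def _quote(text: str) -> str:
    """Quote a string for a TextGrid file; Praat escapes '"' by doubling it."""
    return '"' + text.replace('"', '""') + '"'


def whisperx_to_textgrid(whisperx_json: str, tier_name: str = "segments") -> str:
    """Convert WhisperX-style JSON segments into a one-tier long-format TextGrid."""
    segments = json.loads(whisperx_json)["segments"]
    # Build contiguous intervals; empty ones cover silences between segments.
    # Assumes non-overlapping segments once sorted by start time.
    intervals: List[Tuple[float, float, str]] = []
    cursor = 0.0
    for seg in sorted(segments, key=lambda s: s["start"]):
        start, end = float(seg["start"]), float(seg["end"])
        if start > cursor:
            intervals.append((cursor, start, ""))
        intervals.append((start, end, seg["text"].strip()))
        cursor = end
    xmax = cursor
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {_quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f"            text = {_quote(text)}",
        ]
    return "\n".join(lines) + "\n"


example = json.dumps({"segments": [
    {"start": 0.5, "end": 1.8, "text": " Hello there."},
    {"start": 2.0, "end": 3.25, "text": ' He said "hi".'},
]})
print(whisperx_to_textgrid(example))
```

For the two sample segments this yields four intervals: a leading silence, the first segment, a short gap, and the second segment. Word-level tiers could be built the same way from per-word timings, if the JSON provides them.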

forced_alignment

Slovene speech alignment with Montreal Forced Aligner

Language: Python · Stargazers: 1 · Issues: 0 · Issues: 0

whisper-webmaus

A set of scripts for generating transcriptions with Whisper and TextGrids with WebMAUS from audio recordings

Language: Python · License: MIT · Stargazers: 1 · Issues: 0 · Issues: 0