There are 41 repositories under the speech-processing topic.
A PyTorch-based Speech Toolkit
Reading list for research topics in multimodal machine learning
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Foundation Architecture for (M)LLMs
WaveNet vocoder
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
SincNet is a neural architecture for efficiently processing raw audio samples.
Open source audio annotation tool for humans
AI powered speech denoising and enhancement
General Speech Restoration
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
A neural network for end-to-end speech denoising
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
spafe: Simplified Python Audio Features Extraction
This repository contains an implementation of "Neural Voice Cloning With Few Samples"
This repo summarizes tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)