songmeixu

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, which further improve the model's performance and generalization.

Language: Python | License: NOASSERTION | 1557 33 149

VulkanSamples

Vulkan Samples

Language: C++ | License: NOASSERTION | 1354 116 100

mycroft-precise

A lightweight, simple-to-use, RNN wake word listener

Language: Python | License: Apache-2.0 | 793 33 189

BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

Language: C++ | License: Apache-2.0 | 745 35 230

inaSpeechSegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

Language: Python | License: MIT | 695 23 69

Native_SDK

Cross-platform C++ 3D graphics SDK. Includes demos and helper code (resource loading, etc.) to speed up development of Vulkan, OpenGL ES 2.0, and OpenGL ES 3.x applications.

Language: C++ | License: MIT | 659 96 61

VulkanTools

Tools to aid in Vulkan development

Language: C++ | License: NOASSERTION | 633 45 552

optimizer

Actively maintained ONNX Optimizer

Language: C++ | License: Apache-2.0 | 587 28 63

Vulkan-Loader

Vulkan Loader

Language: C | License: NOASSERTION | 467 66 452

FastASR

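A C++ implementation of ASR inference. It has few dependencies, is simple to install, and runs inference quickly, including smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized from Google's Transformer model and are trained on the open-source wenetspeech dataset (10,000+ hours) or an Alibaba private dataset (60,000+ hours), so recognition quality is good and comparable to many commercial ASR products.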

Language: C | License: Apache-2.0 | 433 22 68

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT | 405 29 27

speech-denoiser

A speech denoising LV2 plugin based on the RNNoise library

Language: C | License: LGPL-3.0 | 281 14 20

BigCiDian

Pronunciation lexicon covering both English and Chinese for Automatic Speech Recognition.

Language: Python | 247 9 1

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

Language: C | License: Apache-2.0 | 233 9 27