ruclion

Tsinghua University

Shenzhen

https://blog.csdn.net/u013625492

户建坤's starred repositories

tensorflow

An Open Source Machine Learning Framework for Everyone

Language: C++ | License: Apache-2.0 | 182411 7640 39113

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python | License: MIT | 2818 39 181

py-webrtcvad

Python interface to the WebRTC Voice Activity Detector

Language: C | License: NOASSERTION | 1872 48 80

cnpy

library to read/write .npy and .npz files in C/C++

Language: C++ | License: MIT | 1249 29 64

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook | License: BSD-3-Clause | 1002 20 124

VNN

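VNN is a high-performance, lightweight neural network deployment framework from Joyy Inc. It currently supports more than 20 AI capabilities in apps such as Hago, VOO, VFly, and 马克相机, covering pan-entertainment scenarios such as live streaming, short video, and video editing, as well as engineering scenarios.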

Language: C | License: NOASSERTION | 957 30 33

VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

Language: MATLAB | 814 45 40

cppflow

Run TensorFlow models in C++ without installation and without Bazel

Language: C++ | License: MIT | 760 25 187

Focal-Loss-Pytorch

Fully commented in Chinese. The RetinaNet loss function, implemented in PyTorch. Use it in one-stage object detection to improve detection performance, or in classification tasks to mitigate the effect of data imbalance.

Language: Jupyter Notebook | 422 5 19

voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

Language: Python | License: Apache-2.0 | 367 25 25

streamlit-audio-recorder

Record Audio from the User's Microphone in Apps that are Deployed to the Web. (via Browser Media-API, REACT-based, Streamlit Custom Component)

Language: TypeScript | License: MIT | 327 1 17

kaldiio

A pure python module for reading and writing kaldi ark files

Language: Python | License: NOASSERTION | 243 12 16

Speech-enhancement

Deep neural network based speech enhancement toolkit

Language: MATLAB | License: GPL-2.0 | 209 8 28

dscore

Diarization scoring tools.

Language: Python | License: BSD-2-Clause | 194 8 4

GPV

Repository for our Interspeech2020 general-purpose voice activity detection (GPVAD) paper

Language: Python | License: GPL-3.0 | 140 5 9

psla

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".

Language: Python | License: BSD-3-Clause | 124 1 12

Automatic-Prosody-Annotation

Language: Python | 107 3 5

panns_transfer_to_gtzan

Language: Python | 92 2 11

Datadriven-GPVAD

The codebase for Data-driven general-purpose voice activity detection.

Language: Python | License: MIT | 89 8 15

AI_beatmap_generator

An attempt to generate beatmaps for the rhythm game Malody with a neural network.

Language: Jupyter Notebook | License: MIT | 43 2 1

ram_modified

"Recurrent Models of Visual Attention" in TensorFlow

Language: Python | 42 6 2

DIHARD_2019_baseline_alltracks

Language: Perl | 37 1 1

sound_event_detection

🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.

Language: Python | License: Apache-2.0 | 35 1 1

lrfasd.github.io

Language: HTML | 35 6 10

DiViMe

ACLEW Diarization Virtual Machine

Language: Shell | License: Apache-2.0 | 30 13 152

mica-speech-activity-detection

Robust Speech Activity Detection (SAD) in movie audio

Language: Python | 25 21 6

DomainAdversarialVoiceActivityDetection

Code for reproducing experiments in "Domain-Adversarial Voice Activity Detection"

Language: Jupyter Notebook | License: MIT | 23 4 2

audio_augment

A tool/script for batch speech data enhancement with speed/volume/RIRS/MUSAN

Language: Shell | 18 1 1

musan_investigation_cnn_rnn

Evaluation of the classification performance (Speech, Music, and Noise) of 1D (WaveNet) and 2D (MobileNet) CNN and RNN (GRU) on the MUSAN corpus.

Language: Python | License: MIT | 14 3 2

MultiTarget_VAD

Reproduction of the paper "On training targets for noise-robust voice activity detection".

Language: Jupyter Notebook | License: MIT | 4 1 1