audio-sound-and-speech

A repository of audio, sound, and speech related papers, tools, and docs.

Papers

https://github.com/google/uis-rnn This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization. https://arxiv.org/abs/1810.04719

https://github.com/philipperemy/deep-speaker

https://github.com/qqueing/DeepSpeaker-pytorch

https://arxiv.org/abs/1604.07160 Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

SURREY-CVSSP SYSTEM FOR DCASE2017 CHALLENGE TASK4

https://arxiv.org/find/all/1/all:+dcase/0/1/0/all/0/1 https://wenku.baidu.com/view/b223255b3186bceb18e8bb71.html https://etymo.io/search/Dcase https://arxiv.org/abs/1612.01611v1 https://arxiv.org/abs/1607.03681v2 https://arxiv.org/abs/1703.06902v1 https://arxiv.org/abs/1609.06026v3 http://karol.piczak.com/papers/Piczak2015-ESC-ConvNet.pdf https://arxiv.org/pdf/1609.05234.pdf (https://github.com/spragunr/deep_q_rl)

https://vijaychan.github.io/Publications/2011%20-%20Survey%20and%20evaluation%20of%20audio%20fingerprinting%20schemes%20for%20mobile%20audio%20search.pdf SURVEY AND EVALUATION OF AUDIO FINGERPRINTING SCHEMES FOR MOBILE QUERY-BY-EXAMPLE APPLICATIONS

Tools and code

https://github.com/dake/openVP Voiceprint (speaker) recognition

https://github.com/tensorflow/models/tree/master/research/audioset CNN Architectures for Large-Scale Audio Classification

http://projects.csail.mit.edu/soundnet/ SoundNet: Learning Sound Representations from Unlabeled Video

https://github.com/tyiannak/pyAudioAnalysis http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144610 https://github.com/librosa/librosa https://github.com/readbeyond/aeneas https://github.com/CPJKU/madmom https://github.com/aalireza/SimpleAudioIndexer https://github.com/craffel/mir_eval

Audio/Sound event detection:

https://github.com/gorinars/dcase16-cnn https://github.com/liuhuang31/dcase17_cnn https://github.com/kahst/AcousticEventDetection https://github.com/nationalparkservice/acoustic_discovery

Visualization:

https://github.com/TUT-ARG/TUT_Rare_sound_events_mixture_synthesizer

http://tut-arg.github.io/sed_eval/ Evaluation tool

https://github.com/TUT-ARG/sed_vis Visualization tool

https://github.com/znichols/racKet https://github.com/justinsalamon/UrbanSound8K-JAMS http://bmcfee.github.io/papers/scipy2015_librosa.pdf

https://github.com/andabi/voice-vector A deep neural network for finding text-independent speaker embedding written in tensorflow and tensorpack

Musical fingerprinting systems:

https://github.com/echonest/echoprint-server Server components for Echoprint

https://github.com/beetbox/pyacoustid Python bindings for Chromaprint acoustic fingerprinting and the Acoustid Web service

https://acoustid.org AcoustID is a project providing complete audio identification service, based entirely on open source software.

https://labrosa.ee.columbia.edu/matlab/audfprint/ audfprint is a (compiled) Matlab script that can take a list of soundfiles and create a database of landmarks, and then subsequently take one or more query audio files and match them against the previously-created database.

https://github.com/dpwe/audfprint Landmark-based audio fingerprinting

https://github.com/spotify/echoprint-server Server for the Echoprint audio fingerprint system

https://github.com/worldveil/dejavu Audio fingerprinting and recognition in Python

https://github.com/jameslyons/python_speech_features This library provides common speech features for ASR including MFCCs and filterbank energies.
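The landmark idea behind the audfprint entries above (build a database of landmarks from reference files, then match query audio against it) can be illustrated with a small toy sketch. This is not audfprint's code or API: the function names `landmarks`, `build_database`, and `match`, the `fan_out` and `max_dt` parameters, and the hand-written peak lists are all assumptions made for illustration. It also assumes spectral peaks have already been extracted as (time frame, frequency bin) pairs, which a real system would compute from a spectrogram.

```python
from collections import Counter, defaultdict


def landmarks(peaks, fan_out=3, max_dt=20):
    """Pair each peak with up to `fan_out` later peaks (hypothetical helper).

    `peaks` is a list of (time_frame, freq_bin) tuples.
    Yields ((f1, f2, dt), t1): a hash and the time of its anchor peak.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break          # peaks are time-sorted, so later ones are too far
            if dt <= 0:
                continue       # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_database(tracks):
    """Map each landmark hash to the (track_id, time) pairs where it occurs."""
    db = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in landmarks(peaks):
            db[h].append((track_id, t))
    return db


def match(db, query_peaks):
    """Vote on (track_id, time offset); return the best candidate or None."""
    votes = Counter()
    for h, t_query in landmarks(query_peaks):
        for track_id, t_ref in db.get(h, ()):
            votes[(track_id, t_ref - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Made-up peak lists standing in for two reference recordings.
tracks = {
    "a": [(0, 10), (3, 40), (5, 22), (9, 31), (12, 15)],
    "b": [(0, 11), (2, 35), (6, 27), (8, 19), (13, 44)],
}
db = build_database(tracks)

# The query is the tail of track "a", shifted to start at frame 0.
print(match(db, [(0, 40), (2, 22), (6, 31), (9, 15)]))  # ('a', 3, 6)
```

Voting on the time offset rather than on the track alone is what makes the match robust in this sketch: hashes that collide by chance tend to disagree on the offset, while a true excerpt produces many hashes that agree on one.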

Documents

https://github.com/bootphon/phonemizer Simple text to phonemes converter for multiple languages

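https://mp.weixin.qq.com/s?__biz=MzU2OTA0NzE2NA==&mid=2247501030&idx=1&sn=31fe4c7f596e377afc3a473bbffc84e2&chksm=fc8625f5cbf1ace38f098b1e3d641a66a9618926ffe4a3c11abf549ddf95aeb25b5f99e261bb&mpshare=1&scene=24&srcid=110566sXYXbPEX9X6HQqQvYV#rd A broad collection of introductory material, papers, code, and products for the speech recognition field, covering speech recognition, speech synthesis, voiceprint recognition, and more, gathered in a single article.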

https://www.zhihu.com/question/53707809/answer/181292755 https://zhuanlan.zhihu.com/p/24362279 https://www.zhihu.com/question/21505605 https://www.zhihu.com/question/265075184/answer/291146573 https://zhuanlan.zhihu.com/p/26482011

MFCC emotion recognition:

http://blog.csdn.net/u011108244/article/details/51661186 https://my.oschina.net/jamesju/blog/193343 http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/ http://blog.csdn.net/audio_algorithm/article/details/78709422 https://wenku.baidu.com/view/39b761f20242a8956bece4a3.html

Detection and Classification of Acoustic Scenes and Events Outcome of the DCASE

https://www.cs.tut.fi/sgn/arg/dcase2017/ https://github.com/yongxuUSTC/dcase2017_task4_cvssp https://github.com/DeepLJH0001/DCASE2016 http://www.sohu.com/a/193907127_642762 https://www.cs.tut.fi/sgn/arg/dcase2017/challenge/download

https://www.zhihu.com/question/56816282/answer/150639596 https://github.com/qiuqiangkong/DCASE2016_Task3 https://www.zhihu.com/question/57658184/answer/245420536

http://www.sohu.com/a/117638110_465975 https://www.zhihu.com/question/23497307/answer/24772167

https://www.zhihu.com/question/20398418/answer/18080841 https://www.zhihu.com/question/24342192/answer/225984574 https://zhuanlan.zhihu.com/p/33464788 https://zhuanlan.zhihu.com/p/33144046 https://ccrma.stanford.edu/~jos/filters/ https://zhuanlan.zhihu.com/p/28848339

https://blog.csdn.net/yutianzuijin/article/details/21446401 Introduction to music retrieval

https://max.book118.com/html/2017/0221/92851570.shtm Content-based audio information retrieval

https://github.com/musescore/MuseScore MuseScore is an open source and free music notation software.

http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/ Audio Fingerprinting with Python and Numpy

https://www.zhihu.com/question/265066896/answer/291395259 https://www.zhihu.com/question/265209086/answer/301313983 What competitions are there in speech recognition?

https://blog.naaln.com/2013/08/music-algorithm-for-fingerprint-framework/ Music fingerprinting: the framework of the algorithm

https://github.com/ybayle/awesome-deep-learning-music List of articles related to deep learning applied to music
