yixuanzhou / End-to-End-Audio-Recognition

Classify and segment an audio stream and convert it into text

End-to-End-Audio-Recognition

The web application is deployed on 121.40.161.184 and can be accessed directly: http://121.40.161.184:8484/music_voice.html converts audio to text, and http://121.40.161.184:8484/search.html searches the database by keyword.

Dependencies

Python 2.x
HBase
For downloading an m3u8 audio stream and converting it to wav files:
- FFmpeg
For audio classification and segmentation tasks:
- pyAudioAnalysis
For splitting an audio file (wav) into pieces:
- pydub
For speech recognition (converting audio to text):
- Baidu speech recognition API
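
As an illustration of the FFmpeg dependency, the following is a minimal sketch of how an m3u8 stream could be saved as a wav file from Python. The function names, the example URL, the 16 kHz mono output format, and the optional duration limit are assumptions made for this example and are not taken from the repository's scripts.

```python
import subprocess


def build_ffmpeg_command(stream_url, wav_path, sample_rate=16000, duration_s=None):
    """Build the ffmpeg argument list for saving a stream as a mono wav file.

    All parameter names and defaults here are hypothetical.
    """
    cmd = [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", stream_url,          # input: the m3u8 playlist URL
        "-vn",                     # ignore any video track
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
    ]
    if duration_s is not None:
        # As an output option, -t stops recording after duration_s seconds.
        cmd += ["-t", str(duration_s)]
    cmd.append(wav_path)
    return cmd


def download_stream(stream_url, wav_path, duration_s=None):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.check_call(
        build_ffmpeg_command(stream_url, wav_path, duration_s=duration_s)
    )
```

For example, `download_stream("http://example.com/live.m3u8", "out.wav", duration_s=60)` would record one minute of a (hypothetical) stream, provided the `ffmpeg` binary is on the PATH.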

Workflow

Prepare audio files for training the model (train-model.py)
Use the pre-trained model to classify the targeted audio segments (audio-classifier.py)
Filter the classified segments to obtain optimized audio segments (audio_filter.py)
Split an audio file into pieces at the segment points (audio-segmenter.py)
Convert each audio segment to text (audio_recognition.py)
Save the resulting data in HBase (pythrift.py)
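
The filtering and segmenting steps above can be sketched roughly as follows. This is not the code in audio_filter.py or audio-segmenter.py: the function names, the "speech"/"music" labels, the (start, end, label) input format, and the minimum-duration rule are all assumptions chosen for illustration. Only the pydub calls (`AudioSegment.from_wav`, millisecond slicing, `export`) are the library's real interface.

```python
def merge_and_filter(segments, target_label, min_duration_s=1.0):
    """Merge touching segments of target_label and drop the short ones.

    segments: list of (start_s, end_s, label) tuples sorted by start time
    (a hypothetical format for the classifier output).
    """
    merged = []
    for start, end, label in segments:
        if label != target_label:
            continue
        if merged and abs(merged[-1][1] - start) < 1e-6:
            # This segment starts where the previous one ended: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_duration_s]


def to_millisecond_ranges(ranges_s):
    """Convert (start_s, end_s) pairs to the millisecond offsets pydub slices by."""
    return [(int(round(s * 1000)), int(round(e * 1000))) for s, e in ranges_s]


def export_segments(wav_path, ranges_ms, out_prefix):
    """Cut wav_path at the given ranges and write one wav file per range."""
    from pydub import AudioSegment  # imported here so the helpers above work without pydub

    audio = AudioSegment.from_wav(wav_path)
    paths = []
    for i, (start_ms, end_ms) in enumerate(ranges_ms):
        out_path = "%s_%03d.wav" % (out_prefix, i)
        audio[start_ms:end_ms].export(out_path, format="wav")
        paths.append(out_path)
    return paths
```

A typical call chain would be `export_segments("in.wav", to_millisecond_ranges(merge_and_filter(segments, "speech")), "piece")`, after which each exported piece could be sent to the speech recognition step.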

Demo
