Binaural-Source-Localization-CNN

Basic Information

Author: Gregory Hunkins

Organization: University of Rochester

License: MIT

Abstract: A Convolutional Neural Network (CNN) classification system was designed for localizing human voices in 3-D space. A new dataset, VoiceBin100K, is introduced for this task and for future work in the field. The CNN takes variable-length binaural short-time Fourier transform (STFT) magnitude and phase features as input and predicts the location of the speaker's voice as one of 168 location classes.
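To make the input features concrete, here is a minimal NumPy sketch of how binaural STFT magnitude and phase features could be computed from a two-channel recording. It is illustrative only and not necessarily the preprocessing used in this repository: the function name `binaural_stft_features`, the Hann window, the `n_fft` and `hop` values, and the channel ordering of the output are all assumptions.

```python
import numpy as np


def binaural_stft_features(audio, n_fft=512, hop=256):
    """Compute STFT magnitude and phase for a binaural (2-channel) signal.

    audio: array of shape (2, n_samples), left and right channels.
    Returns an array of shape (4, n_fft // 2 + 1, n_frames) ordered as
    [left magnitude, right magnitude, left phase, right phase].
    All parameter defaults are assumptions, not values from the project.
    """
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim != 2 or audio.shape[0] != 2:
        raise ValueError("expected audio with shape (2, n_samples)")

    # Zero-pad very short clips so at least one full frame exists.
    if audio.shape[1] < n_fft:
        audio = np.pad(audio, ((0, 0), (0, n_fft - audio.shape[1])))

    n_samples = audio.shape[1]
    n_frames = 1 + (n_samples - n_fft) // hop
    window = np.hanning(n_fft)

    # Sample indices of every frame, shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[:, idx] * window          # (2, n_frames, n_fft)

    spec = np.fft.rfft(frames, axis=-1)      # (2, n_frames, n_freq)
    spec = spec.transpose(0, 2, 1)           # (2, n_freq, n_frames)

    # Stack magnitude and phase of both ears along the channel axis.
    return np.concatenate([np.abs(spec), np.angle(spec)], axis=0)
```

Because the number of frames grows with the clip length, recordings of different durations produce feature arrays with different time dimensions, which is what "variable-length" refers to here.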

Running The Code

Reference: https://cs.rochester.edu/~cxu22/t/577F17/bluehive_tutorial.html

Data

Please contact ghunkins@u.rochester.edu for access to the data. A public link will be available shortly.

About

A Deep Convolutional Neural Network (DCNN) designed for the task of localizing human speech to 168 location classes using binaural microphone inputs.
