Author: Gregory Hunkins
Organization: University of Rochester
License: MIT
Abstract: A Convolutional Neural Network (CNN) classification system was designed for source localization of human voices in 3-D space. A new dataset, VoiceBin100K, is introduced for this task and to support future work in the field. The CNN takes variable-length binaural short-time Fourier transform (STFT) magnitude and phase features as input and predicts the location of the speaker’s voice as one of 168 location classes.
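As a rough illustration of the kind of input the abstract describes, here is a minimal NumPy sketch that computes binaural STFT magnitude and phase maps from a two-channel (left/right) recording. It is not the project's actual preprocessing: the Hann window, `n_fft=512`, `hop=256`, the 16 kHz sample rate, and stacking the four maps channels-first are all assumptions, and `stft` and `binaural_features` are hypothetical helper names.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Return the complex STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)  # assumed Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT per frame, then transpose to (frequency, time).
    return np.fft.rfft(frames, axis=1).T


def binaural_features(left, right, n_fft=512, hop=256):
    """Stack left/right magnitude and phase into a (4, freq, time) float32 array."""
    spec_l = stft(left, n_fft, hop)
    spec_r = stft(right, n_fft, hop)
    # Channel order (assumed): |L|, |R|, angle(L), angle(R).
    feats = [np.abs(spec_l), np.abs(spec_r), np.angle(spec_l), np.angle(spec_r)]
    return np.stack(feats, axis=0).astype(np.float32)


fs = 16000  # assumed sample rate
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)  # 1 s of noise as a stand-in for a voice recording
right = np.roll(left, 8)        # crude interaural delay of 8 samples
feats = binaural_features(left, right)
print(feats.shape)  # (4, 257, 61)
```

The number of frames grows with clip length, so the time axis differs from recording to recording; that is what makes the network's input variable-length, while the channel and frequency axes stay fixed.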
Reference: https://cs.rochester.edu/~cxu22/t/577F17/bluehive_tutorial.html
Please contact ghunkins@u.rochester.edu for access to the data. A public link will be available shortly.