Frank-Xu0818 / SSLVC

Sound Source Localization using Visual Cues

CS598ps_project

In this paper we present an approach to reconstructing 3D audio for multiple sources from a single-channel input by detecting and tracking visual cues with supervised learning methods. We also discuss a similar approach to improving speaker classification on a video stream by combining facial and speech likelihoods, that is, multimodal speaker recognition.
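As a rough illustration of what combining facial and speech likelihoods can look like, the sketch below performs a weighted late fusion of per-speaker log-likelihoods. It is not taken from the project's code: the function name `fuse_log_likelihoods`, the `face_weight` parameter, and the speaker names and scores are all hypothetical, and the actual fusion rule used in the paper may differ.

```python
import math


def fuse_log_likelihoods(face_ll, speech_ll, face_weight=0.5):
    """Weighted late fusion of two modalities (illustrative assumption only).

    face_ll, speech_ll: dicts mapping speaker name -> log-likelihood.
    face_weight: hypothetical mixing weight in [0, 1] for the face model.
    Returns (best_speaker, posteriors), where posteriors sum to 1.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")

    # Only speakers scored by both models can be fused.
    speakers = sorted(set(face_ll) & set(speech_ll))
    if not speakers:
        raise ValueError("no speaker is scored by both modalities")

    # Weighted sum in the log domain (a weighted product of likelihoods).
    scores = {
        s: face_weight * face_ll[s] + (1.0 - face_weight) * speech_ll[s]
        for s in speakers
    }

    # Normalize with a numerically stable softmax.
    top = max(scores.values())
    unnormalized = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(unnormalized.values())
    posteriors = {s: u / total for s, u in unnormalized.items()}

    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Made-up scores: the face model prefers "alice", the speech model "bob".
face_scores = {"alice": math.log(0.6), "bob": math.log(0.4)}
speech_scores = {"alice": math.log(0.3), "bob": math.log(0.7)}
best_speaker, fused = fuse_log_likelihoods(face_scores, speech_scores)
```

With equal weights the more confident speech model decides the outcome in this toy case; moving `face_weight` toward 1.0 lets the face model dominate instead.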

Video assets are here:

Languages

TeX 77.3%, Python 20.9%, Makefile 1.8%