idnavid/spkr_diarization

Speaker Diarization component for Alveo project. The role of the diarization tool is to segment long speech files into smaller chunks. The output labels will be used as a benchmark for human transcribers.

Navid Shokouhi July 2017

Packages:

Spro (for feature extraction)
AudioSeg (Diarization binaries)
Python:
- numpy, scipy

Installation guide:

Installing Spro 4.0

cd spro-directory
./configure
make 
make install

NOTE: when installing on Mac, use Spro 5.0

Installing AudioSeg:

cd audioseg-directory
./configure --with-spro=[path-to-spro-directory]
make
make install

diar.py

main module is diar.py, which contains an example script. To load in python, use diar.diarization(root_dir,wavname,ubmname,out_dir), where:

  root_dir: root directory of experiment. 
  wavname: full path to wave file on disk. 
  ubmname: full path to pretrained UBM on disk. 
  out_dir: output directory, to store intermediate files.

Examples:

A working example for for Austalk data is available in experiments.

About

Alveo Speaker Diarization project - Navid's version

Languages

Language:Python 93.6%Language:Shell 6.4%