SonyCSLParis / audio-metrics

Audio Metrics

This repository contains a Python package for computing distribution-based metrics on audio data using embeddings.

The metrics are computed on embeddings from the publicly available pretrained VGGish model, which was trained on audio event classification (see https://arxiv.org/abs/1609.09430). In particular, the package uses the 128-dimensional embeddings from the last feature layer before the classification layer.
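To illustrate what a distribution-based metric over such embeddings looks like, here is a minimal NumPy sketch of a Fréchet distance between two sets of embeddings, which is the quantity usually meant by FAD (Fréchet Audio Distance). This is not the package's own code: the function name `frechet_distance` and the random stand-in data are assumptions made for illustration only.

```python
import numpy as np


def frechet_distance(real_emb, fake_emb):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are arrays of shape (n_samples, n_dims).
    Hypothetical helper, not part of the audio_metrics API.
    """
    mu_r, mu_f = real_emb.mean(axis=0), fake_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_f = np.cov(fake_emb, rowvar=False)
    # tr(sqrt(cov_r @ cov_f)) equals the sum of the square roots of the
    # eigenvalues of the product; clip round-off negatives to zero.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


# Random stand-ins for embeddings (8 dimensions here to keep it small).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = real + 1.0  # same shape of distribution, shifted mean
print(frechet_distance(real, real), frechet_distance(real, fake))
```

A distance near zero means the two embedding distributions have nearly the same mean and covariance; shifting the mean alone increases the distance by the squared length of the shift.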

Usage

from audio_metrics import AudioMetrics

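# instantiate the metrics
metric = AudioMetrics()

# compute the embedding statistics of the real (background) audio files
metric.prepare_background('/path/to/real/audiofiles/')

# compare the generated audio files against the background
fad, density, coverage = metric.compare_to_background('/path/to/fake/audiofiles')

# save the background statistics to disk
metric.save_base_statistics('/path/to/background_stats.npz')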
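The `density` and `coverage` values are not defined in this README. Assuming they refer to the k-nearest-neighbour metrics of Naeem et al. (2020), "Reliable Fidelity and Diversity Metrics for Generative Models", they could be sketched as follows. The function name `density_coverage` and the default `k=5` are illustrative assumptions, not the package's API.

```python
import numpy as np


def density_coverage(real_emb, fake_emb, k=5):
    """k-NN density and coverage of fake embeddings w.r.t. real ones.

    Hypothetical helper, not part of the audio_metrics API.
    Inputs have shape (n_real, n_dims) and (n_fake, n_dims).
    """
    # Pairwise Euclidean distances: real-to-real and real-to-fake.
    d_rr = np.linalg.norm(real_emb[:, None, :] - real_emb[None, :, :], axis=-1)
    d_rf = np.linalg.norm(real_emb[:, None, :] - fake_emb[None, :, :], axis=-1)
    # Ball radius of each real point: distance to its k-th nearest real
    # neighbour (index 0 of the sorted row is the point itself).
    radii = np.sort(d_rr, axis=1)[:, k]
    inside = d_rf < radii[:, None]  # shape (n_real, n_fake)
    # Density: average number of real balls containing each fake point, over k.
    density = inside.sum() / (k * fake_emb.shape[0])
    # Coverage: fraction of real balls containing at least one fake point.
    coverage = inside.any(axis=1).mean()
    return float(density), float(coverage)
```

Under these definitions, coverage lies in [0, 1], while density is unbounded above and can exceed 1 when fake samples cluster in regions that are dense with real samples.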

TODO

Use alternative embeddings, e.g.:

  • Penultimate 4096-dimensional VGGish feature layer
  • A lower dimensional random projection of those features?
  • Random VGGish embeddings (random parameter initialization)
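As a rough sketch of the second item, a lower-dimensional random projection of 4096-dimensional features could look like the following. This is one possible approach (a fixed Gaussian projection matrix), not something the package implements; `random_projection`, `out_dim` and `seed` are hypothetical names.

```python
import numpy as np


def random_projection(features, out_dim=128, seed=0):
    """Project (n_samples, in_dim) features to out_dim dimensions.

    Hypothetical helper, not part of the audio_metrics API.
    """
    rng = np.random.default_rng(seed)
    in_dim = features.shape[1]
    # Gaussian matrix scaled by 1/sqrt(out_dim), so that squared norms
    # are preserved in expectation.
    proj = rng.normal(size=(in_dim, out_dim)) / np.sqrt(out_dim)
    return features @ proj


# Random stand-in for penultimate-layer features.
feats = np.random.default_rng(1).normal(size=(10, 4096))
print(random_projection(feats).shape)
```

The same seed has to be used for the real and the generated audio so that both sets of features are projected with the same matrix.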

Credits

This repository was inspired by, and contains some code from, https://github.com/spaghettiSystems/pytorch-fad

About

License: GNU General Public License v3.0

