AudioClustering

This package contains experiments and utilities for unsupervised learning on acoustic recordings. It serves as a use case for SpectralDistances.jl.

Installation

using Pkg
pkg"add https://github.com/baggepinnen/DetectionIoUTools.jl"
pkg"add https://github.com/baggepinnen/AudioClustering.jl"

Examples

Estimating linear models

The following code illustrates how to use SpectralDistances.jl to fit rational spectra to audio samples and extract the poles for use as features.

using AudioClustering, SpectralDistances, Glob
path      = "/home/fredrikb/birds/" # path to a bunch of wav files
cd(path)
files     = glob("*.wav")
const fs  = 44100
na        = 20
fitmethod = LS(na=na)

models    = mapsoundfiles(files) do sound
     sound = SpectralDistances.bp_filter(sound, (50/fs, 18000/fs))
     SpectralDistances.fitmodel(fitmethod, sound)
end

We now have a vector of vectors containing the linear models fit to the sound files. To make this easier to work with, we flatten the structure into a single long vector and extract the poles (roots) of the linear systems to use as features:

X = embeddings(models)

We now have some audio data, represented as poles of rational spectra, in a matrix X. See https://baggepinnen.github.io/SpectralDistances.jl/latest/examples/#Examples-1 for examples of how to use this matrix for analysis of the signals, e.g., classification, clustering and detection.
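To make the idea of poles as features concrete, here is a minimal sketch that uses only the standard library. It assumes an all-pole model given by denominator coefficients and computes the poles as eigenvalues of a companion matrix. The names ar_poles and pole_features are hypothetical and not part of AudioClustering or SpectralDistances, and embeddings is not necessarily implemented this way.

```julia
using LinearAlgebra

# Poles of the all-pole model 1 / (1 + a[1] z^-1 + ... + a[n] z^-n),
# i.e. the roots of z^n + a[1] z^(n-1) + ... + a[n], computed as the
# eigenvalues of the companion matrix. (Hypothetical helper.)
function ar_poles(a::AbstractVector{<:Real})
    n = length(a)
    C = zeros(n, n)
    C[1, :] .= -a                 # first row holds the negated coefficients
    for i in 2:n
        C[i, i-1] = 1.0           # ones on the sub-diagonal
    end
    complex.(eigvals(C))
end

# Stack real and imaginary parts of the sorted poles into one feature vector.
# (Hypothetical helper; the sort only makes the ordering deterministic.)
function pole_features(a::AbstractVector{<:Real})
    p = sort(ar_poles(a), by = z -> (real(z), imag(z)))
    vcat(real.(p), imag.(p))
end

a = [-1.5, 0.7]                   # z^2 - 1.5z + 0.7, a stable complex pole pair
p = ar_poles(a)
f = pole_features(a)
```

A model of order n gives a feature vector of length 2n, so features from models of the same order can be collected as columns of a matrix.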

Graph-based clustering

Model based

A graph representation of X can be obtained with

G = audiograph(X, 5; λ=0)

where k=5 is the number of nearest neighbors considered when building the graph. If λ=0 the graph will be weighted by distance, whereas if λ>0 the graph will be weighted according to adjacency under the kernel exp(-λ*d). The metric used is the Euclidean distance. If you want to use a more sophisticated distance, try, e.g.,

dist = OptimalTransportRootDistance(domain=Continuous(), p=2)
G = audiograph(X, 5, dist; λ=0)

Here, the Euclidean distance will be used to select neighbors, but the edges will be weighted using the provided distance. This avoids having to calculate a very large number of pairwise distances using the more expensive distance metric.
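The two-step construction described above can be sketched in plain Julia. This is an illustration under stated assumptions, not the implementation of audiograph: the function knn_weights is hypothetical, X is assumed to hold one feature vector per column, and dist is assumed to be any function of two such vectors.

```julia
using LinearAlgebra

# Build a dense weight matrix for a k-nearest-neighbor graph.
# Neighbors are chosen with the cheap Euclidean distance; `dist` is only
# evaluated on the k selected neighbors of each point. (Hypothetical helper.)
function knn_weights(X::AbstractMatrix, k::Integer, dist = (a, b) -> norm(a - b); λ = 0)
    N = size(X, 2)
    W = zeros(N, N)
    for i in 1:N
        d = [norm(X[:, i] - X[:, j]) for j in 1:N]
        d[i] = Inf                              # never pick a point as its own neighbor
        for j in partialsortperm(d, 1:k)        # indices of the k smallest distances
            w = dist(X[:, i], X[:, j])
            W[i, j] = λ == 0 ? w : exp(-λ * w)  # raw distance or kernel weight
        end
    end
    W
end

X  = [0.0 1.0 10.0 11.0;
      0.0 0.0  0.0  0.0]                        # four points on a line
W  = knn_weights(X, 1)                          # distance-weighted edges
Wk = knn_weights(X, 1; λ = 1)                   # kernel-weighted edges
```

With N points, the expensive distance is evaluated N*k times instead of once per pair, which is where the saving comes from.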

Any graph-based algorithm may now operate on G, or on the field G.weight. Further examples are available here.

Spectrogram based

The following snippets show how to preprocess data to a suitable form for clustering using this package:

using Glob, WAV # WAV provides wavread
files = glob("*.wav") # Vector of file paths
const fs = Int(wavread(files[1])[2]) # Read the sampling frequency
N = length(files)

using TotalLeastSquares # For lowrankfilter
function lrfilt(y)
    yf = lowrankfilter(y, min(250, length(y)-1100), lag=10)
end

"Perform some simple threshold filtering and calculate a spectrogram"
function spec(sound)
    @. sound = Float32(100000 * clamp(sound, -0.015f0, 0.015f0)) # the 100000 multiplier normalizes the Float32 data for better numerical performance. Tune all parameters to your use case.
    # sound = lrfilt(sound) # This is an alternative to the above which is *much* better, but also much slower
    melspectrogram(sound, 100, 70, nmels=30, fs=fs, fmin=5)      # Spend some time making sure spectrogram representation is good.
end

using ThreadTools # For tmap
spectrograms = tmap(files) do file
    sound = spec(vec(wavread(file)[1]))
end

matrices = [Float32.(max.(normalize_spectrogram(s), 1e-7)) for s in spectrograms]
# matrices_masked = mask_filter.(matrices) # This is an alternative if the lowrankfilter is not used https://baggepinnen.github.io/SpectralDistances.jl/latest/distances/#SpectralDistances.mask_filter

inds, D = initialize_clusters(dist, matrices; init_multiplier = 10, N_seeds = 100)
patterns = matrices[inds] # These should be good cluster seeds

Distance matrix-based clustering

See docs entry Clustering using a distance matrix

Feature-based clustering

See docs entry Clustering using features

Accelerated k-nearest neighbor

inds, dists, D = knn_accelerated(dist, X, k, Xe=X; kwargs...)

Find the nearest neighbor under the distance metric dist by first finding the k nearest neighbors using the Euclidean distance on embeddings produced from Xe, and then using dist to find the smallest distance among those k.

X is assumed to be a vector of something dist can operate on, such as a vector of models from SpectralDistances. Xe defaults to X, but can be anything for which embeddings(Xe) is defined. This function is defined for a vector of models or a vector of spectrograms.

D is a sparse matrix with all the computed distances from dist. This matrix contains raw distance measurements; to symmetrize it, call SpectralDistances.symmetrize!(D). The returned dists are already symmetrized.
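The candidate-then-refine search described above can be sketched with the standard library alone. This is a simplified illustration, not the implementation of knn_accelerated: the function two_stage_nn is hypothetical, the embeddings are assumed to be given as columns of a matrix E aligned with the items, and the symmetrization of the returned distances is omitted.

```julia
using LinearAlgebra, SparseArrays

# For each item, pick k candidates by Euclidean distance between embeddings,
# then evaluate the expensive `dist` only on those candidates and keep the best.
# (Hypothetical helper; `E` has one embedding per column.)
function two_stage_nn(dist, items::AbstractVector, E::AbstractMatrix, k::Integer)
    N     = length(items)
    inds  = zeros(Int, N)                       # index of the nearest neighbor
    dists = fill(Inf, N)                        # distance to the nearest neighbor
    D     = spzeros(N, N)                       # every expensive distance computed
    for i in 1:N
        de = [norm(E[:, i] - E[:, j]) for j in 1:N]
        de[i] = Inf                             # exclude the item itself
        for j in partialsortperm(de, 1:k)       # the k Euclidean candidates
            D[i, j] = dist(items[i], items[j])
            if D[i, j] < dists[i]
                dists[i], inds[i] = D[i, j], j
            end
        end
    end
    inds, dists, D
end

items = [0.0, 1.0, 10.0, 11.5]                  # toy "signals"
E     = reshape(items, 1, :)                    # toy one-dimensional embeddings
inds, dists, D = two_stage_nn((a, b) -> abs(a - b), items, E, 2)
```

Only N*k entries of D are filled, so a sparse matrix keeps memory use proportional to the number of distances actually computed.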
