juanmc2005 / diart

A python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io


Add compatibility with new pyannote segmentation model

juanmc2005 opened this issue

Pyannote has recently released pyannote/segmentation-3.0, let's include it!

This could be done in a pretty straightforward way by checking the activation layer of the loaded segmentation model.
In diart.models.PyannoteSegmentationModel.__call__(), we can add the following:

import torch
from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
# powerset models end with a (log-)softmax activation over powerset classes
if isinstance(self.model.activation, torch.nn.LogSoftmax):  # or Softmax
    # max_speakers_per_chunk / max_speakers_per_frame would be taken from the model specifications or diart's config
    powerset = Powerset(max_speakers_per_chunk, max_speakers_per_frame)
    return powerset.to_multilabel(segmentation)
return segmentation

I would recommend checking self.model.specifications.powerset instead.

Also, Powerset.to_multilabel now has a soft keyword argument that you can set to True to get soft multi-label segmentation (though I would recommend sticking with hard ones to remove the need for the activation threshold).
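
A minimal sketch of what that would look like (the specifications fields classes and powerset_max_classes are assumptions on my part, mirroring how the diarization task builds its own Powerset; soft=False keeps hard multi-label outputs):

from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
if self.model.specifications.powerset:
    # assumed fields: number of speaker classes and maximum set size per frame
    powerset = Powerset(
        len(self.model.specifications.classes),
        self.model.specifications.powerset_max_classes,
    )
    # hard multi-label segmentation, no activation threshold needed
    return powerset.to_multilabel(segmentation, soft=False)
return segmentation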

I am willing to contribute this feature.

I plan to replace self.model(waveform) with self.to_multilabel(self.model(waveform)), where self.to_multilabel is

  • Powerset(max_speakers_per_chunk, max_speakers_per_frame).to_multilabel if the model uses the powerset multi-class paradigm
  • torch.nn.Identity() if the model uses the multi-label paradigm

However, I don't want to instantiate Powerset on every call to PyannoteSegmentationModel.__call__.

My question is therefore: where should I instantiate self.to_multilabel?

Also, Powerset inherits from torch.nn.Module, so it should ideally be sent to the same device as self.model (but maybe diart only supports CPU for now?). Therefore, I believe the right way is to define it in PyannoteSegmentationModel.__init__ (see the sketch below), but maybe that goes against the whole LazyModel thing (which I don't really understand the need for).
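
To make the idea concrete, here is a rough sketch of the once-per-model dispatch (build_to_multilabel is just a hypothetical helper name, and the specifications fields are the same assumptions as above):

import torch
from pyannote.audio.utils.powerset import Powerset

def build_to_multilabel(model):
    # Hypothetical helper: decide the conversion once, when the model is loaded
    specs = model.specifications
    if specs.powerset:
        # powerset multi-class model: decode frames back to multi-label speakers
        return Powerset(len(specs.classes), specs.powerset_max_classes).to_multilabel
    # plain multi-label model: nothing to convert
    return torch.nn.Identity()

With that, __init__ would do self.to_multilabel = build_to_multilabel(self.model) and __call__ would simply return self.to_multilabel(self.model(waveform)).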

That's awesome! Thanks @hbredin for this contribution!

I would make this PowersetAdapter a wrapper around SegmentationModel and have PyannoteLoader set up the wrapper accordingly.
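
Something along these lines (just a sketch of the wrapper idea, not a final API; the specifications fields are assumed):

import torch
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(torch.nn.Module):
    # Wraps a powerset segmentation model and exposes multi-label outputs,
    # so the rest of the pipeline stays unchanged
    def __init__(self, segmentation_model: torch.nn.Module):
        super().__init__()
        self.model = segmentation_model
        specs = self.model.specifications  # assumed pyannote model specifications
        self.powerset = Powerset(len(specs.classes), specs.powerset_max_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform))

Since both the wrapped model and the Powerset are registered as submodules, moving the adapter to a device moves them together, which also takes care of the device concern above. PyannoteLoader would wrap the loaded model in PowersetAdapter only when specifications.powerset is true, and leave multi-label models untouched.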

Currently, LazyModel guarantees that models can be shared across processes (for example, for parallel benchmarking) by loading the weights only when the process starts. This may not be the best approach, so I'm open to changing it in the future.

I'm going to be working on moving pipeline configs to YAML files. This should make configs serializable, which would remove the need for LazyModel, so I'll refactor the entire models.py at that point.

Implemented in #198