juanmc2005 / diart

A python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io


Add compatibility with new pyannote segmentation model

juanmc2005 opened this issue

Pyannote has recently released pyannote/segmentation-3.0, let's include it!

This could be done in a pretty straightforward way by checking the activation layer of the loaded segmentation model.
In diart.models.PyannoteSegmentationModel.__call__(), we can add the following:

import torch
from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
# powerset models end with a (log-)softmax activation over powerset classes
if isinstance(self.model.activation, torch.nn.LogSoftmax):  # or Softmax
    # max_speakers_per_chunk / max_speakers_per_frame would be taken from the model specifications or diart's config
    powerset = Powerset(max_speakers_per_chunk, max_speakers_per_frame)
    return powerset.to_multilabel(segmentation)
return segmentation

I would recommend checking self.model.specifications.powerset instead.

Also, Powerset.to_multilabel now has a soft keyword argument that you can set to True to get soft multi-label segmentation (though I would recommend sticking with hard ones to remove the need for the activation threshold).
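
A minimal sketch of what that would look like (the specifications fields classes and powerset_max_classes are assumptions on my part, mirroring how the diarization task builds its own Powerset; soft=False keeps hard multi-label outputs):

from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
if self.model.specifications.powerset:
    # assumed fields: number of speaker classes and maximum set size per frame
    powerset = Powerset(
        len(self.model.specifications.classes),
        self.model.specifications.powerset_max_classes,
    )
    # hard multi-label segmentation, no activation threshold needed
    return powerset.to_multilabel(segmentation, soft=False)
return segmentation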

I am willing to contribute this feature.

I plan to replace self.model(waveform) with self.to_multilabel(self.model(waveform)), where self.to_multilabel is

  • Powerset(max_speakers_per_chunk, max_speakers_per_frame).to_multilabel if the model uses the powerset multi-class paradigm
  • torch.nn.Identity() if the model uses the multi-label paradigm

However, I don't want to instantiate Powerset on every call to PyannoteSegmentationModel.__call__.

My question is therefore: where should I instantiate self.to_multilabel?

Also, Powerset inherits from torch.nn.Module, so it should ideally be sent to the same device as self.model (but maybe diart only supports CPU for now?). Therefore, I believe the right way is to define it in PyannoteSegmentationModel.__init__ (see the sketch below), but maybe that goes against the whole LazyModel thing (which I don't really understand the need for).
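
To make the idea concrete, here is a rough sketch of the once-per-model dispatch (build_to_multilabel is just a hypothetical helper name, and the specifications fields are the same assumptions as above):

import torch
from pyannote.audio.utils.powerset import Powerset

def build_to_multilabel(model):
    # Hypothetical helper: decide the conversion once, when the model is loaded
    specs = model.specifications
    if specs.powerset:
        # powerset multi-class model: decode frames back to multi-label speakers
        return Powerset(len(specs.classes), specs.powerset_max_classes).to_multilabel
    # plain multi-label model: nothing to convert
    return torch.nn.Identity()

With that, __init__ would do self.to_multilabel = build_to_multilabel(self.model) and __call__ would simply return self.to_multilabel(self.model(waveform)).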

That's awesome! Thanks @hbredin for this contribution!

I would make this PowersetAdapter a wrapper around SegmentationModel and have PyannoteLoader set up the wrapper accordingly.
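
Something along these lines (just a sketch of the wrapper idea, not a final API; the specifications fields are assumed):

import torch
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(torch.nn.Module):
    # Wraps a powerset segmentation model and exposes multi-label outputs,
    # so the rest of the pipeline stays unchanged
    def __init__(self, segmentation_model: torch.nn.Module):
        super().__init__()
        self.model = segmentation_model
        specs = self.model.specifications  # assumed pyannote model specifications
        self.powerset = Powerset(len(specs.classes), specs.powerset_max_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.powerset.to_multilabel(self.model(waveform))

Since both the wrapped model and the Powerset are registered as submodules, moving the adapter to a device moves them together, which also takes care of the device concern above. PyannoteLoader would wrap the loaded model in PowersetAdapter only when specifications.powerset is true, and leave multi-label models untouched.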

Currently, LazyModel guarantees that models can be shared across processes (for example, for parallel benchmarking) by loading the weights only when the process starts. This may not be the best approach, so I'm open to changing it in the future.

I'm going to be working on moving pipeline configs to YAML files. This should make configs serializable, which would remove the need for LazyModel, so I'll refactor the entire models.py at that point.

Implemented in #198