juanmc2005 / diart

A python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io


Speaker Identity Resolution

jfernandrezj opened this issue · comments

Thank you very much @juanmc2005 for this library, much much appreciated.
One question I have about speaker-aware transcription: could a custom plugin / observer / sink be implemented for speaker identity resolution, and what would be the best pattern to achieve it?
Ideally, on each buffer iteration or speaker change, a speaker identity prediction based on a model (probably backed by something like faiss / weaviate) could be added either to the RTTM output or to a separate file.

Any input would be much appreciated, thank you!

Hi @jfernandrezj, you could try recovering the internal speaker centroids of OnlineSpeakerClustering (its centers attribute) and matching them against other speakers as you mentioned. For this to work, you'd need to use the same embedding model that diart uses.
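As a rough illustration of the centroid-matching idea, here is a minimal sketch in plain numpy. It assumes you have already pulled the centroid matrix out of the clustering block (stood in for here by a plain list of vectors) and that you keep an enrollment database of reference embeddings computed with the same embedding model; the function name, the threshold value, and the toy 4-dim embeddings are all hypothetical:

```python
import numpy as np

def match_centroids(centers, enrolled, threshold=0.5):
    """Match anonymous diarization centroids to enrolled speakers.

    centers:  list of centroid vectors (stand-in for the clustering
              block's centers attribute, one per local speaker).
    enrolled: dict mapping known names to reference embeddings,
              computed with the SAME embedding model as the centroids.
    Returns a dict like {"speaker0": "John", ...}; labels whose best
    cosine similarity falls below the threshold are left anonymous.
    """
    names = list(enrolled)
    refs = np.stack([enrolled[n] for n in names])
    # L2-normalize so a dot product equals cosine similarity
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    labels = {}
    for i, c in enumerate(centers):
        c = c / np.linalg.norm(c)
        sims = refs @ c
        j = int(np.argmax(sims))
        labels[f"speaker{i}"] = names[j] if sims[j] >= threshold else f"speaker{i}"
    return labels

# Toy example with 4-dim embeddings (real ones are much larger)
enrolled = {"John": np.array([1.0, 0.0, 0.0, 0.0]),
            "Mary": np.array([0.0, 1.0, 0.0, 0.0])}
centers = [np.array([0.9, 0.1, 0.0, 0.0]),   # close to John's reference
           np.array([0.0, 0.0, 1.0, 0.0])]   # no good match: stays anonymous
print(match_centroids(centers, enrolled))
```

A vector index such as faiss would replace the brute-force dot product here once the enrollment database grows, but the matching logic stays the same.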

If you want to use a different speaker matching method/model, you can always incorporate it into the pipeline, either replacing or complementing diart's speaker embedding block, but this could be quite expensive in terms of latency. Instead, I would suggest sending the audio to a separate speaker matching service and listening to it in order to label each speaker centroid at display time (e.g. speaker0 -> John).
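To make the display-time relabeling concrete, here is one possible sketch: once the external matching service has produced a mapping from anonymous labels to identities (the mapping below is invented for illustration), you can rewrite the speaker-name field of each RTTM line as it is written out. In the standard RTTM format the speaker label is the eighth whitespace-separated field:

```python
# Hypothetical mapping returned by a separate speaker-matching service
name_of = {"speaker0": "John", "speaker1": "Mary"}

def relabel_rttm(line, mapping):
    """Replace the anonymous speaker label (8th field of an RTTM line)
    with a resolved identity, leaving unmatched labels untouched."""
    fields = line.split()
    fields[7] = mapping.get(fields[7], fields[7])
    return " ".join(fields)

rttm_line = "SPEAKER file 1 0.00 1.50 <NA> <NA> speaker0 <NA> <NA>"
print(relabel_rttm(rttm_line, name_of))
```

Because the relabeling happens only when results are displayed or saved, it adds no latency to the diarization pipeline itself.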

Thank you very much @juanmc2005