akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

How to train a new model?

shell1986 opened this issue

Hello! I have very little experience training models, but I am very interested in trying to build a model for diarization of anime characters.
I prepared a dataset from ASS subtitles in which each line is tagged with the character who speaks it. I wrote a script that uses these subtitles to cut out the audio for each tagged line and sort the resulting clips into a separate folder per character.
Now I want to understand how to train a model on this data.
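For reference, the subtitle-parsing step described above might look roughly like the Python sketch below. It assumes a standard ASS file whose `[Events]` section has a `Format:` line followed by `Dialogue:` lines, and that the character name is stored in the `Name` field (some fansubs use `Style` instead, hence the hypothetical `speaker_field` parameter). The names `SubtitleLine`, `parse_ass_events` and `clean_text` are illustrative, not part of any library.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SubtitleLine:
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def ass_time_to_seconds(t: str) -> float:
    # ASS timestamps have the form H:MM:SS.cc (centiseconds).
    h, m, s = t.strip().split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def clean_text(text: str) -> str:
    text = re.sub(r"\{[^}]*\}", "", text)  # drop override blocks such as {\i1}
    text = re.sub(r"\\[Nn]", " ", text)    # \N and \n line breaks become spaces
    return " ".join(text.split())          # collapse repeated whitespace


def parse_ass_events(content: str, speaker_field: str = "Name") -> List[SubtitleLine]:
    """Return the Dialogue lines of the [Events] section, in file order.

    speaker_field is a hypothetical knob: which ASS column holds the character.
    """
    fields: List[str] = []
    in_events = False
    result: List[SubtitleLine] = []
    for raw in content.splitlines():
        row = raw.strip()
        if row.startswith("[") and row.endswith("]"):
            in_events = row.lower() == "[events]"
            continue
        if not in_events:
            continue
        if row.startswith("Format:"):
            fields = [f.strip() for f in row[len("Format:"):].split(",")]
        elif row.startswith("Dialogue:") and fields:
            # Text is the last field and may itself contain commas,
            # so split at most len(fields) - 1 times.
            values = row[len("Dialogue:"):].split(",", len(fields) - 1)
            rec = dict(zip(fields, values))
            result.append(SubtitleLine(
                start=ass_time_to_seconds(rec["Start"]),
                end=ass_time_to_seconds(rec["End"]),
                speaker=rec.get(speaker_field, "").strip(),
                text=clean_text(rec.get("Text", "")),
            ))
    return result
```

ASS files are often saved as UTF-8 with a BOM, so reading them with `open(path, encoding="utf-8-sig")` before calling `parse_ass_events` avoids a stray `\ufeff` on the first header. `Comment:` lines are skipped on purpose.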

If you have links or examples showing how to do this, please share them.
I program in PHP and JS, and occasionally write some C++.
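On the training-data side: the project summary describes tinydiarize as adding speaker diarization to Whisper with special tokens, so its targets appear to be transcripts that mark *where the speaker changes* rather than folders of clips per character. If so, the clips would need to stay in conversation order. The Python sketch below shows one hedged way to turn time-ordered `(start, end, speaker, text)` segments into such targets, grouped into 30-second windows (the audio length Whisper processes at once). The turn-marker string `TURN_TOKEN` is an assumption, not confirmed here; check the tinydiarize repo for the exact token and for its actual fine-tuning procedure. `build_turn_windows` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple

# (start_seconds, end_seconds, speaker, text)
Segment = Tuple[float, float, str, str]

# ASSUMPTION: placeholder turn-marker string. Verify the exact special token
# against the tinydiarize repository before using it for training.
TURN_TOKEN = "<|speakerturn|>"


def build_turn_windows(
    segments: Sequence[Segment],
    max_window: float = 30.0,  # Whisper works on 30-second audio windows
    turn_token: str = TURN_TOKEN,
) -> List[Tuple[float, float, str]]:
    """Group time-sorted segments into windows spanning at most max_window
    seconds (a single longer segment still gets its own window) and write
    each window's transcript with turn_token at every speaker change."""
    windows: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[0]):
        # Start a new window if adding seg would stretch it past max_window.
        if current and seg[1] - current[0][0] > max_window:
            windows.append(current)
            current = []
        current.append(seg)
    if current:
        windows.append(current)

    result: List[Tuple[float, float, str]] = []
    for win in windows:
        parts: List[str] = []
        prev_speaker: Optional[str] = None
        for _start, _end, speaker, text in win:
            if prev_speaker is not None and speaker != prev_speaker:
                parts.append(turn_token)
            parts.append(text)
            prev_speaker = speaker
        result.append((win[0][0], max(s[1] for s in win), " ".join(parts)))
    return result
```

Each returned `(start, end, transcript)` triple describes one audio window to cut from the episode and its target text. Note that the character names themselves are discarded: only the fact that the speaker changed is kept, which matches the turn-token idea but not per-character identification.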