akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

HF Transformers Weights

sanchit-gandhi opened this issue · comments

Hey @akashmjn!

Firstly, thank you for creating such an awesome project for the Whisper community! Very excited to try out these checkpoints.

I understand from your README that you used HF Transformers + Datasets to fine-tune the model. It would be super cool to push the resulting HF checkpoints to the Hub, so that all HF users can load your fine-tuned checkpoint with .from_pretrained.

This can be done directly from the model repository (which is a standard Git repo):

git add . && git commit -m "Add weights and training results" && git push

Or, you can do it Pythonically using the huggingface_hub package (installed with Transformers): https://huggingface.co/docs/huggingface_hub/guides/upload
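A minimal sketch of the Pythonic route, using huggingface_hub's `HfApi` (the folder path and repo id below are placeholders, not a real repo):

```python
def push_checkpoint(folder_path: str, repo_id: str, token=None):
    """Upload a local checkpoint directory to a Hub model repo (sketch)."""
    from huggingface_hub import HfApi  # installed alongside Transformers

    api = HfApi(token=token)
    # Create the repo if it doesn't exist yet; no-op otherwise.
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    # Upload everything in the folder (weights, config, tokenizer files).
    api.upload_folder(folder_path=folder_path, repo_id=repo_id)


# Example usage (placeholder paths/ids):
# push_checkpoint("./whisper-small-tdrz", "akashmjn/whisper-small.en-tdrz")
```

Checkpoints saved with `model.save_pretrained(...)` can also be pushed directly via `model.push_to_hub(repo_id)`, which wraps the same machinery.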

Note that we're already getting feature requests for these checkpoints in Transformers (see issue), so there's demand from the community!

Hey @sanchit-gandhi - thanks for reaching out! Ack - will look into this later in the week.

Meanwhile, worth calling out that you'll also need some small edits to the inference code so that the extra token is not suppressed.

For examples, see #4 and #11 (edits to this Python repo) or ggerganov/whisper.cpp#1058 (for what was done in C++).

For the first PR (#4), I think it'll be possible to do something similar in HF Transformers without any code changes by simply updating the EOS token to be the speaker-turn token (model.generation_config.eos_token=...). The second PR (#11) might require more involved changes to make sure it works with the Transformers batching algorithm. However, I think we can first try simply updating the EOS token and running generation to see what the quality is like.
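The two-step change described above (stop suppressing the speaker-turn token, then optionally treat it as EOS) can be sketched as follows. `GenCfg` is a minimal stand-in for `transformers.GenerationConfig`, and `SPEAKER_TURN_ID` and the other token ids are placeholders, not tinydiarize's actual vocabulary ids:

```python
from dataclasses import dataclass, field


@dataclass
class GenCfg:
    """Stand-in for transformers.GenerationConfig (only the relevant fields)."""
    suppress_tokens: list = field(default_factory=list)
    eos_token_id: int = 50256  # placeholder id


SPEAKER_TURN_ID = 50258  # hypothetical id for the speaker-turn special token


def enable_speaker_turn(cfg: GenCfg, token_id: int) -> GenCfg:
    # 1) Remove the token from the suppress list so generate() can emit it.
    cfg.suppress_tokens = [t for t in cfg.suppress_tokens if t != token_id]
    # 2) Optionally treat it as EOS, as suggested for the first PR above.
    cfg.eos_token_id = token_id
    return cfg


cfg = enable_speaker_turn(GenCfg(suppress_tokens=[50257, SPEAKER_TURN_ID]),
                          SPEAKER_TURN_ID)
```

With a real checkpoint, the same two assignments would be applied to `model.generation_config` after `from_pretrained`, with the token id looked up from the tokenizer.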

Hi @sanchit-gandhi!

Sorry about the silence here. I just posted a project update on #14 that I wanted to bring to your attention.

While releasing Transformers weights is conceptually equivalent to what's already released, given the wider public use, TBH I'm playing it safe here for now 😅 . Feel free to DM me (@ akashmjn) or email if you'd like to discuss further.

Great job with Distil-Whisper! (given that, pretty sure you should be able to replicate this too 😉 )

Hey @akashmjn - sorry to hear about the complications with open-sourcing more artefacts. We'd love to facilitate this project and more open-sourcing of checkpoints + code when possible (as @Vaibhavs10 suggested we can provide some compute if required). It's awesome to see what you've built so far here and I hope you can continue contributing!