huggingface / speechbox

[New Task] Add timestamp alignment

patrickvonplaten opened this issue

It would be very nice to have a simple tool that aligns a transcript with its audio to obtain timestamps, something along the lines of:

```python
from speechbox import SpeechAligner

aligner = SpeechAligner.from_pretrained(...)
aligner.align(audio=audio, transcript=transcript)
```

Do you have something in mind like this repo, which uses wav2vec 2.0 models for forced alignment to obtain word-level timestamps?
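For reference, CTC forced alignment finds the single most likely frame-to-token path that spells out a known transcript. The sketch below is a minimal NumPy version of that idea. It is not the linked repo's code. It assumes you already have per-frame log-probabilities from a CTC model and the transcript as token ids. It also assumes the blank id is 0, so check this against your tokenizer.

```python
import numpy as np


def ctc_forced_align(log_probs: np.ndarray, targets: list[int], blank: int = 0) -> list[int]:
    """Viterbi forced alignment under the CTC topology.

    log_probs: (T, V) per-frame log-probabilities from a CTC acoustic model.
    targets:   transcript token ids, without blanks.
    Returns, for every frame, an index into the blank-extended label sequence
    [blank, y1, blank, y2, ..., yN, blank].
    """
    num_frames = log_probs.shape[0]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    num_states = len(ext)

    score = np.full((num_frames, num_states), -np.inf)
    back = np.zeros((num_frames, num_states), dtype=np.int64)
    # A path may start on the leading blank or directly on the first label.
    score[0, 0] = log_probs[0, ext[0]]
    if num_states > 1:
        score[0, 1] = log_probs[0, ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            best, arg = score[t - 1, s], s  # stay on the same state
            if s >= 1 and score[t - 1, s - 1] > best:  # advance by one state
                best, arg = score[t - 1, s - 1], s - 1
            # Skip over a blank, allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2] and score[t - 1, s - 2] > best:
                best, arg = score[t - 1, s - 2], s - 2
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = arg

    # A valid path ends on the last label or on the trailing blank.
    last = num_states - 1
    if num_states > 1 and score[-1, last - 1] > score[-1, last]:
        last -= 1
    path = [last]
    for t in range(num_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def token_frame_spans(path: list[int], num_targets: int) -> list[tuple[int, int]]:
    """First and last frame assigned to each transcript token."""
    spans = [[-1, -1] for _ in range(num_targets)]
    for frame, state in enumerate(path):
        if state % 2 == 1:  # odd positions of the extended sequence are real labels
            i = (state - 1) // 2
            if spans[i][0] < 0:
                spans[i][0] = frame
            spans[i][1] = frame
    return [(start, end) for start, end in spans]
```

To turn frame indices into seconds, multiply them by the model's frame stride. For wav2vec 2.0 on 16 kHz audio, that stride is 20 ms. This sketch does not group token spans into word spans. One way to do that would be to split on the word-delimiter token, which is `|` in many character-level wav2vec 2.0 CTC vocabularies.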

Ah wow this repo is super cool - haven't seen it before.

Definitely happy to officially link to this repo. I'm just wondering whether we can build something nice using only Whisper, so that much less RAM would be required (there would be no second model, such as wav2vec 2.0, to load).

@patrickvonplaten If I understand the problem correctly, the code in this notebook from Whisper can solve it:

https://github.com/openai/whisper/blob/main/notebooks/Multilingual_ASR.ipynb

Yes indeed, this seems like a nice way of doing it, even though it looks quite memory-expensive: O(#words × time). I wonder whether there is also a less memory-intensive way to do it.
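As far as we can tell, the notebook works in two steps. First, it takes Whisper's cross-attention weights between the decoded text tokens and the audio frames. Then it runs dynamic time warping (DTW) over them to find a monotonic token-to-frame path. The sketch below shows plain DTW in NumPy; it is not the notebook's exact code. `attention` is a hypothetical tokens × frames matrix that you would extract and average yourself. The accumulated-cost table and the backtrace table each hold one cell per (token, frame) pair, which is where the memory grows with #tokens × #frames.

```python
import numpy as np


def dtw_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """Minimum-cost monotonic path from (0, 0) to (N-1, M-1) through an N x M cost matrix."""
    n, m = cost.shape
    # acc and trace are both (N+1) x (M+1): memory grows with tokens x frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    trace = np.zeros((n + 1, m + 1), dtype=np.int8)  # 0 = diagonal, 1 = up, 2 = left
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            k = int(np.argmin(prev))
            acc[i, j] = cost[i - 1, j - 1] + prev[k]
            trace[i, j] = k

    # Walk back from the bottom-right corner to the origin.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = trace[i, j]
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def token_start_times(attention: np.ndarray, frame_sec: float = 0.02) -> list[float]:
    """Start time (seconds) of each token: the first frame DTW assigns to it.

    attention: hypothetical (tokens x frames) matrix of averaged cross-attention
    weights. Higher weight means a better match, so DTW runs on its negation.
    """
    path = dtw_path(-attention)
    starts: dict[int, int] = {}
    for tok, frame in path:
        starts.setdefault(tok, frame)
    return [starts[t] * frame_sec for t in range(attention.shape[0])]
```

Each Whisper encoder frame covers 20 ms, since there are 1500 frames per 30-second window. That is why the default is `frame_sec=0.02`. The pure-Python double loop keeps the sketch readable but is slow for long inputs.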

I came across this tweet some time ago:
https://twitter.com/ramsri_goutham/status/1603003724846501889

from sequence alignment in Bioinformatics
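The thread does not say which algorithm the tweet proposes. One illustration of the idea is Needleman-Wunsch global alignment, a standard sequence-alignment algorithm from bioinformatics. It could align the reference transcript against the words of an ASR hypothesis that already carries word timestamps, then copy each matched word's times onto the transcript. This sketch rests on that assumption. `transfer_timestamps` and the `(word, start, end)` input format are hypothetical, and the scoring values are arbitrary. The score table is (#transcript words + 1) × (#ASR words + 1), so its size does not depend on the number of audio frames.

```python
from typing import Optional

Pair = tuple[Optional[int], Optional[int]]


def needleman_wunsch(a: list[str], b: list[str], match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> list[Pair]:
    """Global alignment of two word sequences; None marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # match / substitution
                              score[i - 1][j] + gap,            # word of a missing from b
                              score[i][j - 1] + gap)            # extra word in b

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    pairs: list[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]


def transfer_timestamps(transcript: list[str],
                        asr_words: list[tuple[str, float, float]]
                        ) -> list[Optional[tuple[float, float]]]:
    """Hypothetical helper: copy (start, end) from aligned ASR words onto transcript words."""
    pairs = needleman_wunsch([w.lower() for w in transcript],
                             [w.lower() for w, _, _ in asr_words])
    times: list[Optional[tuple[float, float]]] = [None] * len(transcript)
    for i, j in pairs:
        if i is not None and j is not None:
            times[i] = (asr_words[j][1], asr_words[j][2])
    return times
```

In this sketch, a transcript word aligned by substitution inherits the substituted ASR word's times. A word aligned to a gap gets `None`, and its times would still need to be interpolated from its neighbours.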