[New Task] Add timestamp alignment

patrickvonplaten opened this issue 2 years ago

Patrick von Platen commented 2 years ago

It would be very nice to have a simple tool to align timestamps and audio, something along these lines:

from speechbox import SpeechAligner

aligner = SpeechAligner.from_pretrained(...)

aligner.align(audio=audio, transcript=transcript)

Ewald Enzinger · Answer 1 · Thu Dec 29 2022 11:01:57 GMT+0800 (China Standard Time)

Do you have something in mind like this repo, which uses wav2vec 2.0 models to do forced alignment and obtain word-based timestamps?

Patrick von Platen · Answer 2 · Fri Dec 30 2022 04:32:05 GMT+0800 (China Standard Time)

Ah wow this repo is super cool - haven't seen it before.

Definitely happy to officially link to this repo - just wondering whether we can build something nice using only Whisper, so that much less RAM would be required.

Abdullah Mohammed · Answer 3 · Sun Jan 01 2023 06:09:55 GMT+0800 (China Standard Time)

@patrickvonplaten
If I understand the problem correctly, the code in this notebook from Whisper can solve it:

https://github.com/openai/whisper/blob/main/notebooks/Multilingual_ASR.ipynb

Patrick von Platen · Answer 4 · Mon Jan 02 2023 01:05:23 GMT+0800 (China Standard Time)

Yes indeed, this seems like a nice way of doing it, even though it looks quite memory-expensive: O(#words x time). I wonder whether there could also be a less memory-intensive way to do it.
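To make the O(#words x time) cost concrete, here is a minimal sketch, assuming the alignment is computed by dynamic time warping over a words-by-frames cost matrix (for example, negated attention weights). The names `dtw_path` and `word_frame_spans` are hypothetical and not part of Whisper or speechbox; the point is only that the full accumulated-cost table has one cell per (word, frame) pair.

```python
def dtw_path(cost):
    """Cheapest monotonic path through a words-by-frames cost matrix.

    `cost[i][j]` is the (assumed) cost of assigning frame j to word i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc has (n + 1) x (m + 1) cells: this table is the O(#words x time) memory.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the last cell to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def word_frame_spans(path):
    """Map each word index to its (first_frame, last_frame) on the path."""
    spans = {}
    for word, frame in path:
        first, last = spans.get(word, (frame, frame))
        spans[word] = (min(first, frame), max(last, frame))
    return spans


# Toy example: 2 words, 4 frames; low cost means "this frame belongs to this word".
toy_cost = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
toy_path = dtw_path(toy_cost)
toy_spans = word_frame_spans(toy_path)
```

Multiplying the frame indices by the frame duration would give timestamps in seconds. A lower-memory variant would have to avoid storing the whole table, for instance by restricting the path to a band around the diagonal.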

Abdullah Mohammed · Answer 5 · Mon Jan 02 2023 01:19:27 GMT+0800 (China Standard Time)

I came across this tweet some time ago; it borrows from sequence alignment in bioinformatics:
https://twitter.com/ramsri_goutham/status/1603003724846501889
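The tweet's exact method is not reproduced here. As one illustration of what sequence alignment from bioinformatics looks like when applied to text, below is a rough sketch of Needleman-Wunsch global alignment between two word lists, for example a reference transcript and a recognized one. The function name `align_words` and the scoring values are assumptions chosen for the example.

```python
def align_words(ref, hyp, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists; None marks a gap on that side."""
    n, m = len(ref), len(hyp)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing ref[i - 1] with hyp[j - 1].
        return match if ref[i - 1] == hyp[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


aligned = align_words(
    ["the", "quick", "brown", "fox"],
    ["the", "brown", "fox"],
)
```

If the recognized words carry timestamps, the matched pairs could be used to transfer those timestamps onto the reference transcript. Note that this table is also quadratic in size, so it does not by itself address the memory concern.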