Does anyone have demo source code to process a file with the Whisper large model and get VTT/SRT output?

Question

FurkanGozukara opened this issue a year ago

Dear @patrickvonplaten, thank you very much for this repo.
I use Whisper almost every day to generate subtitles for my videos.

However, I need the VTT file it produces (the subtitle output format of the transcription).

Currently, to fix and improve punctuation, I am using fullstop-punctuation-multilang-large (https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large), but I can't say it is the best.

I would like to test your repo, but I need a demo for full VTT export.

Could you release demo source code that can output a VTT file? It could include both the fixed and the raw Whisper output for comparison.

Moreover, I have sent you a connection request on LinkedIn; I would appreciate it if you accept: https://www.linkedin.com/in/furkangozukara/

One final thing: I am also very interested in Stable Diffusion and am preparing tutorial videos. I hope you will consider adding my tutorial videos to the README here: https://huggingface.co/runwayml/stable-diffusion-v1-5

And perhaps reopen this topic so people can learn? Thank you: https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/66

Patrick von Platen · Answer 1 · Mon Jan 16 2023 21:21:57 GMT+0800 (China Standard Time)

For a demo, please have a look at: https://github.com/huggingface/speechbox#web-demo

It would be nice to not conflate diffusion models with this library. This library is not about diffusion models, but about speech models.

Furkan Gözükara · Answer 2 · Mon Jan 16 2023 21:43:01 GMT+0800 (China Standard Time)

Thank you, I have already seen the demo. But how do I get subtitle-formatted output?
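
For anyone landing here with the same question, below is a minimal sketch of the formatting step only, not something taken from the speechbox repo. It assumes you already have timestamped segments as dictionaries with `start` and `end` (in seconds) and `text` keys; the `Segment` type and the helper names `format_timestamp`, `to_srt`, and `to_vtt` are made up for illustration. It uses only the Python standard library.

```python
from typing import Iterable, TypedDict


class Segment(TypedDict):
    """Hypothetical segment shape: times in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: numbered cues, comma as the millisecond separator."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """Build WebVTT text: 'WEBVTT' header, dot as the millisecond separator."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, only to show the call pattern.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 5.0, "text": " Second line."},
    ]
    with open("demo.srt", "w", encoding="utf-8") as f:
        f.write(to_srt(demo))
    with open("demo.vtt", "w", encoding="utf-8") as f:
        f.write(to_vtt(demo))
```

If you transcribe with the openai-whisper package, the `segments` list in the result of `model.transcribe(...)` should already have this shape, so it can be passed in directly. Output from other tools may need to be mapped to `start`/`end`/`text` first.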