What settings improve the accuracy of word timestamps?
scaruslooner opened this issue · comments
I have a problem where the transcript's end time cuts off the word instead of capturing the whole word.
Which settings do you recommend adjusting to get more precise timing for the end of a word?
(I'm using a 4060 Ti with 16 GB of VRAM.)
This is the command I run:
"C:\Program Files\Whisper\whisper-faster.exe" "C:\Users\camer\OneDrive\Desktop\Whisper-Faster_r186.1_windows\Whisper-Faster\2024-03-10 20-56-19.mp4" --model large-v3 --output_dir C:\Users\camer\OneDrive\Desktop\Whisper-Faster_r186.1_windows\Whisper-Faster --output_format srt --language en --sentence
Preview of the output:
(I also highlighted the words that are getting cut off.)
1
00:00:04,800 --> 00:00:05,240
Yo.

2
00:00:05,240 --> 00:00:05,680
Yo.

3
00:00:07,680 --> 00:00:09,300
The Pistons suck this year.

4
00:00:09,440 --> 00:00:10,620
I don't know why they're so bad.

5
00:00:11,620 --> 00:00:13,240
I feel like they need to get better.

6
00:00:14,720 --> 00:00:16,060
In everything they do.

7
00:00:16,380 --> 00:00:17,120
You feel me?

8
00:00:22,070 --> 00:00:23,770
I'm just kind of eating right now.

9
00:00:25,770 --> 00:00:26,750
Kyrie has the ball.
Share the json file.
Can you share the audio?
I don't know where the JSON file is, but I have the SRT file.
2024-03-10 20-56-19 words get cuttoff.txt
2024-03-10.20-56-19.mp4
Oh wait, I can generate the JSON. Here it is:
2024-03-10 20-56-19.json
Maybe you need to zip the JSON and SRT files, as I can't open them.
SrtAndJSON.zip
Does this work?
Nope, the zip doesn't work either. Upload it here: https://wetransfer.com/
https://we.tl/t-r5dMVjh6Ue
(I switched the link.)
That is normal precision for Whisper.
Maybe it will be improved in the future with wav2vec alignment.
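Until alignment improves, one possible workaround is to post-process the SRT file and push each end time out slightly. The sketch below is only an illustration, not a feature of Whisper-Faster: the function name pad_srt_ends and the 200 ms default are assumptions chosen for this example. It extends every cue's end time by a fixed pad, but never past the start of the next cue.

```python
import re

# Matches an SRT timing line such as "00:00:04,800 --> 00:00:05,240".
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def to_stamp(total_ms):
    """Format total milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def pad_srt_ends(srt_text, pad_ms=200):
    """Return SRT text with each cue's end time extended by up to pad_ms.

    pad_ms is a hypothetical default; tune it by ear. An end time is
    never moved past the start of the following cue, so cues stay
    non-overlapping.
    """
    lines = srt_text.splitlines()

    # Collect (line index, start, end) for every timing line.
    cues = []
    for i, line in enumerate(lines):
        match = TIMING.match(line.strip())
        if match:
            g = match.groups()
            cues.append((i, to_ms(*g[:4]), to_ms(*g[4:])))

    for k, (i, start, end) in enumerate(cues):
        new_end = end + pad_ms
        if k + 1 < len(cues):
            next_start = cues[k + 1][1]
            # Clamp to the next cue's start, but never shorten the cue.
            new_end = min(new_end, max(end, next_start))
        lines[i] = f"{to_stamp(start)} --> {to_stamp(new_end)}"

    return "\n".join(lines)
```

With the preview above, cue 3 would end at 00:00:09,440 (clamped to the start of cue 4) and cue 4 would end at 00:00:10,820. Back-to-back cues such as 1 and 2 have no gap to borrow from, so they are left unchanged.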
--output_dir C:\Users\camer\OneDrive\Desktop\Whisper-Faster_r186.1_windows\Whisper-Faster
You can use -o source instead.
--output_format srt
This is not needed, as srt is the default.