Word Timestamps

Question


Matheusadler opened this issue 4 months ago

Matheus Adler commented 4 months ago

Hey there!

When I use the openai/whisper-large-v2 model with the pipeline as follows:

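from transformers import pipeline
# Whisper large-v2 wrapped in the automatic speech recognition pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")
outputs = pipe("filename.wav",
               chunk_length_s=30,       # split long audio into 30 s chunks
               batch_size=16,
               return_timestamps=True)  # chunk-level timestamps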

I get the timestamps of each chunk:

{'text': " When you were here before Couldn't look you in the eye You're just like an angel Your skin makes me cry You float like a feather Like a feather in a beautiful world I wish I was special You're so fucking special But I'm a creep (...)",
 'chunks': [{'timestamp': (0.0, 27.0),
             'text': " When you were here before Couldn't look you in the eye You're just like an angel"},
            {'timestamp': (34.24, 41.24),
             'text': ' Your skin makes me cry You float like a feather'},
            {'timestamp': (47.0, 50.0), 'text': ' Like a feather in a beautiful world'},
            {'timestamp': (53.0, 55.0), 'text': ' I wish I was special'},
            {'timestamp': (58.0, 60.0), 'text': " You're so fucking special"},
            {'timestamp': (62.0, 65.4), 'text': " But I'm a creep"},
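For reference, each entry in 'chunks' is a dict with a (start, end) tuple in seconds and the transcribed text, so it can be consumed with plain Python. A minimal sketch using a few entries copied from the output; format_chunk is a hypothetical helper, and the guard for a missing end time is an assumption about how a trailing chunk might look, not documented pipeline behavior.

```python
# A few entries copied from the chunk list above.
chunks = [
    {"timestamp": (47.0, 50.0), "text": " Like a feather in a beautiful world"},
    {"timestamp": (53.0, 55.0), "text": " I wish I was special"},
    {"timestamp": (62.0, 65.4), "text": " But I'm a creep"},
]


def format_chunk(chunk: dict) -> str:
    """Render one chunk (hypothetical helper) as 'start -> end: text'."""
    start, end = chunk["timestamp"]
    # Assumption: a trailing chunk may lack an end time, so guard against None.
    end_str = f"{end:.2f}" if end is not None else "?"
    return f"{start:.2f} -> {end_str}: {chunk['text'].strip()}"


for c in chunks:
    print(format_chunk(c))
```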

In this pipeline, would it be possible to get the timestamps of each word?
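If only chunk-level timestamps are available, one crude workaround is to spread each chunk's (start, end) span across its words in proportion to word length. This is a heuristic sketch, not something the pipeline provides, and it will drift wherever words are held or sung, as in this song. For real word-level alignment, recent transformers releases accept return_timestamps="word" for Whisper in the ASR pipeline, which estimates word times from cross-attention weights with dynamic time warping; check that your installed version supports it. The function name approximate_word_timestamps is hypothetical.

```python
def approximate_word_timestamps(chunks):
    """Split each chunk's time span across its words in proportion to word
    length. A crude heuristic (hypothetical helper), NOT acoustic alignment."""
    words = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        tokens = chunk["text"].split()
        if not tokens or end is None:
            # No words, or no end time to divide up: skip this chunk.
            continue
        total_chars = sum(len(t) for t in tokens)
        cursor = start
        for token in tokens:
            # Longer words get a proportionally larger share of the span.
            duration = (end - start) * len(token) / total_chars
            words.append({"word": token, "timestamp": (cursor, cursor + duration)})
            cursor += duration
    return words


# Usage on one chunk copied from the output above.
sample = [{"timestamp": (62.0, 65.4), "text": " But I'm a creep"}]
for w in approximate_word_timestamps(sample):
    start, end = w["timestamp"]
    print(f"{start:.2f}-{end:.2f} {w['word']}")
```

Splitting by character count rather than evenly per word is an arbitrary choice that slightly favors longer words; either way the estimates should be treated as rough.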