scaryrawr / vtt-generator

Generates VTT files for videos using ffmpeg and Azure Cognitive Services

Some part of transcript skipped in the VTT file

Invincible166 opened this issue

There is a small issue in the code on this line:
display_words = result['DisplayText'].split(' ')

The transcript can contain words like "person's". For such a word, Azure returns two entries in the "words" list: person and 's.
Splitting DisplayText on spaces yields only one token for it, so the indexes of display_words fall out of sync with the words list and part of the transcript gets skipped. Use the lexical form instead:

display_words = transcript_obj['NBest'][max_confidence_index]['Lexical'].split(' ')

This solves the problem. The lexical text lacks display formatting such as capitalization and punctuation, but that can be restored with additional code.
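
To illustrate the mismatch, here is a minimal sketch. The transcript_obj below is a hand-written, hypothetical sample shaped like a detailed recognition result, not real service output, and best_index is a helper name I made up for the example. It assumes the Lexical string has one space-separated token per entry in the word list, which is what I observed.

```python
# Hypothetical sample shaped like a detailed recognition result.
# All values are made up for illustration; offsets/durations are arbitrary ticks.
transcript_obj = {
    "DisplayText": "The person's book.",
    "NBest": [
        {
            "Confidence": 0.91,
            "Lexical": "the person 's book",
            "Words": [
                {"Word": "the", "Offset": 0, "Duration": 2000000},
                {"Word": "person", "Offset": 2000000, "Duration": 4000000},
                {"Word": "'s", "Offset": 6000000, "Duration": 1000000},
                {"Word": "book", "Offset": 7000000, "Duration": 3000000},
            ],
        },
    ],
}


def best_index(nbest):
    """Return the index of the NBest entry with the highest confidence."""
    return max(range(len(nbest)), key=lambda i: nbest[i]["Confidence"])


max_confidence_index = best_index(transcript_obj["NBest"])
best = transcript_obj["NBest"][max_confidence_index]
words = best["Words"]

# Splitting the display text keeps "person's" as a single token.
display_words = transcript_obj["DisplayText"].split(" ")

# Splitting the lexical text gives one token per timed word (assumption above).
lexical_words = best["Lexical"].split(" ")

print(len(display_words), len(lexical_words), len(words))

# Pair each token with its timing only when the counts actually line up,
# so a mismatch is reported instead of silently dropping words.
if len(lexical_words) == len(words):
    timed = [(text, w["Offset"], w["Duration"])
             for text, w in zip(lexical_words, words)]
else:
    raise ValueError("Lexical tokens and word timings are out of sync")
```

The length check is there so that any remaining mismatch fails loudly rather than producing a VTT file with missing text.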

Awesome catch! Thanks!