lowerquality / gentle

gentle forced aligner

Home page: https://lowerquality.com/gentle/

How do I insert punctuation and special symbols (e.g. , . + - " ') back into the word-level timings?

jahamed opened this issue

Let's say my original transcript is: This "happened" in my (skydiving years).

The returned JSON obviously does not contain ", ( or . in the 'word' or 'alignedWord' fields. Is there some way to put punctuation and special symbols back into the JSON? I am using the JSON's word-level timings to build a subtitle file based on characters per line.
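For the subtitle step itself, here is a minimal Python sketch of packing timed words into SRT cues by a character limit. It assumes each word is a dict with a punctuated `"text"` key (a hypothetical field name you would fill in yourself) plus Gentle's `"start"`/`"end"` times in seconds. Words that Gentle could not align may lack timings, so they are kept in the cue text but do not move its boundaries. `build_srt`, `format_timestamp` and the 42-character default are illustrative choices, not part of Gentle.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(words, max_chars=42):
    """Group timed words into numbered SRT cues of at most max_chars characters.

    A single word longer than max_chars still gets a cue of its own.
    Cues that contain no timed word at all are dropped.
    """
    cues = []        # (start, end, text) tuples
    current = []     # texts of the cue being built
    start = end = None

    for word in words:
        text = word["text"]
        # Close the current cue if adding this word would exceed the limit.
        if current and len(" ".join(current + [text])) > max_chars:
            if start is not None:
                cues.append((start, end, " ".join(current)))
            current, start, end = [], None, None
        current.append(text)
        # Only aligned words carry timings; use them to set cue boundaries.
        if "start" in word and "end" in word:
            if start is None:
                start = word["start"]
            end = word["end"]

    if current and start is not None:
        cues.append((start, end, " ".join(current)))

    return "\n".join(
        f"{i}\n{format_timestamp(s)} --> {format_timestamp(e)}\n{text}\n"
        for i, (s, e, text) in enumerate(cues, 1)
    )
```

Breaking only between whole words keeps punctuation attached to the word it belongs to, so a cue never starts with a stray `)` or `.`.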

I agree it would be nice to have it as an option in the tool. In the meantime, I work around it by a simple substitution from the words in my original text. Something like this:

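import json
from pathlib import Path

original_transcript = Path("transcript.txt").read_text()  # illustrative file names
gentle_segments = json.loads(Path("gentle_output.json").read_text())["words"]
words = original_transcript.split()  # whitespace split keeps punctuation attached
for word, segment in zip(words, gentle_segments):  # assumes one segment per word
    segment["text"] = word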
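The `zip` substitution assumes Gentle returns exactly one entry per whitespace-separated token, in order; if a token is split or dropped, every later word shifts. An alternative sketch avoids that by cutting the punctuated text straight out of the transcript. It assumes each entry in Gentle's `"words"` list carries `startOffset`/`endOffset` character offsets into the top-level `"transcript"` string (check this against your Gentle version's output). `attach_punctuation` and the `"text"` key are hypothetical names.

```python
def attach_punctuation(gentle_json):
    """Return a copy of Gentle's word list with a punctuated "text" field added.

    Each word's [startOffset, endOffset) span is widened over adjacent
    non-whitespace characters (quotes, brackets, commas, periods). Growth never
    crosses into a neighbouring word, and characters already claimed by the
    previous word are not claimed again, so nothing is duplicated.
    """
    transcript = gentle_json["transcript"]
    words = gentle_json["words"]
    result = []
    claimed = 0  # end of the previous word's widened span
    for i, word in enumerate(words):
        start, end = word["startOffset"], word["endOffset"]
        right_limit = words[i + 1]["startOffset"] if i + 1 < len(words) else len(transcript)
        # Grow left over leading symbols such as ( or "
        while start > claimed and not transcript[start - 1].isspace():
            start -= 1
        # Grow right over trailing symbols such as ) . , or "
        while end < right_limit and not transcript[end].isspace():
            end += 1
        result.append({**word, "text": transcript[start:end]})
        claimed = end
    return result
```

If one token is split into several aligned words (e.g. a hyphenated word), the symbols between them go to the earlier word, so `well-known` would come out as `well-` and `known`.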

Not sure if it helps, but I'm using this new library: https://github.com/echogarden-project/echogarden
It works very well for forced alignment, has a ton of other features, and seems like a more modern, easier-to-use tool.
The developer is very responsive too!

@jahamed Thanks, it's nice to have more modern options available.