ozdefir / finetuneas

An HTML interface for finetuning the sync map output from aeneas


Import csv instead of json?

bwang482 opened this issue

Thanks @ozdefir for the great interface! Our team (based in Oxford) is actually using it for editing the transcriptions and time alignments for our audio recordings, and will certainly give it a shoutout in our upcoming papers.

May I ask whether it is possible to import a CSV file instead of JSON? We have a number of CSV files, one per audio file, each with the columns transcript, start_time, end_time (for each segment) and speaker.

Also, there seems to be a character limit on the segment transcript displayed in the interface. If so, is it possible to raise or remove the limit?

Thanks again!

Hi,
The initial version of finetuneas imported the CSV output from aeneas; we later switched to JSON. Following your request, I recently added a CSV import option as well.
However, you will have to fork it, because aeneas' CSV output differs from the CSV files you describe. In the aeneas output the column order is: id, begin time, end time, text. Like this:

f001,0.000,1.234,"First fragment text"
f002,1.234,5.678,"Second fragment text"
f003,5.678,7.890,"Third fragment text"

You will have to modify the parseCSV function in finetuneas.html.
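
As a rough sketch of an alternative, the files could be converted to the aeneas column order before import, leaving parseCSV untouched. The function names `splitCsvLine` and `toAeneasCsv` below are hypothetical and not part of finetuneas. The sketch assumes the columns arrive as transcript, start_time, end_time, speaker, with an optional header row, and it generates ids in the f001 style shown above. The speaker column is dropped because the aeneas layout has no place for it, and quoted fields that contain line breaks are not handled.

```javascript
// Split one CSV line into fields, honoring double-quoted fields
// and the "" escape for a literal quote inside a quoted field.
function splitCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Convert rows of (transcript, start_time, end_time, speaker)
// into aeneas-style rows of (id, begin, end, "text").
// Assumption: ids are generated sequentially as f001, f002, ...
function toAeneasCsv(csvText, hasHeader = true) {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== "");
  const rows = hasHeader ? lines.slice(1) : lines;
  return rows
    .map((line, index) => {
      const [transcript, start, end] = splitCsvLine(line);
      const id = "f" + String(index + 1).padStart(3, "0");
      const text = '"' + transcript.replace(/"/g, '""') + '"';
      return [id, start, end, text].join(",");
    })
    .join("\n");
}
```

Calling `toAeneasCsv(fileContents)` on one of the described files would then yield text in the id, begin, end, text order, which could be saved and loaded through the CSV import option. Keeping the conversion outside finetuneas.html avoids maintaining a fork, at the cost of losing the speaker column.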

There's no character limit that I can think of. If you have a sample file that shows this behavior, I would like to take a look.

That's great! Thanks @ozdefir !! I shall close the issue now.