argosopentech / argos-translate

Open-source offline translation library written in Python

Home Page: https://www.argosopentech.com

produce sourcemap of translation

milahu opened this issue

Source-to-source compilers usually produce sourcemaps, so for each output token I can answer the question "where does this token come from?"

Sourcemaps would also be useful for language-to-language translators, for example when translating rich text formats like HTML, ODT, DOCX, or PDF.

To translate a rich text document, I would remove all markup, feed the plain text of the sentences to the translator, and then use the sourcemap to reconstruct the markup.
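The strip, translate, and reconstruct pipeline can be sketched in a few lines, assuming the translator hands back a token-level sourcemap as a list of `(source_index, target_index)` pairs. That is a big assumption: the translation and alignment below are hard-coded, hypothetical values, and `strip_markup`, `project_spans`, and `render` are illustrative helpers, not part of argos-translate. The tag parser is a toy that only understands simple, non-crossing inline tags like `<b>...</b>`.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """A markup element covering tokens [start, end)."""
    tag: str
    start: int
    end: int


TAG = re.compile(r"<(/?)(\w+)>")


def strip_markup(html):
    """Toy parser: split simple inline HTML into plain tokens plus tag spans."""
    tokens, spans, open_at = [], [], {}
    for piece in re.findall(r"<[^>]+>|[^\s<]+", html):
        m = TAG.fullmatch(piece)
        if m is None:
            tokens.append(piece)  # plain word: goes to the translator
        elif m.group(1):  # closing tag: span ends at the current token count
            name = m.group(2)
            spans.append(Span(name, open_at.pop(name), len(tokens)))
        else:  # opening tag: remember where its content starts
            open_at[m.group(2)] = len(tokens)
    return tokens, spans


def project_spans(spans, alignment):
    """Move each span onto the target tokens aligned to its source tokens."""
    projected = []
    for span in spans:
        targets = [t for s, t in alignment if span.start <= s < span.end]
        if targets:  # spans with no aligned target token are dropped here
            projected.append(Span(span.tag, min(targets), max(targets) + 1))
    return projected


def render(tokens, spans):
    """Re-insert tags around token ranges; assumes spans do not cross."""
    words = list(tokens)
    for span in spans:
        words[span.start] = f"<{span.tag}>" + words[span.start]
        words[span.end - 1] += f"</{span.tag}>"
    return " ".join(words)


# " ".join(source_tokens) == "the red house" is what the translator would see.
source_tokens, source_spans = strip_markup("the <b>red</b> house")

# Hypothetical translator output and sourcemap: (source index, target index).
target_tokens = ["la", "maison", "rouge"]
alignment = [(0, 0), (1, 2), (2, 1)]

print(render(target_tokens, project_spans(source_spans, alignment)))
# la maison <b>rouge</b>
```

Taking the min/max hull of the aligned target tokens keeps each tag contiguous even when the translation reorders words, at the cost of sometimes widening a span to cover words in between.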

Would this be possible?

Google Translate shows the connection between source and translated sentences; such a "sourcemap of sentences" would also be useful.
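A sentence-level sourcemap does not necessarily need support from the model: if each sentence is translated separately, the caller can record which source character range produced which target character range. The sketch below assumes a naive regex sentence splitter (it will mis-split abbreviations like "e.g.") and takes the translator as a plain callable; `sentence_sourcemap` is a hypothetical helper, and in practice `translate` could wrap something like argos-translate's `argostranslate.translate.translate(text, from_code, to_code)`.

```python
import re
from typing import Callable, List, Tuple

# A sentence starts at a non-space character and runs through its end punctuation.
SENTENCE = re.compile(r"\S[^.!?]*[.!?]*")

Range = Tuple[int, int]


def sentence_sourcemap(
    text: str, translate: Callable[[str], str]
) -> Tuple[str, List[Tuple[Range, Range]]]:
    """Translate sentence by sentence; map source char ranges to target ranges."""
    target_parts: List[str] = []
    mapping: List[Tuple[Range, Range]] = []
    offset = 0  # current length of the joined target text
    for m in SENTENCE.finditer(text):
        sentence = m.group().rstrip()
        translated = translate(sentence)
        if target_parts:
            offset += 1  # the space that " ".join() inserts between sentences
        mapping.append(
            ((m.start(), m.start() + len(sentence)), (offset, offset + len(translated)))
        )
        target_parts.append(translated)
        offset += len(translated)
    return " ".join(target_parts), mapping


# Stub translator standing in for a real model (hypothetical fixed outputs).
demo = {"Hello world.": "Hallo Welt.", "How are you?": "Wie geht es dir?"}
source = "Hello world. How are you?"
target, spans = sentence_sourcemap(source, demo.__getitem__)
for (s0, s1), (t0, t1) in spans:
    print(repr(source[s0:s1]), "->", repr(target[t0:t1]))
```

The cost of this approach is that the model loses cross-sentence context, which a token-level sourcemap from the model itself would not require.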

If CTranslate2 has support for sourcemaps then this might be possible.
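One possible starting point is CTranslate2's `Translator.translate_batch`, which has a `return_attention` option; assuming it yields, per hypothesis, one row of attention weights over the source tokens for each target token, a rough token alignment can be read off by taking the most-attended source token per row. The function name and the `min_weight` cutoff below are hypothetical, and the matrix is made-up example data rather than real model output.

```python
def attention_to_alignment(attention, min_weight=0.0):
    """Turn a target-by-source attention matrix into (source, target) pairs
    by picking, for each target token, the source token it attends to most."""
    alignment = []
    for t, row in enumerate(attention):
        if not row:
            continue  # no source tokens to align to
        s = max(range(len(row)), key=row.__getitem__)
        if row[s] > min_weight:  # skip target tokens with no confident match
            alignment.append((s, t))
    return alignment


# Made-up attention for "the red house" -> "la maison rouge".
attention = [
    [0.8, 0.1, 0.1],  # "la"     attends mostly to "the"
    [0.1, 0.2, 0.7],  # "maison" attends mostly to "house"
    [0.1, 0.8, 0.1],  # "rouge"  attends mostly to "red"
]
print(attention_to_alignment(attention))
```

Attention weights are only an approximate alignment signal, and CTranslate2 works on subword tokens, so a real pipeline would still need to merge subword pieces back into words before using the pairs as a sourcemap.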

argos-translate-files supports translating ODT, HTML, and DOCX files.

LibreTranslate/argos-translate-files#1