produce sourcemap of translation
milahu opened this issue · comments
source-to-source compilers usually produce sourcemaps
so for each output token i can see "where does this token come from?"
sourcemaps would be useful for language-to-language translators
for translating rich text formats like html, odt, docx, pdf...
to translate a rich text document, i would remove all markup
feed the plain-text sentences to the translator
and then use the sourcemap to reconstruct the markup
would this be possible?
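
a minimal sketch of the idea, assuming the translator can hand back a token alignment (target index to source index). `Token`, `reapply_markup` and `render` are made-up names for illustration, and the alignment in the example is written by hand, not produced by any real engine:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Token:
    text: str
    tags: tuple = ()  # markup wrapping this token, outermost first, e.g. ("b",)


def reapply_markup(source, target_words, alignment):
    """Copy tags from source tokens onto target words.

    alignment maps target index -> source index (the hypothetical "sourcemap").
    Target words without an entry get no markup.
    """
    out = []
    for i, word in enumerate(target_words):
        src = alignment.get(i)
        tags = source[src].tags if src is not None else ()
        out.append(Token(word, tags))
    return out


def render(tokens):
    """Serialize tokens to HTML-like text, merging neighbours with identical tags."""
    pieces = []
    for tags, group in groupby(tokens, key=lambda t: t.tags):
        text = " ".join(t.text for t in group)
        for tag in reversed(tags):  # wrap innermost tag first
            text = f"<{tag}>{text}</{tag}>"
        pieces.append(text)
    return " ".join(pieces)


# source document: the <b>red</b> house
source = [Token("the"), Token("red", ("b",)), Token("house")]
# hand-written alignment for "la casa roja": la<-the, casa<-house, roja<-red
alignment = {0: 0, 1: 2, 2: 1}
print(render(reapply_markup(source, ["la", "casa", "roja"], alignment)))
```

word order can change in translation, so one marked-up span in the source may end up as several separate spans in the target.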
google translate shows the connection between source and translated sentences
such a "sourcemap of sentences" would also be useful
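
a sentence-level map does not need any support from the engine, since each sentence can be translated on its own and the character offsets recorded. a rough sketch, where `translate` is a placeholder for whatever translator is used and the regex sentence splitter is deliberately naive (it will break on abbreviations like "e.g."):

```python
import re


def split_sentences(text):
    """Return (start, end) character spans of sentences. Naive regex splitter."""
    return [m.span() for m in re.finditer(r"\S[^.!?]*[.!?]*", text)]


def translate_with_sentence_map(text, translate):
    """Translate sentence by sentence and record source span -> target span."""
    parts = []
    mapping = []
    pos = 0
    for start, end in split_sentences(text):
        translated = translate(text[start:end])
        if parts:
            pos += 1  # account for the joining space
        mapping.append(((start, end), (pos, pos + len(translated))))
        parts.append(translated)
        pos += len(translated)
    return " ".join(parts), mapping


# stand-in translator for the example: a fixed lookup table, not a real engine
fake = {"Hello world.": "Hallo Welt.", "How are you?": "Wie geht es dir?"}
text = "Hello world. How are you?"
translated, mapping = translate_with_sentence_map(text, fake.__getitem__)
for (s0, s1), (t0, t1) in mapping:
    print(repr(text[s0:s1]), "->", repr(translated[t0:t1]))
```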
If CTranslate2 has support for sourcemaps, then this might be possible.
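
One possible route, stated here as an assumption and not something confirmed in this thread: if the engine can return attention weights for each target token, a rough token-level map can be derived by taking the most-attended source token. The sketch below works on a plain nested list and does not call any real library; `alignment_from_attention` and the `threshold` parameter are hypothetical names. Attention is only an approximate proxy for word alignment, so the result would need checking.

```python
def alignment_from_attention(attention, threshold=0.0):
    """Derive a target -> source index map from attention weights.

    attention[t][s] is the weight target token t puts on source token s.
    Target tokens whose best weight is below threshold are left unmapped.
    """
    alignment = {}
    for t, row in enumerate(attention):
        if not row:
            continue
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            alignment[t] = best
    return alignment


# made-up weights for a 3-token target over a 3-token source
attention = [
    [0.90, 0.05, 0.05],
    [0.10, 0.20, 0.70],
    [0.10, 0.80, 0.10],
]
print(alignment_from_attention(attention))
print(alignment_from_attention(attention, threshold=0.75))
```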
argos-translate-files supports translating odt, html, docx