Translation of pre-split and pre-tokenized sentences

Question

BLKSerene opened this issue 2 months ago

Hi, the docs say that Argos Translate uses SentencePiece (and maybe Sacremoses?) for tokenization and Stanza for sentence boundary detection. Is it possible to translate sentences that are already split and tokenized, i.e. passed in as a list of lists of tokens? If so, I could drop many of Argos Translate's dependencies, whose strict version pins cause a number of problems (cf. #362, #395).
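To make the requested input shape concrete, here is a minimal sketch of the kind of entry point I have in mind. Everything in it is hypothetical: `translate_pretokenized`, the `Backend` type, and `uppercase_backend` are illustrative names, not part of Argos Translate, and the stand-in backend only uppercases tokens so that the example runs without any model or extra dependency.

```python
from typing import Callable, List

Tokens = List[str]
# Hypothetical backend type: takes a batch of token lists and returns
# one translated token list per input sentence.
Backend = Callable[[List[Tokens]], List[Tokens]]


def translate_pretokenized(sentences: List[Tokens], backend: Backend) -> List[Tokens]:
    """Translate sentences that are already split and tokenized.

    No sentence boundary detection or tokenization happens here; the
    caller is responsible for both.
    """
    # Send only non-empty sentences to the backend, but remember their
    # positions so the output stays aligned with the input.
    non_empty = [(i, sent) for i, sent in enumerate(sentences) if sent]
    translated = backend([sent for _, sent in non_empty]) if non_empty else []
    if len(translated) != len(non_empty):
        raise ValueError("backend must return one output per input sentence")

    result: List[Tokens] = [[] for _ in sentences]
    for (i, _), out in zip(non_empty, translated):
        result[i] = out
    return result


def uppercase_backend(batch: List[Tokens]) -> List[Tokens]:
    # Stand-in for a real translation model (assumption for illustration).
    return [[tok.upper() for tok in sent] for sent in batch]


pre_tokenized = [["Hello", "world", "."], [], ["How", "are", "you", "?"]]
print(translate_pretokenized(pre_tokenized, uppercase_backend))
```

With an entry point of this shape, sentence splitting and tokenization would stay entirely on the caller's side, so the packages used for those steps would not need to be installed.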