argosopentech / argos-translate

Open-source offline translation library written in Python

Home Page: https://www.argosopentech.com

Translation of pre-split and pre-tokenized sentences

BLKSerene opened this issue

Hi, the docs say Argos Translate uses SentencePiece (and maybe Sacremoses?) for tokenization and Stanza for sentence boundary detection. Is it possible to translate text that has already been split into sentences and tokenized, i.e. passed in as a list of lists of tokens? If so, I could drop many of Argos Translate's dependencies, which would help because the strict version pins on those dependencies cause many problems (cf. #362, #395).
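
To make the request concrete, here is a minimal sketch of the input and output shape I have in mind. `translate_pretokenized` and the `translate_batch` callable are hypothetical names, not part of Argos Translate's API, and `fake_backend` only stands in for a real model so that the snippet runs.

```python
from typing import Callable, List

Tokens = List[str]


def translate_pretokenized(
    sentences: List[Tokens],
    translate_batch: Callable[[List[Tokens]], List[Tokens]],
) -> List[Tokens]:
    """Translate sentences that are already split and tokenized.

    Hypothetical helper: no sentence boundary detection and no
    tokenization happen here; the caller-supplied backend receives
    the tokens exactly as given.
    """
    if not sentences:
        return []
    translated = translate_batch(sentences)
    if len(translated) != len(sentences):
        raise ValueError("backend must return one output per input sentence")
    return translated


# Stand-in backend (assumption): uppercases tokens instead of translating.
def fake_backend(batch: List[Tokens]) -> List[Tokens]:
    return [[token.upper() for token in sentence] for sentence in batch]


# A list of lists of tokens: one inner list per sentence.
sentences = [["Hello", ",", "world", "!"], ["How", "are", "you", "?"]]
result = translate_pretokenized(sentences, fake_backend)
```

With an entry point like this, the tokens would of course have to match whatever vocabulary the translation model expects.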