opus-mt-en-zh does not respect the end token
l3utterfly opened this issue
I am using the code from https://opennmt.net/CTranslate2/guides/opus_mt.html, ported to C++:
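```cpp
#include <ctranslate2/translator.h>

// Greedy decoding (beam size 1), no scores returned.
auto translationOptions = ctranslate2::TranslationOptions();
translationOptions.beam_size = 1;
translationOptions.return_scores = false;
// `pieces` holds the SentencePiece tokens of the source sentence.
const std::vector<std::vector<std::string>> batch = {pieces};
const std::vector<ctranslate2::TranslationResult> results = translators[modelPathStr]->translate_batch(batch, translationOptions);
```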
Translating something simple like "Hello" produces the Chinese characters "你好" repeated over and over (up to 256 times), which I believe is the maximum decoding length. Interestingly, translating "Hello, World!" gives the expected result.