prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit

Add support for latest version of transformers repo

prajdabre opened this issue

Currently I provide my own modded fork of transformers, but if someone doesn't care about the features I have added and only wants to work with the mbart code, then using the official transformers repo should be supported.

What this would mean is that all the other arguments I pass to the mbart config class when instantiating the object will be sent to kwargs. The main change will be minimal and most likely related to the tokenizer. In the batch creation logic, I pass some extra arguments to the tokenizer to support stochastic tokenization. The way I see it, we add a flag called --is_official_repo which, if passed, means that the official transformers repo is being used. This argument will then be passed to the batching function, which won't pass the flags relevant for stochastic tokenization.
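
A rough sketch of how the flag could gate the tokenizer arguments in the batching code. This is only an illustration, not the actual yanmtt implementation: the helper names `build_parser` and `tokenizer_kwargs` are hypothetical, and the stochastic tokenization option names used in the usage note are placeholders for whatever the modded fork really accepts.

```python
import argparse


def build_parser():
    """Argument parser with the proposed --is_official_repo flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--is_official_repo",
        action="store_true",
        help="Set when the official transformers repo is installed "
             "instead of the modded fork.",
    )
    return parser


def tokenizer_kwargs(is_official_repo, stochastic_options):
    """Build the keyword arguments the batching function passes to the tokenizer.

    stochastic_options holds the fork-only arguments for stochastic
    tokenization (the caller decides their names and values).
    """
    # Arguments that the official tokenizer call also understands.
    kwargs = {
        "add_special_tokens": False,
        "return_tensors": "pt",
        "padding": True,
    }
    if not is_official_repo:
        # Only the modded fork knows what to do with these.
        kwargs.update(stochastic_options)
    return kwargs
```

In the batching function this could be used as `tok(sentences, **tokenizer_kwargs(args.is_official_repo, opts))`, where `opts` might look like `{"sample": True, "nbest": 64, "alpha_or_dropout": 0.1}` (placeholder names and values). With the flag set, only the common arguments reach the tokenizer.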