Small RoBERTa model for Vietnamese, with Hugging Face training scripts.

- Architecture: 6 layers, 12 attention heads, hidden size 384; pretrained with whole-word masking.
- Input is pre-tokenized with underthesea's `word_tokenize` (sketched below).
- Pretraining data: the Vietnews training set and the first 3 GB of the Vietnamese OSCAR corpus.
- The pretrained model is now available on Hugging Face (a loading example follows below).
- Fine-tuned for the abstractive summarization task on the Vietnews dataset (see `abstractive-summarization`; a seq2seq sketch follows below).
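
Since the corpus is word-tokenized with underthesea before training, the pre-tokenization step might look like the following minimal sketch (the example sentence is illustrative):

```python
# Word-level pre-tokenization with underthesea, as done before pretraining.
# pip install underthesea
from underthesea import word_tokenize

text = "Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò"

# format="text" joins the syllables of each multi-syllable word with
# underscores, yielding whitespace-separated words for the tokenizer.
print(word_tokenize(text, format="text"))
# Chàng_trai 9X Quảng_Trị khởi_nghiệp từ nấm_sò
```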
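
The pretrained checkpoint can be pulled from the Hugging Face Hub with the standard `transformers` auto classes. The model ID below is a hypothetical placeholder, since the README does not spell out the published ID; substitute the actual one from the Hub:

```python
# Loading the pretrained masked-LM checkpoint from the Hugging Face Hub.
from transformers import AutoModelForMaskedLM, AutoTokenizer
from underthesea import word_tokenize

model_id = "your-username/vietnamese-roberta-small"  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Word-tokenize the input with underthesea first, matching the
# pre-tokenization used during pretraining.
inputs = tokenizer(
    word_tokenize("Hà Nội là thủ đô của Việt Nam", format="text"),
    return_tensors="pt",
)
outputs = model(**inputs)
```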
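
For the abstractive summarization fine-tune, the details live in `abstractive-summarization`. One common way to reuse an encoder-only RoBERTa checkpoint for sequence-to-sequence summarization is `transformers`' `EncoderDecoderModel`, which warm-starts both encoder and decoder from the same checkpoint; the following is a sketch of that general technique, not necessarily the exact recipe used in this repo:

```python
# Warm-starting a seq2seq summarizer from the pretrained RoBERTa checkpoint.
from transformers import EncoderDecoderModel, AutoTokenizer

model_id = "your-username/vietnamese-roberta-small"  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(model_id, model_id)

# Generation settings required before seq2seq training/decoding.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```

From here, training on Vietnews article/abstract pairs follows the usual `Seq2SeqTrainer` pattern, with inputs pre-tokenized by underthesea as above.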