Language Models

Repository of pre-trained Language Models.

WARNING: a bidirectional LM trained with the MultiFiT configuration is a good model for text classification, but with only 46 million parameters it is far from being a LM that can compete with GPT-2 or BERT on NLP tasks like text generation. That is my next step ;-)

Note: the training times shown in the tables on this page are the sum of the creation time of the fastai DataBunch (forward and backward) and the training time of the bidirectional model over 10 epochs. The time needed to download the Wikipedia corpus and prepare it is not counted.
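For context, here is a minimal sketch of those two timed steps with fastai v1 (the folder name, batch size and learning rate are illustrative assumptions, not the exact values used in the notebooks):

```python
from fastai.text import TextList

# Step 1 (counted in the reported times): build the forward and backward
# language-model DataBunches from the prepared Wikipedia corpus.
data_lm_fwd = (TextList.from_folder('data/wikipedia')
               .split_by_rand_pct(0.1)
               .label_for_lm()
               .databunch(bs=64))

data_lm_bwd = (TextList.from_folder('data/wikipedia')
               .split_by_rand_pct(0.1)
               .label_for_lm()
               .databunch(bs=64, backwards=True))

# Step 2 (also counted): each direction is trained for 10 epochs,
# e.g. learn.fit_one_cycle(10, 2e-3) on a language_model_learner.
```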

Portuguese

I trained 1 Portuguese Bidirectional Language Model (PBLM) with the MultiFiT configuration on 1 NVIDIA V100 GPU on GCP.

MultiFiT configuration (architecture: 4 QRNN layers with 1,550 hidden units per layer / tokenizer: SentencePiece with a 15,000-token vocabulary)

| PBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 39.68% | 21.76 | 8h |
| backward | 43.67% | 22.16 | 8h |
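A hedged sketch of how this configuration translates into fastai v1 code (the exact hyperparameters and file layout of the released model may differ):

```python
from fastai.text import (TextList, SPProcessor, language_model_learner,
                         AWD_LSTM, awd_lstm_lm_config)

# SentencePiece tokenization with a 15,000-token vocabulary
processor = SPProcessor(vocab_sz=15000)

data_lm = (TextList.from_folder('data/wikipedia_pt', processor=processor)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=64))

# Switch the AWD-LSTM defaults to the MultiFiT setup:
# QRNN cells, 4 layers, 1,550 hidden units per layer.
config = awd_lstm_lm_config.copy()
config.update(qrnn=True, n_layers=4, n_hid=1550)

learn = language_model_learner(data_lm, AWD_LSTM, config=config, pretrained=False)
learn.fit_one_cycle(10, 2e-3)
```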

Here's an example of using the classifier to predict the category of a TCU legal text:

Using the classifier to predict the category of TCU legal texts
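As a rough illustration (the file name and example sentence are placeholders, not taken from the repository), prediction with an exported fastai v1 text classifier looks like this:

```python
from fastai.text import load_learner

# Load the exported classifier (placeholder file name) and predict the
# category of a TCU legal text.
learn_clas = load_learner('models', 'classifier_tcu_export.pkl')

text = ("O Tribunal de Contas da União aprecia as contas prestadas "
        "anualmente pelo Presidente da República.")
category, category_idx, probs = learn_clas.predict(text)
print(category, probs[category_idx])
```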

French

I trained 3 French Bidirectional Language Models (FBLM) on 1 NVIDIA V100 GPU on GCP; the best one is the model trained with the MultiFiT configuration.

| French Bidirectional Language Model (FBLM) | direction | accuracy | perplexity | training time |
|---|---|---|---|---|
| MultiFiT with 4 QRNN + SentencePiece (15,000 tokens) | forward | 43.77% | 16.09 | 8h40 |
| | backward | 49.29% | 16.58 | 8h10 |
| ULMFiT with 3 QRNN + SentencePiece (15,000 tokens) | forward | 40.99% | 19.96 | 5h30 |
| | backward | 47.19% | 19.47 | 5h30 |
| ULMFiT with 3 AWD-LSTM + spaCy (60,000 tokens) | forward | 36.44% | 25.62 | 11h |
| | backward | 42.65% | 27.09 | 11h |
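A note on reading these tables: as usually reported for language models, perplexity is the exponential of the validation cross-entropy loss, so the two quantities can be converted into each other:

```python
import math

# Perplexity = exp(cross-entropy loss); e.g. the forward MultiFiT FBLM's
# perplexity of 16.09 corresponds to a validation loss of about 2.78.
loss = math.log(16.09)
print(round(loss, 2))            # 2.78
print(round(math.exp(loss), 2))  # 16.09
```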

1. MultiFiT configuration (architecture: 4 QRNN layers with 1,550 hidden units per layer / tokenizer: SentencePiece with a 15,000-token vocabulary)

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 43.77% | 16.09 | 8h40 |
| backward | 49.29% | 16.58 | 8h10 |

Here's an example of using the classifier to predict the sentiment of comments on an Amazon product:

Using the classifier to predict the sentiment of comments on an Amazon product
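For completeness, a hypothetical sketch of how such a sentiment classifier is derived from the pretrained LM in the ULMFiT/MultiFiT pipeline (file names, column names and hyperparameters are placeholders, not the repository's actual ones):

```python
import pandas as pd
from fastai.text import TextList, text_classifier_learner, AWD_LSTM

# Labelled Amazon reviews (placeholder file; 'text' and 'sentiment' columns assumed)
df = pd.read_csv('data/amazon_reviews_fr.csv')

# The classifier must reuse the LM's vocabulary; here data_lm is the
# language-model DataBunch built as in the sketch above.
data_clas = (TextList.from_df(df, cols='text', vocab=data_lm.vocab)
             .split_by_rand_pct(0.1)
             .label_from_df(cols='sentiment')
             .databunch(bs=32))

# In the MultiFiT case the same QRNN config as the LM would be passed via config=...
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder('fblm_fine_tuned_enc')   # encoder saved from the fine-tuned LM
learn_clas.fit_one_cycle(2, 2e-2)

learn_clas.predict("Ce produit est excellent, je le recommande !")
```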

2. ULMFiT configuration (architecture: 3 QRNN layers / tokenizer: SentencePiece with a 15,000-token vocabulary)

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 40.99% | 19.96 | 5h30 |
| backward | 47.19% | 19.47 | 5h30 |

3. ULMFiT configuration (architecture: 3 AWD-LSTM layers / tokenizer: spaCy with a 60,000-token vocabulary)

| FBLM | accuracy | perplexity | training time |
|---|---|---|---|
| forward | 36.44% | 25.62 | 11h |
| backward | 42.65% | 27.09 | 11h |
