
Filipino-ULMFiT

This repository accompanies my paper:

Pagsusuri ng RNN-based transfer learning technique sa low-resource language (An analysis of an RNN-based transfer learning technique for a low-resource language)

Contributions

Release a pre-trained AWD-LSTM language model for Filipino, built with fastai v2.
Benchmark the AWD-LSTM on the Hate Speech Dataset. [reference]

Requirements

fastai v2 or later
NVIDIA GPU (all experiments were run on Colab with a Tesla T4)

Language Model

Total Epochs	Dataset Size	Train Set	Val Set	Accuracy	Perplexity	Total Training Time	Dataset
20	160428	90%	10%	86.71%	2.028250	26H	WikiText-TL-39
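
To help read the Perplexity column, here is a minimal sketch of how perplexity relates to cross-entropy loss. It assumes the reported figure was computed as the exponential of the average cross-entropy loss (in nats), which is how fastai's Perplexity metric works; the variable and function names are illustrative only and do not come from this repository.

```python
import math

# Perplexity value copied from the table above.
reported_perplexity = 2.028250


def perplexity(loss):
    """Perplexity as the exponential of the average cross-entropy loss (nats)."""
    return math.exp(loss)


# Assumption: the table's value is exp(loss), so the implied loss is its natural log.
implied_loss = math.log(reported_perplexity)
print(f"implied cross-entropy loss: {implied_loss:.4f} nats per token")
```

Under that assumption, a lower loss maps directly to a lower perplexity, so either number can be used to compare language models trained on the same data and vocabulary.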

Download pre-trained language model

# Install gdown
pip install gdown

# Make directory
mkdir models

# Download the pre-trained model archive
gdown --id 19jdv8-XEbDNiqlm_lPb1csbVZYkn3gfA

# Unzip
unzip pretrained.zip -d models

# Finally
You should see two files inside the 'models' directory:
1. finetuned_weights_20.pth (pre-trained weights)
2. vocab.pkl (vocabulary)

Both files are used later for language model fine-tuning.
See the accompanying Jupyter notebook for usage.
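
As a quick sanity check after unzipping, the following sketch verifies that both files are present and returns their names without extensions. The helper name `pretrained_fnames` and the default directory argument are hypothetical and are not part of this repository; only the two file names come from the list above.

```python
from pathlib import Path

# File names as listed above; everything else in this sketch is illustrative.
WEIGHTS_FILE = "finetuned_weights_20.pth"
VOCAB_FILE = "vocab.pkl"


def pretrained_fnames(models_dir="models"):
    """Check that both files exist and return their names without extensions."""
    models_dir = Path(models_dir)
    missing = [
        name for name in (WEIGHTS_FILE, VOCAB_FILE)
        if not (models_dir / name).is_file()
    ]
    if missing:
        raise FileNotFoundError(f"Missing from {models_dir}: {', '.join(missing)}")
    # Weights first, then vocab, both without their file extensions.
    return [Path(WEIGHTS_FILE).stem, Path(VOCAB_FILE).stem]
```

In fastai v2, `language_model_learner` accepts a `pretrained_fnames` argument of this shape (weights name, then vocab name, both without extensions), resolved relative to the learner's model directory. Treat this as a sketch; the notebook remains the reference for the actual fine-tuning code.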

Acknowledgements

Big thanks to Blaise Cruz for answering my questions and for nudging me in the right direction.
