sharavsambuu / mongolian-text-classification

Cyrillic Mongolian text classification with TensorFlow 2, plus fine-tuning of TugsTugi's Mongolian BERT model and other NLP experiments.

Home Page: https://github.com/tugstugi/mongolian-bert

mongolian-text-classification

Cyrillic Mongolian text classification with TensorFlow 2, and fine-tuning of TugsTugi's Mongolian BERT model.

Load Mongolian BERT in TensorFlow 2

Open In Colab

Generate text using Mongolian BERT

Open In Colab

Visualize Mongolian BERT using bertviz and the PyTorch model

Open In Colab

Fine-tuning TugsTugi's Mongolian BERT model

In TPU mode, BERT does not support loading checkpoints from the local file system, so a bucket should be used instead.

Fine-tuning Mongolian BERT on TPU. You need your own bucket in order to fine-tune on TPU. Open In Colab

Fine-tuning Mongolian BERT on GPU. This needs a lot of computation, and the batch size has to be kept low because of the memory limit. Open In Colab

Classifiers using simple neural networks

No 01, Simplest classifier Open In Colab

No 02, Embedding initialization from pretrained Word2Vec-style vectors (Facebook's fastText), a simple form of transfer learning. With a frozen (non-trainable) embedding layer Open In Colab and with a trainable embedding layer Open In Colab

No 03, 1D Convolution Open In Colab and multiple 1D convnets Open In Colab

No 04, LSTM Open In Colab

Visualize RNN neuron firing in text generation Open In Colab

No 05, LSTM with Attention, visualization of attention scores in text classification Open In Colab

No 06, Classification with Mongolian BERT and TensorFlow 2.0, with frozen BERT layers Open In Colab

No 07, Classification with Mongolian BERT large, HuggingFace and TensorFlow 2 Open In Colab
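
To give a feel for what the attention scores visualized in No 05 are, here is a minimal pure-Python sketch. It is not taken from the notebook: the function names `softmax` and `attention_pool`, the dot-product scoring against a single query vector, and the toy numbers are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Score each time step by its dot product with `query`.

    Returns (weights, context): one weight per time step, and the
    weighted sum of the hidden states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Toy stand-ins for three LSTM outputs (hypothetical values).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
```

The `weights` list is what an attention heatmap would display, one value per token; the second time step receives the largest weight here because its hidden state aligns best with the query.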

Mongolian sentence interpolation experiments

Sequence loss in keras and tf2 Open In Colab

Variational Auto Encoder for Mongolian text Open In Colab
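
As a rough illustration of the sequence loss idea listed above, the sketch below averages the per-token negative log-likelihood over the non-padding positions of one sequence. It is a pure-Python sketch under stated assumptions, not the Keras code from the notebook: the name `sequence_loss`, the padding id `0`, and the toy probabilities are hypothetical.

```python
import math

PAD_ID = 0  # assumption: token id 0 marks padding

def sequence_loss(probs, targets, pad_id=PAD_ID):
    """Mean negative log-likelihood over non-padding target positions.

    probs[t] is the predicted distribution over the vocabulary at step t,
    targets[t] is the true token id at step t.
    """
    total, count = 0.0, 0
    for step_probs, target in zip(probs, targets):
        if target == pad_id:
            continue  # padded steps do not contribute to the loss
        total += -math.log(step_probs[target])
        count += 1
    return total / count if count else 0.0

# Three time steps over a vocabulary of size 3; the last step is padding.
probs = [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]
targets = [1, 2, 0]
loss = sequence_loss(probs, targets)
```

Masking the padded steps keeps short sentences from looking artificially easy or hard just because of how much padding they carry.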

Other experiments

Predict next word, greedy text generation Open In Colab
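
Greedy generation simply picks the most probable next word at every step. The sketch below shows that loop with a hand-written lookup table standing in for a trained model; the table `toy_model`, the function name `greedy_generate`, and the `<eos>` end marker are assumptions made for illustration.

```python
def greedy_generate(next_word_probs, start_word, max_words=5, end_word="<eos>"):
    """Repeatedly append the most probable next word until the end marker or the limit."""
    words = [start_word]
    for _ in range(max_words):
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break  # the toy model has no prediction for this word
        best = max(candidates, key=candidates.get)
        if best == end_word:
            break
        words.append(best)
    return words

# Hypothetical next-word probabilities; a real model would compute these.
toy_model = {
    "сайн": {"байна": 0.8, "уу": 0.2},
    "байна": {"уу": 0.9, "<eos>": 0.1},
    "уу": {"<eos>": 1.0},
}
generated = greedy_generate(toy_model, "сайн")
```

Because the choice at each step is deterministic, the same start word always produces the same text; sampling-based decoding would be needed for more varied output.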

The series includes (or will include) the following:

word2vec initialization, 1D convolution, RNN variants, attention, weight visualizations for interpreting the models, Transformer, techniques for handling longer texts, and so on...
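
The list does not say which technique is meant for longer texts. One common option is to split a token sequence into overlapping windows that each fit a fixed model input length, sketched below. The function name `sliding_windows` and the default sizes are assumptions, not something taken from this repository.

```python
def sliding_windows(tokens, window_size=512, stride=256):
    """Split `tokens` into overlapping windows of at most `window_size` items.

    Consecutive windows start `stride` items apart, so they overlap when
    stride < window_size. The last window may be shorter than the rest.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return windows

# Small numbers so the overlap is easy to see.
tokens = list(range(10))
chunks = sliding_windows(tokens, window_size=4, stride=2)
```

Each window can then be classified separately and the predictions combined, for example by averaging; keeping `stride` no larger than `window_size` ensures no token is skipped.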

useful references and resources

Languages

Jupyter Notebook 95.2%, Python 4.6%, HTML 0.1%