SentencePiece tokenizer tests
Testing SentencePiece tokenizers on Estonian-language text.
Experiment
- Run 1.0_train_tokenizers_risto.ipynb to train SentencePiece tokenizers with different parameters.
- Run 2.0_train_sklearn_models_risto.ipynb to evaluate the trained tokenizers in scikit-learn text classification.
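
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function names, the vocabulary sizes, the model types, the model file prefixes, and the choice of TF-IDF with logistic regression are all assumptions made for the example. It needs the sentencepiece and scikit-learn packages, which are imported only inside the functions that use them.

```python
from itertools import product


def tokenizer_configs(vocab_sizes=(4000, 8000, 16000), model_types=("unigram", "bpe")):
    """Return one SentencePiece training config per parameter combination.

    The default values are illustrative assumptions, not the notebook's settings.
    """
    return [
        {
            "model_prefix": f"sp_{model_type}_{vocab_size}",  # hypothetical naming scheme
            "vocab_size": vocab_size,
            "model_type": model_type,
        }
        for model_type, vocab_size in product(model_types, vocab_sizes)
    ]


def train_tokenizers(corpus_path, configs):
    """Train one tokenizer per config and return the resulting model file paths.

    corpus_path is a plain-text file with one sentence per line.
    """
    import sentencepiece as spm  # third-party dependency, imported lazily

    for cfg in configs:
        # Writes <model_prefix>.model and <model_prefix>.vocab to the working directory.
        spm.SentencePieceTrainer.train(input=corpus_path, **cfg)
    return [cfg["model_prefix"] + ".model" for cfg in configs]


def build_classifier(model_file):
    """Build a scikit-learn pipeline that tokenizes text with a trained SentencePiece model."""
    import sentencepiece as spm
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    sp = spm.SentencePieceProcessor(model_file=model_file)
    vectorizer = TfidfVectorizer(
        tokenizer=lambda text: sp.encode(text, out_type=str),  # text -> list of subword pieces
        lowercase=False,      # leave casing to the tokenizer
        token_pattern=None,   # unused when a custom tokenizer is given
    )
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
```

With a corpus file and labelled data at hand, one would call train_tokenizers on the output of tokenizer_configs(), then fit build_classifier(model_file) for each returned model file and compare the classification scores across tokenizer settings.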