RRisto / sentencepiece_experiments

SentencePiece tokenizer tests

Testing the SentencePiece tokenizer on Estonian text.

Experiment

  • Run 1.0_train_tokenizers_risto.ipynb to train SentencePiece tokenizers with different parameters.
  • Run 2.0_train_sklearn_models_risto.ipynb to compare the tokenizers on a scikit-learn text classification task.
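
The two steps can be sketched roughly as follows. This is a minimal illustration, not the notebooks' code: it assumes the standard sentencepiece Python package and scikit-learn, and the parameter grid, the spm_<model_type>_<vocab_size> file prefixes, the helper names and the TF-IDF + logistic regression classifier are all hypothetical choices made for the example.

```python
from itertools import product

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical parameter grid; the values used in the notebooks may differ.
PARAM_GRID = {
    "vocab_size": [4000, 8000, 16000],
    "model_type": ["unigram", "bpe"],
}


def grid_configs(grid):
    """Expand a dict of lists into a list of keyword-argument dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def train_tokenizers(corpus_path, grid=PARAM_GRID):
    """Train one SentencePiece model per configuration; return the model prefixes."""
    import sentencepiece as spm  # imported lazily so the evaluation code runs without it

    prefixes = []
    for cfg in grid_configs(grid):
        # Hypothetical naming scheme: writes <prefix>.model and <prefix>.vocab
        prefix = f"spm_{cfg['model_type']}_{cfg['vocab_size']}"
        spm.SentencePieceTrainer.train(input=corpus_path, model_prefix=prefix, **cfg)
        prefixes.append(prefix)
    return prefixes


def spm_tokenizer(model_file):
    """Wrap a trained SentencePiece model as a callable returning subword pieces."""
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=model_file)
    return lambda text: sp.encode(text, out_type=str)


def evaluate_tokenizer(tokenize, texts, labels, cv=3):
    """Mean cross-validated accuracy of TF-IDF + logistic regression using `tokenize`."""
    pipeline = make_pipeline(
        # token_pattern=None because the custom tokenizer replaces the regex
        TfidfVectorizer(tokenizer=tokenize, lowercase=False, token_pattern=None),
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy").mean()
```

Used together, each trained model is plugged into the same classifier, e.g. evaluate_tokenizer(spm_tokenizer(prefix + ".model"), texts, labels) for every prefix returned by train_tokenizers("et_corpus.txt"), where the corpus file, texts and labels stand in for your own Estonian data. Keeping the classifier fixed means differences in accuracy can be attributed to the tokenizer settings.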


Languages

Jupyter Notebook: 100.0%