facebookresearch / stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Home Page: https://facebookresearch.github.io/stopes/


Bug in tokenizer for Tibetan language

asusdisciple opened this issue · comments

At the moment you can't use the Tibetan language tokenizer. It fails with the error message:

TypeError: 'module' object is not callable

The error is thrown here in sentence_split.py:

    elif split_algo == "bodnlp":
        logger.info(f" - Tibetan NLTK sentence splitter applied to '{lang}'")
        from botok.tokenizers import sentencetokenizer as bod_sent_tok
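The import above binds `bod_sent_tok` to the `sentencetokenizer` *module*, not to a function inside it, so calling `bod_sent_tok(...)` later raises the error. A minimal, self-contained sketch of this failure mode (using the stdlib `json` module as a stand-in for botok's `sentencetokenizer`, since botok itself is not needed to show it):

```python
# A module object is imported and later called as if it were a function.
# `json` stands in here for botok's `sentencetokenizer` module.
import json as bod_sent_tok  # hypothetical stand-in, NOT botok

try:
    bod_sent_tok("some text")  # calling a module raises TypeError
except TypeError as e:
    print(e)  # 'module' object is not callable
```

The likely fix is to import the callable from inside the module rather than the module itself (the exact name depends on botok's API, e.g. something like `from botok.tokenizers.sentencetokenizer import sentence_tokenizer`).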

We released the new version 2.1.0 last week; could you check again?