texttrans computes the transition probability of a text.
I want to determine whether a word was randomly generated. My hypothesis is that this can be judged from character transition probabilities trained on real words.
I trained the transition probabilities on a list covering almost all English words. I then computed and compared the probabilities for English words taught at junior high schools in Japan and for randomly generated words. The figure below shows that the two groups peak at different values.
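To make the idea concrete, here is a minimal sketch of a character-bigram model. It is not the texttrans implementation: the names `train_bigrams` and `word_score`, the boundary markers, the `floor` value for unseen transitions, and the choice of a geometric mean as the word score are all assumptions made for illustration.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers, not part of texttrans


def train_bigrams(words):
    """Count character-to-character transitions and normalise each row."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START, *word.lower(), END]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


def word_score(model, word, floor=1e-6):
    """Geometric mean of the transition probabilities along the word.

    Unseen transitions get the small `floor` value instead of zero
    (an assumed smoothing choice).
    """
    chars = [START, *word.lower(), END]
    logs = [
        math.log(model.get(a, {}).get(b, floor))
        for a, b in zip(chars, chars[1:])
    ]
    return math.exp(sum(logs) / len(logs))


# Tiny toy vocabulary, only for demonstration.
model = train_bigrams(["pen", "pencil", "open", "happen", "ten", "hen"])
print(word_score(model, "pen") > word_score(model, "xqzk"))  # True
```

A word made of transitions that are common in the training list scores high, while a random string made of unseen transitions falls to the floor value. Comparing the score against a threshold is one way to separate the two groups.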
$ pip install texttrans
By default, the transition probability is computed for English words. The default model is trained on "words_alpha.txt" from dwyl/english-words.
from texttrans.texttrans import TextTrans
p = TextTrans().prob("pen")
print(p)
0.11640052876679541
Prepare a text file that lists the training words, one per line, for example:
hogehoge
piyopiyo
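If the word list is already in memory, such a file can be written from Python. This is a small sketch that assumes one word per line, as the sample suggests, and UTF-8 encoding, which is my assumption; the file name matches the `train.txt` used in the training example.

```python
from pathlib import Path

words = ["hogehoge", "piyopiyo"]  # sample words from the listing above
train_path = Path("train.txt")    # file name used in the training example

# Write one word per line, ending the file with a newline.
train_path.write_text("\n".join(words) + "\n", encoding="utf-8")
```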
Train the transition probabilities on the input text and save the model.
from texttrans.texttrans import TextTrans
train_path = "train.txt"
model_path = "model.pki"
tt1 = TextTrans(lang=None)
tt1.train(train_path=train_path, save_path=model_path)
print("p =", tt1.prob("hoge"))
Load the saved model and compute the probability with it.
tt2 = TextTrans(model_path=model_path)
print("p =", tt2.prob("hoge"))