Rule-based language model
kezakool opened this issue
Hi,
Thank you for all your work and for sharing it!
I'm trying to use KenLM to build some rule-based language models on small texts, to detect misreadings in children's reading.
I need to tweak the probabilities of each node so that roughly 70% of the mass stays on the nominal text and 30% goes to the errors. What is the best way to build the correct ARPA file with your package?
For now, I've generated artificial text variations that include errors, and added a large proportion of duplicated nominal sentences to get the desired probabilities (a sketch of the approach is below).
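Roughly, the corpus-building step looks like this; the sentences, the file name, and the 7/3 duplication counts are just placeholders for illustration, and the `lmplz` invocation in the comment assumes a standard KenLM build:

```python
# Minimal sketch of the weighted-corpus idea: duplicate the nominal
# sentences so they carry ~70% of the counts, leaving the error
# variants with the remaining ~30%.

nominal_sentences = [
    "the cat sat on the mat",
    "the dog ran in the park",
]
error_variants = [
    "the cat sat on the hat",   # artificial misreading
    "the dog run in the park",  # artificial misreading
]

# With 7 copies of each nominal sentence against 3 copies of each
# variant, the raw counts approximate the 70/30 split.
with open("train.txt", "w", encoding="utf-8") as f:
    for s in nominal_sentences:
        for _ in range(7):
            f.write(s + "\n")
    for s in error_variants:
        for _ in range(3):
            f.write(s + "\n")

# Then build the ARPA file with lmplz, e.g.:
#   lmplz -o 3 --discount_fallback < train.txt > model.arpa
```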
With this, I've succeeded in generating some LMs using the --discount_fallback parameter, but I find them hard to verify and I get some weird results, which makes me think the duplicated sentences are not actually affecting the probabilities (the check I've been using is sketched below).
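For verification, here is a minimal sketch of how I inspect per-token probabilities with the kenlm Python module (`pip install kenlm`); the model path and the test sentences are placeholders:

```python
import kenlm

# Load the ARPA (or binarized) model; path is a placeholder.
model = kenlm.Model("model.arpa")

def show_scores(sentence):
    # full_scores yields (log10 prob, ngram order used, is_oov)
    # for each token plus the implicit </s>.
    words = sentence.split() + ["</s>"]
    for (logprob, ngram_len, oov), word in zip(
            model.full_scores(sentence), words):
        print(f"{word:>10}  p={10 ** logprob:.4f}  "
              f"(order={ngram_len}, oov={oov})")
    print()

# Compare a nominal sentence against an error variant: if the
# duplication worked, the nominal tokens should get higher
# probabilities where the two texts diverge.
show_scores("the cat sat on the mat")
show_scores("the cat sat on the hat")
```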
Thank you for your time :)