caiselvass / language-identification

An NLP project that uses character trigrams and smoothing techniques (Lidstone, Linear Discounting, Absolute Discounting) for language identification. Trained on Spanish, Italian, English, French, Dutch, and German, achieving 99.8932% accuracy. Includes datasets, model parameters, and comprehensive documentation.

Validation to-do

pauhidalgoo opened this issue

There are some basic functions for validation, and most of the code is prepared for it. However, it currently only counts the number of errors (maybe we should try other metrics as well). Still to do:

  • Determine whether the B values should be taken from validation
  • (Conditional) If they are taken from validation, implement a way to change them
  • Write some code to gather statistics about each iteration step
  • Run validation and obtain the resulting parameters
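
As a starting point for the statistics item, here is a minimal sketch of a validation loop that records per-step statistics for a Lidstone-smoothed trigram model. It is not the repository's code: the function names (`trigrams`, `train`, `lidstone_log_prob`, `predict`, `validate`), the toy corpus, and the candidate λ values are all hypothetical, and B is assumed here to be the number of distinct trigrams seen in training, which is exactly the open question in the first item.

```python
import math
from collections import Counter

def trigrams(text):
    """Return the character trigrams of a string."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

def train(corpus):
    """corpus: {language: [sentences]} -> {language: Counter of trigram counts}."""
    return {lang: Counter(t for s in sents for t in trigrams(s))
            for lang, sents in corpus.items()}

def lidstone_log_prob(text, counts, lam, B):
    """Log-probability of text with Lidstone smoothing: (c + lam) / (N + lam * B)."""
    N = sum(counts.values())
    return sum(math.log((counts[t] + lam) / (N + lam * B)) for t in trigrams(text))

def predict(text, models, lam, B):
    """Pick the language whose model gives the text the highest log-probability."""
    return max(models, key=lambda lang: lidstone_log_prob(text, models[lang], lam, B))

def validate(models, validation, lambdas, B):
    """Evaluate every candidate lambda and record statistics for each step."""
    stats = []
    for step, lam in enumerate(lambdas):
        errors = sum(predict(text, models, lam, B) != lang for text, lang in validation)
        stats.append({"step": step, "lambda": lam, "errors": errors,
                      "accuracy": 1 - errors / len(validation)})
    return stats

# Toy data, purely illustrative (not the project's datasets).
train_corpus = {
    "eng": ["the cat sat on the mat", "this is the house"],
    "spa": ["el gato está en la casa", "esta es la casa"],
}
validation = [("the house is on the hill", "eng"), ("la casa del gato", "spa")]

models = train(train_corpus)
# Assumption: B = number of distinct trigrams observed in the training data.
B = len(set().union(*models.values()))
lambdas = [0.01, 0.1, 0.5, 1.0]

stats = validate(models, validation, lambdas, B)
best = min(stats, key=lambda s: s["errors"])
for row in stats:
    print(row)
print("best lambda:", best["lambda"])
```

Keeping one dictionary per step makes it easy to add further metrics later (for example per-language error counts) without changing the loop, and to switch the source of B if it ends up being tuned on validation.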