vgherard / sbo

Utilities for training and evaluating text predictors based on Stupid Back-off N-gram models.

Using Run Length Encoding in `sbo_preds` objects.

vgherard opened this issue

For N-gram models with N >= 3, using run-length encoding (RLE) for the k-gram prefixes stored in `sbo_preds` objects could bring two benefits:

  1. Reduce size of these objects.
  2. Make the retrieval of k-gram prefixes more efficient.
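To make the two points above concrete, here is a minimal sketch in Python (not the package's own code). It assumes the k-gram prefixes are stored as rows of integer word codes sorted lexicographically, so that equal values in a leading column form contiguous runs; that layout and the helper names `rle_encode`, `rle_decode` and `find_run` are hypothetical and not taken from `sbo`.

```python
from bisect import bisect_left
from itertools import accumulate, groupby


def rle_encode(values):
    """Compress a sequence into parallel lists of run values and run lengths."""
    run_values, run_lengths = [], []
    for value, group in groupby(values):
        run_values.append(value)
        run_lengths.append(sum(1 for _ in group))
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand run values and run lengths back into the original sequence."""
    out = []
    for value, length in zip(run_values, run_lengths):
        out.extend([value] * length)
    return out


def find_run(run_values, offsets, key):
    """Return the (start, stop) row range whose leading word equals `key`.

    Assumes `run_values` is strictly increasing (sorted prefix table) and
    `offsets` holds the cumulative run lengths, starting at 0.
    Returns None if `key` does not occur.
    """
    i = bisect_left(run_values, key)
    if i == len(run_values) or run_values[i] != key:
        return None
    return offsets[i], offsets[i + 1]


# Hypothetical leading-word column of a sorted 2-gram prefix table.
first_word = [1, 1, 1, 2, 2, 5, 5, 5, 5]

run_values, run_lengths = rle_encode(first_word)
offsets = list(accumulate(run_lengths, initial=0))  # requires Python 3.8+

# Rows 5..8 (stop index exclusive) share the leading word 5.
rows_for_5 = find_run(run_values, offsets, 5)
```

In this toy column, nine entries shrink to three runs, and a lookup becomes a binary search over the runs followed by a search restricted to the returned row range. Whether the same gain holds for real `sbo_preds` objects depends on how long the runs are in practice.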