vgherard / sbo

Utilities for training and evaluating text predictors based on Stupid Back-off N-gram models.

Huge memory allocations from `predict.sbo_preds`

vgherard opened this issue · comments

The current (C++) implementation of the predict.sbo_preds() method has two big issues:

  • Every call to predict() makes a copy of the entire k-gram prediction tables. This is memory-expensive and slow when predict() is called in a non-vectorized way (as would happen e.g. in interactive text prediction).
  • The look-up method in the prediction tables is very slow, and causes huge memory allocations/deallocations for large vector input, which considerably slow down model evaluation in eval_sbo_preds(). Maybe #8 could partially fix this?

After experimenting with this for a while: the easiest solution is probably to base the whole sbo_preds object on a pure C++ class. This also applies to issues #8 and #19. Closing.