dmlc / treelite

Universal model exchange and serialization format for decision tree forests

Home Page: https://treelite.readthedocs.io/en/latest/

[FEA] Interface for fast out-of-bag predictions

edwardwliu opened this issue

Background

Treelite currently returns a single prediction averaged across all trees in a forest. In some cases, however, only a subset of the trees should be averaged. Out-of-bag (OOB) predictions are one example: each observation is scored using only the trees whose training sample did not include it. OOB predictions are useful for evaluating models and for further model diagnostics. Although Treelite itself does not track which observations were out-of-sample for each tree, many libraries do provide this information (e.g. _generate_unsampled_indices() in scikit-learn). If Treelite exposed the individual prediction of each tree, users could then calculate OOB results by hand.
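To make the "by hand" step concrete, here is a minimal NumPy sketch. It assumes a per-tree prediction matrix of shape (n_rows, n_trees) is already available from somewhere and that the user has built a boolean in-bag mask of the same shape from their training library. The function name oob_average and both inputs are hypothetical and not part of Treelite.

```python
import numpy as np

def oob_average(per_tree_preds, in_bag):
    """Average per-tree predictions over the trees that did not train on each row.

    per_tree_preds: float array of shape (n_rows, n_trees) (assumed input)
    in_bag: bool array of the same shape; True where the tree trained on that row
    """
    per_tree_preds = np.asarray(per_tree_preds, dtype=float)
    oob = ~np.asarray(in_bag, dtype=bool)
    counts = oob.sum(axis=1)                       # number of OOB trees per row
    sums = np.where(oob, per_tree_preds, 0.0).sum(axis=1)
    result = np.full(per_tree_preds.shape[0], np.nan)
    # Rows that were in-bag for every tree have no OOB prediction and stay NaN.
    np.divide(sums, counts, out=result, where=counts > 0)
    return result

# Toy data: 3 rows, 3 trees.
preds = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
in_bag = np.array([[True, False, False],
                   [False, True, False],
                   [True, True, True]])
oob_preds = oob_average(preds, in_bag)
```

The whole computation is a single masked average, so the cost is dominated by producing the per-tree matrix in the first place.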

Potential Implementation

Treelite could return an array of predictions per tree. See this notebook for a detailed exploration.
Alternatively, a user could specify in advance which trees should be used for averaging. If implemented, the existing feature request for predict_leaf() could also be used to obtain OOB predictions, but that would require a much slower multi-pass approach.
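As a rough illustration of the second option, the sketch below assumes a hypothetical callable predict_with_trees(X, tree_ids) that averages only the listed trees. No such Treelite function is confirmed by this thread; a toy stand-in is used here. Because the set of OOB trees differs from row to row, rows have to be grouped by their in-bag pattern and scored one group at a time, so the number of prediction calls grows with the number of distinct patterns.

```python
import numpy as np

def oob_by_groups(X, in_bag, predict_with_trees):
    """OOB predictions using a predictor that accepts a fixed tree subset per call."""
    in_bag = np.asarray(in_bag, dtype=bool)
    result = np.full(X.shape[0], np.nan)

    # Group row indices by their in-bag pattern across trees.
    groups = {}
    for row, pattern in enumerate(in_bag):
        groups.setdefault(pattern.tobytes(), []).append(row)

    n_calls = 0
    for rows in groups.values():
        tree_ids = np.flatnonzero(~in_bag[rows[0]])  # trees that never saw these rows
        if tree_ids.size == 0:
            continue  # in-bag for every tree: no OOB prediction, stays NaN
        result[rows] = predict_with_trees(X[rows], tree_ids)
        n_calls += 1
    return result, n_calls

def toy_predict_with_trees(X, tree_ids):
    # Stand-in model (hypothetical): tree t predicts X[:, 0] * (t + 1).
    weights = np.asarray(tree_ids) + 1.0
    return X[:, 0] * weights.mean()

X = np.array([[1.0], [2.0], [3.0], [4.0]])
in_bag = np.array([[True, False, False],
                   [True, False, False],
                   [False, True, False],
                   [True, True, True]])
oob_preds, n_calls = oob_by_groups(X, in_bag, toy_predict_with_trees)
```

With bootstrap sampling, most rows are likely to have a unique in-bag pattern, so this approach can approach one call per row. That is one reason a per-tree output may be the more practical interface.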

@edwardwliu Would it be useful if Treelite implemented a function predict_bytree that returns the individual prediction of each tree? This function would have an implementation similar to that of predict_leaf.

Yes, returning individual predictions by tree would work great.