Is it possible to provide a scikit-learn interface?

Question

hengzhe-zhang opened this issue 3 years ago · comments

This project is interesting and I want to use it as the baseline algorithm for my paper. However, it seems that I need to take several steps in order to make a prediction. Consequently, is it possible to provide a scikit-learn interface for making a convenient comparison between different algorithms?

Yury Gorishniy · Answer 1 · Mon Sep 27 2021 18:22:29 GMT+0800 (China Standard Time)

UPD

There are several approaches to training and prediction:

implement things manually (as is done in the official example).
use a general-purpose framework (for example, Lightning).
(use with caution) use a high-level library (for example, skorch provides a Scikit-Learn interface for PyTorch, which looks like what you are looking for). WARNING: high-level libraries usually provide default training parameters that can be suboptimal for the actual task at hand. You should not rely on the default parameters. Instead, you should tune them or take inspiration from pipelines for tasks that are similar to your task. In all cases, you should explicitly pass all the training parameters (such as optimizer, batch size, learning rate, weight decay, early stopping settings, epochs, etc.) to the corresponding functions.
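As a rough illustration of the first approach combined with the advice about explicit parameters, here is a minimal sketch of a hand-written PyTorch training loop behind scikit-learn-style `fit`/`predict` methods. The class name `ManualRegressor`, the small MLP standing in for the real model, the 20% validation split, and the choice of AdamW are all assumptions made for the example, not part of this library. Note that no training parameter has a default value, so the caller has to pass every one of them.

```python
import numpy as np
import torch
import torch.nn as nn


class ManualRegressor:
    """Hypothetical scikit-learn-style wrapper around a hand-written loop."""

    def __init__(self, lr, weight_decay, batch_size, max_epochs, patience, seed=0):
        # All training parameters are required on purpose: nothing is defaulted.
        self.lr = lr
        self.weight_decay = weight_decay
        self.batch_size = batch_size
        self.max_epochs = max_epochs
        self.patience = patience
        self.seed = seed

    def fit(self, X, y):
        torch.manual_seed(self.seed)
        X = torch.as_tensor(X, dtype=torch.float32)
        y = torch.as_tensor(y, dtype=torch.float32).reshape(-1, 1)

        # Hold out 20% of the rows for early stopping (an assumption of this sketch).
        perm = torch.randperm(len(X))
        n_val = max(1, len(X) // 5)
        val_idx, train_idx = perm[:n_val], perm[n_val:]

        # Stand-in model; replace with the model you actually want to train.
        self.model_ = nn.Sequential(
            nn.Linear(X.shape[1], 32), nn.ReLU(), nn.Linear(32, 1)
        )
        optimizer = torch.optim.AdamW(
            self.model_.parameters(), lr=self.lr, weight_decay=self.weight_decay
        )
        loss_fn = nn.MSELoss()

        best_loss, best_state, bad_epochs = float("inf"), None, 0
        for _ in range(self.max_epochs):
            self.model_.train()
            shuffled = train_idx[torch.randperm(len(train_idx))]
            for start in range(0, len(shuffled), self.batch_size):
                batch = shuffled[start:start + self.batch_size]
                optimizer.zero_grad()
                loss_fn(self.model_(X[batch]), y[batch]).backward()
                optimizer.step()

            # Early stopping on the held-out loss.
            self.model_.eval()
            with torch.no_grad():
                val_loss = loss_fn(self.model_(X[val_idx]), y[val_idx]).item()
            if val_loss < best_loss:
                best_loss, bad_epochs = val_loss, 0
                best_state = {
                    k: v.clone() for k, v in self.model_.state_dict().items()
                }
            else:
                bad_epochs += 1
                if bad_epochs >= self.patience:
                    break

        # Restore the weights from the best validation epoch, if one was recorded.
        if best_state is not None:
            self.model_.load_state_dict(best_state)
        return self

    def predict(self, X):
        self.model_.eval()
        with torch.no_grad():
            out = self.model_(torch.as_tensor(X, dtype=torch.float32))
        return out.squeeze(1).numpy()
```

A call would look like `ManualRegressor(lr=1e-3, weight_decay=1e-5, batch_size=256, max_epochs=100, patience=16).fit(X_train, y_train).predict(X_test)`; the values shown are placeholders to be tuned, not recommendations.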

Hengzhe Zhang · Answer 2 · Mon Sep 27 2021 18:37:05 GMT+0800 (China Standard Time)

Do you mean that I only need to wrap the FTTransformer using skorch?

Yury Gorishniy · Answer 3 · Mon Sep 27 2021 20:02:35 GMT+0800 (China Standard Time)

NOTE: I have updated the previous answer; please read it first.

> Do you mean that I only need to wrap the FTTransformer using skorch?

In theory, yes. Note that rtdl.FTTransformer expects two arguments (numerical and categorical features), so you will need to read this section.
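One possible way to deal with the two-argument `forward` is to hide it behind a thin adapter that accepts a single 2-D tensor and splits it into numerical and categorical parts. The sketch below uses plain PyTorch only and is not necessarily what the section mentioned above describes. `TwoInputNet` is a hypothetical stand-in for any model called as `model(x_num, x_cat)`, and `SingleInputAdapter` and the "numerical columns first" layout are assumptions of this example.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    """Hypothetical stand-in for a model whose forward takes (x_num, x_cat)."""

    def __init__(self, n_num, cat_cardinalities, d_embed=8, d_out=1):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(c, d_embed) for c in cat_cardinalities]
        )
        self.head = nn.Linear(n_num + d_embed * len(cat_cardinalities), d_out)

    def forward(self, x_num, x_cat):
        # Embed each categorical column, then concatenate with the numerical part.
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.head(torch.cat([x_num, *embedded], dim=1))


class SingleInputAdapter(nn.Module):
    """Splits one 2-D tensor into numerical and categorical parts.

    Assumes the first `n_num` columns are numerical and the remaining
    columns hold integer category codes stored as floats.
    """

    def __init__(self, model, n_num):
        super().__init__()
        self.model = model
        self.n_num = n_num

    def forward(self, x):
        x_num = x[:, :self.n_num]
        x_cat = x[:, self.n_num:].long()  # convert codes back to integer indices
        return self.model(x_num, x_cat)
```

With such an adapter, any wrapper that feeds a single array to the module can be used unchanged. The trade-off is that category codes travel as floats, which is exact for small integers in float32 but would lose precision for codes above 2^24.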

Yury Gorishniy · Answer 4 · Thu Sep 30 2021 01:06:00 GMT+0800 (China Standard Time)

Feel free to reopen the issue if you have more questions on the topic.

Hengzhe Zhang · Answer 5 · Sat Oct 09 2021 17:46:01 GMT+0800 (China Standard Time)

@Yura52 I have implemented a scikit-learn-compatible interface for the algorithms in this library and open-sourced it on GitHub: https://github.com/zhenlingcn/scikit-rtdl