False positive warning when manipulating pandas dataframes

Question

Paroag opened this issue 5 months ago

Scikit-learn added compatibility with pandas DataFrames through the set_output API. My project has sklearn pipelines that use pyod models. When fitting or predicting, the following warning is triggered:

UserWarning: X has feature names, but IsolationForest was fitted without feature names

The IForest.fit method does not pass the pandas DataFrame to the underlying IsolationForest. It passes the associated NumPy array instead, so the underlying estimator is fitted without feature names. The line X = check_array(X) is responsible for the conversion.

Here is a reproducible example:

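# Running this script triggers the UserWarning quoted above.
import pandas as pd
from pyod.models.iforest import IForest

# Small DataFrame with named columns
data = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [1, 2, 3, 4],
})

forest = IForest()
forest.fit(data)  # check_array inside fit converts the DataFrame to an array
forest.predict_proba(data)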
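
A possible workaround, sketched under assumptions, is to strip the feature names myself before every call so that fitting and scoring both see plain arrays. `strip_feature_names` is a hypothetical helper, not part of pyod or scikit-learn, and the sketch uses a bare IsolationForest as a stand-in for the pyod model.

```python
import warnings

import pandas as pd
from sklearn.ensemble import IsolationForest


def strip_feature_names(X):
    """Hypothetical helper: return a NumPy array if X is a DataFrame."""
    return X.to_numpy() if isinstance(X, pd.DataFrame) else X


data = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4]})

# Stand-in for the pyod model; fitted and scored on arrays only.
detector = IsolationForest(random_state=0).fit(strip_feature_names(data))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = detector.decision_function(strip_feature_names(data))

# Collect only the feature-name warning, ignoring anything unrelated.
leaked = [w for w in caught if "feature names" in str(w.message)]
print(len(leaked))

# With pyod, the same idea would presumably read:
#   forest.fit(strip_feature_names(data))
#   forest.predict_proba(strip_feature_names(data))
```

This hides the symptom at the call site but loses the column names, so it does not fit well inside a pipeline that relies on set_output.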

Any ideas on how to address this issue?