feedzai / fairgbm

Train Gradient Boosting models that are both high-performance *and* Fair!

Home Page: https://arxiv.org/abs/2209.07850

Issue with running fairgbm_clf.fit()

Abdullahifrh opened this issue

Description

We are trying to use FairGBM to predict a target column, severity_score_class, with districts as the constraint group. After building X, Y and S, calling fairgbm_clf.fit(X_train, Y_train, constraint_group=S) raises the following error: LightGBMError: Input data type error or field not found. The error persists after many attempts to fix it.

Reproducible example

`import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import lightgbm
from fairgbm import FairGBMClassifier

data = pd.read_csv('total_df_final_for_models.csv').drop(columns=['Column1', 'Column2'])
TARGET_COL = "severity_score_class"
SENSITIVE_COL = "district"

def retrieve_X(data):
    ignored_cols = [TARGET_COL, SENSITIVE_COL, "severity_score"]
    feature_cols = [col for col in data.columns if col not in ignored_cols]
    X = data[feature_cols]
    return X

def retrieve_Y(data):
    Y = data[TARGET_COL]
    return Y

def retrieve_S(data):
    data["district"] = data["district"].astype('category')
    data["district_encoding"] = data["district"].cat.codes
    S = data["district_encoding"]
    return S

X = retrieve_X(data)
Y = retrieve_Y(data)
S = retrieve_S(data)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=16)

fairgbm_clf = FairGBMClassifier(
    constraint_type="FNR",  # constraint on equal group-wise TPR (equal opportunity)
    n_estimators=200,       # core parameters from vanilla LightGBM
    random_state=16,
)

fairgbm_clf.fit(X_train, Y_train, constraint_group=S)`

Additional Comments

The target Y is multiclass (three levels), whereas FairGBM appears to work with binary labels, so this may be a problem if FairGBM does not support multiclass classification. The constraint group S contains 69 districts. Either of these may be the reason for the LightGBMError. Every line of code runs without error up to the fairgbm_clf.fit() call.
Data used: total_df_final_for_models.csv
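
One thing that stands out in the example above: S is never passed through train_test_split, so constraint_group receives one row per row of the full dataset while X_train and Y_train hold only the training rows. Whether this mismatch, the three-level target, or something else triggers the LightGBMError is not confirmed in this thread. The sketch below is an unverified guess at a cleaner input pipeline, not a confirmed fix. The helper name `prepare_inputs`, its `positive_class` parameter, the one-vs-rest binarization, and the synthetic `demo` frame are all assumptions made for illustration; the column names come from the example above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

TARGET_COL = "severity_score_class"
SENSITIVE_COL = "district"


def prepare_inputs(data, positive_class, test_size=0.33, seed=16):
    """Hypothetical helper: row-aligned splits of features, binary labels, group codes."""
    ignored = [TARGET_COL, SENSITIVE_COL, "severity_score"]
    X = data.drop(columns=[c for c in ignored if c in data.columns])
    # Assumption: collapse the three-level target into one-vs-rest binary labels.
    Y = (data[TARGET_COL] == positive_class).astype(int)
    # Integer group codes, computed without mutating `data`.
    S = data[SENSITIVE_COL].astype("category").cat.codes
    # Splitting S together with X and Y keeps all three row-aligned.
    return train_test_split(X, Y, S, test_size=test_size, random_state=seed)


# Synthetic stand-in for total_df_final_for_models.csv (made-up values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=300),
    "feature_b": rng.normal(size=300),
    "severity_score": rng.normal(size=300),
    "severity_score_class": rng.integers(0, 3, size=300),
    "district": rng.choice(["north", "south", "east"], size=300),
})

X_train, X_test, Y_train, Y_test, S_train, S_test = prepare_inputs(
    demo, positive_class=2
)

# The fit call from the example would then use the training slice of S:
# fairgbm_clf.fit(X_train, Y_train, constraint_group=S_train)
```

Before calling fit, it may also be worth checking that X contains no object-typed columns, since the CSV contents are not shown here.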

Thank you for raising this issue, @Abdullahifrh. From what I understand, you were able to solve it. If not, please comment again and we will re-open the issue.