lightgbm_practice

Python code in this repo are Python3.9.12, running pyfile using:

pipenv run python xxx.py

Usage of lgb.Dataset()

data = np.random.rand(500, 10)  # 500 entities, each contains 10 features
label = np.random.randint(2, size=500)  # binary target

Specific feature names and categorical features:

train_data = lgb.Dataset(data, label=label, feature_name=['c1', 'c2', 'c3'], categorical_feature=['c3'])

LightGBM can use categorical features as input directly. It doesn’t need to convert to one-hot encoding, and is much faster than one-hot encoding (about 8x speed-up).

Note: You should convert your categorical features to int type before you construct Dataset.

Weights can be set when needed:

w = np.random.rand(500, )
train_data = lgb.Dataset(data, label=label, weight=w)

train_data = lgb.Dataset(data, label=label)
w = np.random.rand(500, )
train_data.set_weight(w)

Ref

https://lightgbm.readthedocs.io/en/v3.3.2/

xhqing / lightgbm_practice

lightgbm_practice

Usage of lgb.Dataset()

Specific feature names and categorical features:

Weights can be set when needed:

Ref

About

Languages