get_tmlegain() ValueError: Bin edges must be unique

Question

craftsliu opened this issue 2 months ago · comments

ValueError Traceback (most recent call last)
--> 324 plot_tmlegain(pred_df, inference_col, outcome_col=y_col,
325 treatment_col=treatment_col, p_col=p_col)
326

~/anaconda3/envs/myenv/lib/python3.8/site-packages/causalml/metrics/visualize.py in plot_tmlegain(df, inference_col, learner, outcome_col, treatment_col, p_col, n_segment, cv, calibrate_propensity, ci, figsize)
656 ci (bool, optional): whether return confidence intervals for ATE or not
657 """
--> 658 plot_df = get_tmlegain(
659 df,
660 learner=learner,

~/anaconda3/envs/myenv/lib/python3.8/site-packages/causalml/metrics/visualize.py in get_tmlegain(df, inference_col, learner, outcome_col, treatment_col, p_col, n_segment, cv, calibrate_propensity, ci)
341 treatment=df[treatment_col],
342 y=df[outcome_col],
--> 343 segment=pd.qcut(df[col], n_segment, labels=False),
344 )
345 lift_model = [0.0] * (n_segment + 1)

~/anaconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in qcut(x, q, labels, retbins, precision, duplicates)
370 quantiles = q
371 bins = algos.quantile(x, quantiles)
--> 372 fac, bins = _bins_to_cuts(
373 x,
374 bins,

~/anaconda3/envs/myenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
411 if len(unique_bins) < len(bins) and len(bins) != 2:
412 if duplicates == "raise":
--> 413 raise ValueError(
414 f"Bin edges must be unique: {repr(bins)}.\n"
415 f"You can drop duplicate edges by setting the 'duplicates' kwarg"

ValueError: Bin edges must be unique: array([-0.08210021, 0. , 0. , 0. , 0. ,
0.07284003]).
You can drop duplicate edges by setting the 'duplicates' kwarg

Environment (please complete the following information):

OS: [Linux]
Python Version: [e.g. 3.8.19]
Versions of Major Dependencies (pandas, scikit-learn, cython): [e.g. pandas==1.3.5, scikit-learn==1.0.2, cython==0.29.34]

craftsliu · Answer 1 · Mon May 06 2024 20:34:54 GMT+0800 (China Standard Time)

Would using pd.qcut(df[col], n_segment, labels=False, duplicates='drop') solve this problem?
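
A minimal sketch of what that change would do, using made-up scores rather than the data from this issue. The series below is hypothetical: most values are exactly 0, so the inner quantile edges coincide, as in the traceback above. It also shows a rank-based alternative, which is a different technique from the one proposed and is offered only as an assumption about what might be acceptable here.

```python
import pandas as pd

# Hypothetical scores (not the reporter's data): 18 of 20 values are exactly 0,
# so the 20/40/60/80% quantile edges all equal 0.
scores = pd.Series([-0.08] + [0.0] * 18 + [0.07])
n_segment = 5

# Default duplicates="raise" reproduces "Bin edges must be unique".
try:
    pd.qcut(scores, n_segment, labels=False)
    raised = False
except ValueError:
    raised = True

# duplicates="drop" merges the repeated edges, so fewer than n_segment bins remain.
dropped = pd.qcut(scores, n_segment, labels=False, duplicates="drop")

# Alternative: break ties by order of appearance before binning.
# Ranks are unique, so this yields exactly n_segment equal-sized bins,
# but tied scores are then split across bins arbitrarily.
ranked = pd.qcut(scores.rank(method="first"), n_segment, labels=False)

print(raised, dropped.nunique(), ranked.nunique())
```

Note that with duplicates='drop' the number of segments can be smaller than n_segment. The traceback shows get_tmlegain sizing lift_model as [0.0] * (n_segment + 1), so whether the rest of the function copes with fewer segments is not confirmed here and would need checking.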