csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).

Home Page:https://csinva.io/imodels

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

FIGS print and plot return different trees

mepland opened this issue · comments

In imodels_demo.ipynb the Tree #0 returned by printing the fitted model:

Glucose concentration test <= 99.500 (Tree #0 root)
	Val: 0.068 (leaf)
	Glucose concentration test <= 168.500 (split)
		#Pregnant <= 6.500 (split)
			Body mass index <= 30.850 (split)
				Val: 0.065 (leaf)
				Blood pressure(mmHg) <= 67.000 (split)
					Val: 0.705 (leaf)
					Val: 0.303 (leaf)
			Val: 0.639 (leaf)
		Blood pressure(mmHg) <= 93.000 (split)
			Val: 0.860 (leaf)
			Val: -0.009 (leaf)

and plotting:

figs

do not agree!

Based on _tree_to_str_with_data, which agrees with the simpler _tree_to_str actually being called here - see below, the first line printed after a split is the left / true branch, while the second line after the split is the right / false branch.

Reading the printed version, after the first "Glucose concentration test <= 99.500" split, there should be a leaf with value 0.068 for <= 99.5, and then the "Glucose concentration test <= 168.500" split for > 99.5, but this is not the structure of the plotted tree. Also, note that both of the "Blood pressure(mmHg)" splits should end in leaves, while in the plot one of them leads to a "#Pregnant" split.

The output of print(figs.print_tree(X_train, y_train)) is shown below for reference:

Glucose concentration test <= 99.500 65/192 (33.85%)
	ΔRisk = 0.07 4/59 (6.78%)
	Glucose concentration test <= 168.500 61/133 (45.86%)
		#Pregnant <= 6.500 44/112 (39.29%)
			Body mass index <= 30.850 21/76 (27.63%)
				ΔRisk = 0.06 2/31 (6.45%)
				Blood pressure(mmHg) <= 67.000 19/45 (42.22%)
					ΔRisk = 0.71 10/14 (71.43%)
					ΔRisk = 0.30 9/31 (29.03%)
			ΔRisk = 0.64 23/36 (63.89%)
		Blood pressure(mmHg) <= 93.000 17/21 (80.95%)
			ΔRisk = 0.86 17/19 (89.47%)
			ΔRisk = -0.01 0/2 (0.0%)

Also see my stripped down notebook demonstrating the issue here.

Thank you for finding this issue! You are correct these do not match and something is amiss...I would guess the issue is with the visualization code not the print_tree method.

@OmerRonen Have you seen this disagreement before?