Error converting model output txt to PMML

Question

TGalaxy opened this issue 5 years ago

I got the following error when converting the model txt file to PMML:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 122, Size: 1
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

For security reasons I couldn't attach the model txt file. But could you explain what the error means? I am trying to see if I can give you a toy example.

Villu Ruusmann · Answer 1 · Wed Nov 13 2019 15:44:16 GMT+0800 (China Standard Time)

But could you explain what the error means?

It means that your LightGBM model text file is internally inconsistent - there is a hint that some attribute should contain at least 123 elements, but the parser only finds a single element.

Since the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.

For security reasons I couldn't attach the model txt file

Then you need to debug this issue locally.
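
A minimal sketch of one such local check, in Python. It assumes the model text file starts with a header of key=value lines that includes max_feature_idx, feature_names and feature_infos, and that the first tree begins at a line starting with "Tree=". These keys come from the LightGBM text format as commonly seen, not from this thread, and the helper names are hypothetical. The sketch only counts the entries of each attribute so that a mismatch becomes visible; it does not reproduce what the converter does at GBDT.java:233.

```python
def read_header(model_text):
    """Collect key=value pairs that appear before the first tree."""
    header = {}
    for line in model_text.splitlines():
        line = line.strip()
        if line.startswith("Tree="):
            break
        if "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
    return header


def header_counts(model_text):
    """Count the entries of the attributes that should describe the same features."""
    header = read_header(model_text)
    return {
        "max_feature_idx + 1": int(header["max_feature_idx"]) + 1,
        # Assumption: both attributes are whitespace-separated lists.
        "feature_names": len(header["feature_names"].split()),
        "feature_infos": len(header["feature_infos"].split()),
    }


# Made-up header for illustration only; it is not taken from the reporter's model.
sample = """tree
version=v2
num_class=1
max_feature_idx=2
feature_names=age income city
feature_infos=[18:90] [0:250000] 0:1:2:3

Tree=0
num_leaves=2
"""

counts = header_counts(sample)
print(counts)
```

For a real file, pass its contents instead of the sample, for example header_counts(open('lgbm.txt').read()). If the three numbers differ, the header is a reasonable place to start looking.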

Trying to see if I can give you a toy example

Keeping this issue open for a couple of days. If I don't see a reproducible example during that timeframe, then I'll close it as "invalid".

Villu Ruusmann · Answer 2 · Wed Nov 13 2019 15:47:45 GMT+0800 (China Standard Time)

Since the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.

One shouldn't be working with LightGBM model text files directly.

I believe this exception would be avoided if you interacted with LightGBM using some high-level framework such as Scikit-Learn, which takes care of feature engineering and specification needs.

See https://openscoring.io/blog/2019/04/07/converting_sklearn_lightgbm_pipeline_pmml/

TGalaxy · Answer 3 · Fri Nov 15 2019 09:41:58 GMT+0800 (China Standard Time)

Thanks for your reply. I tried to create a toy example by selecting a few features from the original data, including the categorical feature. The conversion works smoothly on that subset. However, it still does not work with all the features.

Here is my code:

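import lightgbm as lgb

# train, validation, feature_list, categorical_feature and params are defined elsewhere (not shown)
d_train = lgb.Dataset(train[feature_list], label=train.tag, categorical_feature=categorical_feature)
d_validation = lgb.Dataset(validation[feature_list], label=validation.tag, categorical_feature=categorical_feature)

# Train with early stopping on the validation set, then save only the best iteration
model = lgb.train(params, d_train, valid_sets=d_validation, early_stopping_rounds=50, verbose_eval=100)
model.save_model('lgbm.txt', num_iteration=model.best_iteration)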

I will force all the other features (other than categorical) to be float and run it again.

TGalaxy · Answer 4 · Wed Nov 20 2019 08:32:14 GMT+0800 (China Standard Time)

I forced the categorical features to be of type category and the others to be float64. However, I still got the same error:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 119, Size: 69
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

Villu Ruusmann · Answer 5 · Wed Nov 20 2019 14:08:51 GMT+0800 (China Standard Time)

Your feature specification code is wrong. However, it's impossible for me to be any specific, because the posted exception stack trace(s) do not contain enough actionable information.

Closing as invalid/not reproducible.

etVERITAS · Answer 6 · Tue Sep 01 2020 17:55:59 GMT+0800 (China Standard Time)

@vruusmann Hello, I am also running into this problem. I counted the entries in pandas_categorical and the number is right, but the conversion still fails with an out-of-bounds error. Where could the extra index come from?
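
For anyone repeating this count, here is a minimal Python sketch that compares the two numbers from the model text file itself. It assumes the file ends with a line of the form pandas_categorical:<JSON list of category lists>, and that in feature_infos a numerical feature looks like [min:max], an unused feature is "none", and anything else is a categorical feature. Both assumptions and the helper names are illustrative, and the thread does not establish whether a difference between the two counts is what triggers the exception.

```python
import json

PREFIX = "pandas_categorical:"


def count_pandas_categorical(model_text):
    """Number of category lists on the pandas_categorical line, or None if absent."""
    for line in reversed(model_text.splitlines()):
        if line.startswith(PREFIX):
            payload = json.loads(line[len(PREFIX):])
            return 0 if payload is None else len(payload)
    return None


def count_categorical_infos(model_text):
    """Number of feature_infos entries that look categorical, or None if absent."""
    for line in model_text.splitlines():
        if line.startswith("feature_infos="):
            infos = line.split("=", 1)[1].split()
            # Assumption: "[min:max]" is numerical, "none" is unused, the rest is categorical.
            return sum(1 for info in infos
                       if info != "none" and not info.startswith("["))
    return None


# Made-up fragment for illustration only.
sample = "\n".join([
    "feature_infos=[18:90] none 0:1:2",
    "Tree=0",
    'pandas_categorical:[["a", "b"], ["x", "y", "z"]]',
])

n_pandas = count_pandas_categorical(sample)
n_infos = count_categorical_infos(sample)
print(n_pandas, n_infos)
```

In this made-up fragment the two counts differ (2 versus 1) because one column is listed as "none", which is one way the numbers can disagree even when pandas_categorical itself looks right.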