Error converting model output txt to PMML
TGalaxy opened this issue · comments
I got the following error when converting the model txt file to PMML:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 122, Size: 1
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
at org.jpmml.lightgbm.Main.run(Main.java:132)
at org.jpmml.lightgbm.Main.main(Main.java:118)
For security reasons I couldn't attach the model txt file. But could you explain what the error means? Trying to see if I can give you a toy example.
But could you explain what the error means?
It means that your LightGBM model text file is internally inconsistent - there is a hint that some attribute should contain at least 123 elements, but the parser only finds a single element.
Since the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.
For security reasons I couldn't attach the model txt file
Then you need to debug this issue locally.
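One possible starting point for local debugging is to compare the feature-related header fields of the model text file against each other. The sketch below is only an illustration: it assumes the file contains `max_feature_idx=`, `feature_names=`, `feature_infos=` and `pandas_categorical:` lines with space-separated values, and that feature names contain no spaces. The function names and the sample text are hypothetical, and the checks are not necessarily the ones the converter performs at `GBDT.encodeSchema`.

```python
import json

def summarize_header(model_text):
    """Collect the feature-related fields of a LightGBM model text dump."""
    summary = {"max_feature_idx": None, "feature_names": [],
               "feature_infos": [], "pandas_categorical": None}
    for line in model_text.splitlines():
        line = line.strip()
        if line.startswith("max_feature_idx="):
            summary["max_feature_idx"] = int(line.split("=", 1)[1])
        elif line.startswith("feature_names="):
            summary["feature_names"] = line.split("=", 1)[1].split()
        elif line.startswith("feature_infos="):
            summary["feature_infos"] = line.split("=", 1)[1].split()
        elif line.startswith("pandas_categorical:"):
            # The value after the colon is assumed to be JSON (a list of lists, or null)
            summary["pandas_categorical"] = json.loads(line.split(":", 1)[1])
    return summary

def find_inconsistencies(summary):
    """Return human-readable descriptions of mismatched element counts."""
    problems = []
    n_names = len(summary["feature_names"])
    n_infos = len(summary["feature_infos"])
    max_idx = summary["max_feature_idx"]
    if max_idx is not None and max_idx + 1 != n_names:
        problems.append(f"max_feature_idx={max_idx} but {n_names} feature names")
    if n_names != n_infos:
        problems.append(f"{n_names} feature names but {n_infos} feature infos")
    return problems

# Hypothetical, deliberately inconsistent header: three names, two infos
sample = """tree
max_feature_idx=2
feature_names=age income city
feature_infos=[18:90] [0:250000]

pandas_categorical:[["NYC", "SF"]]
"""

summary = summarize_header(sample)
problems = find_inconsistencies(summary)
for problem in problems:
    print(problem)
```

For a real model, read the file contents with `open('lgbm.txt').read()` and pass them to `summarize_header`. A mismatch between the counts would be consistent with an index falling outside a shorter list, although it does not prove that this is the cause of the exception.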
Trying to see if I can give you a toy example
Keeping this issue open for a couple of days. If I don't see a reproducible example during that timeframe, then I'll close it as "invalid".
Since the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.
One shouldn't be working with LightGBM model text files directly.
I believe this exception would be avoided if you interacted with LightGBM using some high-level framework such as Scikit-Learn, which takes care of feature engineering and specification needs.
See https://openscoring.io/blog/2019/04/07/converting_sklearn_lightgbm_pipeline_pmml/
Thanks for your reply. I tried to create a toy example, i.e., I selected a few features from the original data, including the categorical feature. The conversion works smoothly with that subset. However, it still fails with all the features.
Here is my code:
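import lightgbm as lgb
# Build the training and validation datasets, declaring the categorical columns explicitly
d_train = lgb.Dataset(train[feature_list], label=train.tag, categorical_feature=categorical_feature)
d_validation = lgb.Dataset(validation[feature_list], label=validation.tag, categorical_feature=categorical_feature)
# Train with early stopping, then save only the best iteration to the model text file
model = lgb.train(params, d_train, valid_sets=d_validation, early_stopping_rounds=50, verbose_eval=100)
model.save_model('lgbm.txt', num_iteration=model.best_iteration)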
I will force all the other features (other than categorical) to be float and run it again.
I forced the categorical features to be of type category and the others to be float64. However, I still got the same kind of error:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 119, Size: 69
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
at org.jpmml.lightgbm.Main.run(Main.java:132)
at org.jpmml.lightgbm.Main.main(Main.java:118)
Your feature specification code is wrong. However, it's impossible for me to be any more specific, because the posted exception stack trace(s) do not contain enough actionable information.
Closing as invalid/not reproducible.
@vruusmann Hello, I have also run into this problem. I counted the pandas_categorical entries and the number is right, but the conversion still goes out of bounds. Where could the extra index come from?