jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML

IllegalArgumentException: Out of range: 12045138254372

77QingLiu opened this issue.

The following error appears when I try to convert a model.txt file to a PMML file:

Exception in thread "main" java.lang.IllegalArgumentException: Out of range: 12045138254372
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:202)
	at com.google.common.primitives.Ints.checkedCast(Ints.java:88)
	at org.jpmml.converter.ValueUtil.asInt(ValueUtil.java:80)
	at org.jpmml.converter.ValueUtil.asInteger(ValueUtil.java:88)
	at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:332)
	at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:324)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:226)
	at org.jpmml.lightgbm.Main.run(Main.java:131)
	at org.jpmml.lightgbm.Main.main(Main.java:117)

The model file is attached:
model.txt

I deleted the BRANCHID field, which contains the value '12045138254372', and the model file was then converted successfully.

The converter assumes that all category indices fit into the 32-bit (aka Java int) value space. Your value, 12045138254372, does not.
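
The stack trace ends in Guava's Ints.checkedCast, which refuses to narrow a long that lies outside the int range. The sketch below re-implements that range check in plain Java to show why this value parses fine as a 64-bit long but cannot become a 32-bit int. The IntRangeCheck class and its checkedCast method are hypothetical stand-ins written for illustration, not the Guava or JPMML-LightGBM code.

```java
public class IntRangeCheck {

    // Narrow a long to an int, failing loudly instead of silently truncating
    // when the value lies outside [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int checkedCast(long value) {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return (int) value;
    }

    public static void main(String[] args) {
        // The BRANCHID level from the issue fits comfortably in 64 bits.
        long categoryValue = Long.parseLong("12045138254372");
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("Parsed as long: " + categoryValue);
        try {
            checkedCast(categoryValue);
        } catch (IllegalArgumentException e) {
            // Same failure mode as the converter: the value exceeds the 32-bit range.
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running it prints Integer.MAX_VALUE = 2147483647, so any category level above roughly 2.1 billion is rejected by a 32-bit check, regardless of how few distinct levels the column has.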

As a quick workaround, consider reindexing your categories (do you really have 12045138254372 unique category levels?). For a true fix, the JPMML-LightGBM library could switch from 32-bit indexes to 64-bit indexes.
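
Reindexing here means replacing each original label with a small, dense code (0, 1, 2, ...) before training, so that every category index fits easily into 32 bits. The following is a minimal Java sketch of that idea; the CategoryReindexer class and the sample labels (a few values taken from this thread) are hypothetical. In practice the same mapping must be applied consistently at training time and at scoring time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryReindexer {

    // Assign each distinct label a dense 0-based code, in order of first appearance.
    static Map<String, Integer> buildIndex(List<String> labels) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String label : labels) {
            // index.size() is evaluated before insertion, so a new label gets the next free code.
            index.putIfAbsent(label, index.size());
        }
        return index;
    }

    // Replace every label with its dense code using a previously built index.
    static List<Integer> encode(List<String> labels, Map<String, Integer> index) {
        List<Integer> codes = new ArrayList<>(labels.size());
        for (String label : labels) {
            codes.add(index.get(label));
        }
        return codes;
    }

    public static void main(String[] args) {
        // Illustrative subset of BRANCHID values, including the out-of-range ones.
        List<String> branchIds = List.of("1", "2324", "12045138254372", "2324", "12977508116706");
        Map<String, Integer> index = buildIndex(branchIds);
        System.out.println(index);                     // {1=0, 2324=1, 12045138254372=2, 12977508116706=3}
        System.out.println(encode(branchIds, index));  // [0, 1, 2, 1, 3]
    }
}
```

A LinkedHashMap keeps the codes stable and reproducible for a given input order; a sorted map would work equally well if a deterministic order independent of the data is preferred.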

Do you mean this category value is stored as an integer in JPMML-LightGBM?
That value is a string, and the column has only a few dozen unique category levels.

Currently, the value space of your BRANCHID is defined like this:

<DataField name="BRANCHID" optype="categorical" dataType="integer">
	<Value value="1"/>
	<!-- Omitted other single-digit category levels -->
	<Value value="10"/>
	<!-- Omitted other two-digit category levels -->
	<Value value="2324"/>
	<!-- Omitted other four-digit category levels -->
	<Value value="12045138254372"/>
	<Value value="12045192433901"/>
	<Value value="12977508116706"/>
</DataField>

This field has the integer data type. However, the last three category values do not fit into the 32-bit integer value space.

You should re-label them.

Attached is a patchfile against JPMML-LightGBM version 1.2.9 that switches the conversion-time representation of "direct category indices" from 32-bit integers to 64-bit integers.

issue_25.patch.txt

When this patch file is applied, your model.txt file can be converted. However, I find this switch from 32-bit to 64-bit "hackish", and will not apply it to the master branch for now.

Issue solved. Thanks very much for your support. But I'm wondering why this feature, BRANCHID, has the integer data type when in the pandas DataFrame it is stored with the categorical dtype. Shouldn't it be a character (string) type?

The most likely explanation is that LightGBM is performing some sort of "data type detection", and since all category levels are parseable as integers, assumes that the intended data type of this column is integer (not string aka character).
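
To make the suspected behaviour concrete, here is a hypothetical Java sketch of such a "data type detection" heuristic: if every category level parses as a whole number, the column is treated as integer-typed; otherwise it stays a string. This is an illustration of the idea described above under that assumption, not LightGBM's or JPMML-LightGBM's actual code, and the CategoryTypeDetection class is invented for this example.

```java
import java.util.List;

public class CategoryTypeDetection {

    // Hypothetical heuristic: "integer" if every level parses as a long, otherwise "string".
    static String detectDataType(List<String> levels) {
        for (String level : levels) {
            try {
                Long.parseLong(level);
            } catch (NumberFormatException e) {
                // One non-numeric level is enough to keep the column as a string.
                return "string";
            }
        }
        return "integer";
    }

    public static void main(String[] args) {
        // Numeric-looking labels are classified as integer, even if they were stored as strings.
        System.out.println(detectDataType(List.of("1", "10", "2324", "12045138254372")));
        // Under this heuristic, labels with a non-digit prefix stay strings.
        System.out.println(detectDataType(List.of("B1", "B10", "B2324", "B12045138254372")));
    }
}
```

Under this assumed heuristic the first call prints integer and the second prints string, which would be consistent with the BRANCHID levels ending up as an integer DataField despite being strings in pandas.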