IllegalArgumentException: Out of range: 12045138254372
77QingLiu opened this issue
The following error appears when trying to convert a model.txt file to a PMML file:
Exception in thread "main" java.lang.IllegalArgumentException: Out of range: 12045138254372
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:202)
at com.google.common.primitives.Ints.checkedCast(Ints.java:88)
at org.jpmml.converter.ValueUtil.asInt(ValueUtil.java:80)
at org.jpmml.converter.ValueUtil.asInteger(ValueUtil.java:88)
at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:332)
at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:324)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:226)
at org.jpmml.lightgbm.Main.run(Main.java:131)
at org.jpmml.lightgbm.Main.main(Main.java:117)
The following is the model file: model.txt
I deleted the BRANCHID field, which contains the value '12045138254372', and the model file was then converted successfully.
It is assumed that all category indices fit into 32-bit (aka integer) value space. Your value - 12045138254372 - doesn't.
For a quick workaround, you could consider reindexing your categories (do you really have 12045138254372 unique category levels?). For a true fix, the JPMML-LightGBM library could switch from 32-bit indices to 64-bit indices.
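To illustrate the range check and the reindexing workaround, here is a minimal Python sketch. The function names `checked_cast_int32` and `reindex_categories` are hypothetical and are not part of JPMML-LightGBM or LightGBM; the first only mimics the kind of check that the stack trace shows failing in `Ints.checkedCast`, and the second shows one possible way to map large category labels onto small, dense codes before training.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_cast_int32(value: int) -> int:
    """Return value unchanged if it fits a signed 32-bit integer, else raise.

    Hypothetical helper that imitates the failing range check; the message
    format follows the one seen in the stack trace.
    """
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"Out of range: {value}")
    return value


def reindex_categories(levels):
    """Map each distinct category level to a small, dense integer code.

    Codes are assigned in sorted order so that the mapping is reproducible.
    """
    return {level: code for code, level in enumerate(sorted(set(levels)))}


# Category levels taken from the BRANCHID field discussed in this issue.
branch_ids = [1, 10, 2324, 12045138254372, 12045192433901, 12977508116706]

# Reindex, then confirm that every new code passes the 32-bit check.
mapping = reindex_categories(branch_ids)
recoded = [checked_cast_int32(mapping[v]) for v in branch_ids]
```

The mapping would have to be stored and applied again at scoring time, since the converted model then only knows the new codes.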
Do you mean that this category value is stored as an integer in JPMML-LightGBM?
But that value is a string, and it has only dozens of unique category levels.
Currently, the value space of your BRANCHID is defined like this:
<DataField name="BRANCHID" optype="categorical" dataType="integer">
<Value value="1"/>
<!-- Omitted other single-digit category levels -->
<Value value="10"/>
<!-- Omitted other two-digit category levels -->
<Value value="2324"/>
<!-- Omitted other four-digit category levels -->
<Value value="12045138254372"/>
<Value value="12045192433901"/>
<Value value="12977508116706"/>
</DataField>
This field has integer data type. However, the last three category values don't fit into 32-bit integer value space.
You should re-label them.
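To find which levels need re-labeling, one option is to scan the field definition for values outside the signed 32-bit range. The sketch below is only an illustration: the helper name `out_of_range_values` is hypothetical, and the XML is the abbreviated fragment quoted above without a namespace. A complete PMML file declares a namespace, so the tag lookups would need to be namespace-qualified there.

```python
import xml.etree.ElementTree as ET

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

# Abbreviated copy of the DataField fragment quoted in this thread.
DATA_FIELD_XML = """
<DataField name="BRANCHID" optype="categorical" dataType="integer">
  <Value value="1"/>
  <Value value="10"/>
  <Value value="2324"/>
  <Value value="12045138254372"/>
  <Value value="12045192433901"/>
  <Value value="12977508116706"/>
</DataField>
"""


def out_of_range_values(data_field_xml: str) -> list:
    """Return the category values that do not fit a signed 32-bit integer."""
    field = ET.fromstring(data_field_xml.strip())
    values = [int(v.get("value")) for v in field.findall("Value")]
    return [v for v in values if not INT32_MIN <= v <= INT32_MAX]


offending = out_of_range_values(DATA_FIELD_XML)
print(offending)
```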
Attached is a patchfile against JPMML-LightGBM version 1.2.9 that switches the conversion-time representation of "direct category indices" from 32-bit integers to 64-bit integers.
When this patchfile is applied, your model.txt file can be converted. However, I find this switch from 32-bit to 64-bit "hackish", and am not applying it to the master branch for now.
Issue solved. Thanks very much for your support. But I'm wondering why the feature BRANCHID has integer data type when it is stored as a categorical data type in the pandas DataFrame. Shouldn't it be a character type?
The most likely explanation is that LightGBM is performing some sort of "data type detection", and since all category levels are parseable as integers, assumes that the intended data type of this column is integer (not string aka character).
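As a rough illustration of that kind of heuristic, here is a Python sketch. It is not LightGBM's or JPMML-LightGBM's actual code, and the function name `infer_data_type` is hypothetical; it only shows how "every level parses as an integer" could lead to an integer data type even when the labels were originally strings.

```python
def infer_data_type(levels) -> str:
    """Guess 'integer' if every level parses as an integer, else 'string'.

    Hypothetical heuristic for illustration only.
    """
    try:
        for level in levels:
            int(str(level))
    except ValueError:
        return "string"
    return "integer"


# String labels that happen to look like numbers are inferred as integers.
numeric_looking = ["1", "10", "2324", "12045138254372"]
print(infer_data_type(numeric_looking))

# Labels with a non-numeric prefix cannot be parsed, so they stay strings.
prefixed = ["B" + label for label in numeric_looking]
print(infer_data_type(prefixed))
```

Under this assumed heuristic, labels carrying a non-numeric prefix would be treated as strings. Whether the real conversion pipeline behaves the same way has not been verified here.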