OryxProject / oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Home Page: http://oryx.io

Oryx 2.2.0 bug: OOM

flyingandrunning opened this issue

Hi,

I found a bug. Here is some information.

First of all, my data is big, about 100,000,000, and the following error appeared while the system was loading data.

ERROR speed.SpeedLayer: Error while consuming updates
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:514)
at java.lang.StringBuilder.append(StringBuilder.java:175)
at java.lang.StringBuilder.append(StringBuilder.java:76)
at com.google.common.io.CharStreams.copy(CharStreams.java:208)
at com.google.common.io.CharStreams.toStringBuilder(CharStreams.java:249)
at com.google.common.io.CharStreams.toString(CharStreams.java:223)
at com.cloudera.oryx.app.pmml.AppPMMLUtils.readPMMLFromUpdateKeyMessage(AppPMMLUtils.java:275)
at com.cloudera.oryx.app.speed.als.ALSSpeedModelManager.consumeKeyMessage(ALSSpeedModelManager.java:104)
at com.cloudera.oryx.app.speed.als.ALSSpeedModelManager.consumeKeyMessage(ALSSpeedModelManager.java:51)
at com.cloudera.oryx.api.speed.AbstractSpeedModelManager.consume(AbstractSpeedModelManager.java:48)
at com.cloudera.oryx.lambda.speed.SpeedLayer.lambda$start$1(SpeedLayer.java:126)
at com.cloudera.oryx.lambda.speed.SpeedLayer$$Lambda$26/1014698874.get(Unknown Source)
at com.cloudera.oryx.common.lang.LoggingCallable.lambda$log$0(LoggingCallable.java:48)
at com.cloudera.oryx.common.lang.LoggingCallable$$Lambda$27/351249017.call(Unknown Source)
at com.cloudera.oryx.common.lang.LoggingCallable.lambda$asRunnable$1(LoggingCallable.java:66)
at com.cloudera.oryx.common.lang.LoggingCallable$$Lambda$28/149526537.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)

Yeah, that just means you need more memory for your speed layer. As the stack trace shows, it's running out of heap while reading the PMML model file into a string. You can control this with driver-memory in the config file.

Your input is large enough that you may need a lot of memory, like 8GB. At this scale the PMML file is hundreds of megabytes, which could be challenging in other ways. But this particular error is just a matter of having enough memory available.
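To see why a PMML file of a few hundred megabytes can need several times its size in heap, here is a rough back-of-the-envelope sketch in Java. It is not Oryx code. It assumes a Java 8 StringBuilder (backed by a char[] at 2 bytes per char, growing to oldCapacity * 2 + 2, which matches the expandCapacity frame in the trace) and it ignores the chunked appends done by CharStreams.copy. The class name, method names, and the 300 MB file size are hypothetical, chosen only for illustration.

```java
// Simplified model of StringBuilder growth while a large file is read into a String.
class PmmlHeapEstimate {

    /** Bytes per char in a Java 8 StringBuilder, which is backed by a char[]. */
    static final long BYTES_PER_CHAR = 2;

    /** Capacity of the backing array once it can hold totalChars, doubling as Java 8 does. */
    static long finalCapacity(long totalChars, long initialCapacity) {
        long capacity = initialCapacity;
        while (capacity < totalChars) {
            capacity = capacity * 2 + 2;
        }
        return capacity;
    }

    /**
     * Approximate bytes live during the last expansion: Arrays.copyOf allocates
     * the new array while the old one is still referenced.
     */
    static long peakBytesDuringGrowth(long totalChars, long initialCapacity) {
        long previous = 0;
        long capacity = initialCapacity;
        while (capacity < totalChars) {
            previous = capacity;
            capacity = capacity * 2 + 2;
        }
        return (previous + capacity) * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Hypothetical size: the thread only says "hundreds of megabytes".
        long fileChars = 300L * 1024 * 1024;
        long initialCapacity = 16; // default StringBuilder capacity

        long growthPeak = peakBytesDuringGrowth(fileChars, initialCapacity);
        // toString() copies the chars once more, on top of the builder's array.
        long stringCopy = fileChars * BYTES_PER_CHAR;

        System.out.printf("Peak during last expansion: ~%d MB%n", growthPeak >> 20);
        System.out.printf("Extra for the final String: ~%d MB%n", stringCopy >> 20);
    }
}
```

Under these assumptions the string alone costs a multiple of the file size before any model data structures are built from it, which is consistent with the advice above to give the speed layer much more memory.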