OryxProject / oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Home Page: http://oryx.io

Update to Kafka 0.10, new consumer/producer APIs

srowen opened this issue · comments

Kafka 0.9 introduces new producer/consumer APIs. Branches that require 0.9 should use them.

Correction: this should probably only happen for Kafka 0.10, and only once that's required.

WIP; not to be merged anytime real soon: https://github.com/srowen/oryx/commits/Kafka010

I'm attempting to run the word count example on this branch (Kafka010) like this:

./deploy/bin/oryx-run.sh batch \
--layer-jar ./deploy/oryx-batch/target/oryx-batch-2.3.0-SNAPSHOT.jar \
--conf ./app/conf/wordcount-example.conf \
--app-jar ./app/example/target/example-2.3.0-SNAPSHOT.jar

This was built with mvn -DskipTests package

But I receive the following stack trace:

Exception in thread "main" java.lang.NoClassDefFoundError: kafka/admin/RackAwareMode
        at com.cloudera.oryx.lambda.AbstractSparkLayer.buildInputDStream(AbstractSparkLayer.java:179)
        at com.cloudera.oryx.lambda.batch.BatchLayer.start(BatchLayer.java:105)
        at com.cloudera.oryx.batch.Main.main(Main.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.admin.RackAwareMode
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 12 more

Any suggestions on what I'm doing wrong?

My issue here was that the classpath computed by deploy/bin/compute-classpath.sh was wrong. Rather than taking the approach described in #265, I hacked the file for my install situation. Once everything is up and running, I plan to follow the approach outlined in #265.
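Roughly, the hack was just appending the local Kafka 0.10 jars to what the script emits; a sketch of it is below (the /opt/kafka/libs location, and the assumption that the script prints one classpath entry per line, are specific to my install, not upstream Oryx):

# Added at the end of my local copy of deploy/bin/compute-classpath.sh:
# pick up the Kafka 0.10 jars from the local Kafka install.
KAFKA_LIB_DIR=/opt/kafka/libs
for jar in "$KAFKA_LIB_DIR"/kafka_2.11-0.10*.jar "$KAFKA_LIB_DIR"/zkclient-*.jar; do
  echo "$jar"
done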

Resolved by 7efb272

I'm attempting to run the ALS example on tag 2.4.1 like this:
./deploy/bin/oryx-run.sh batch
and get the exception: "Caused by: java.lang.ClassNotFoundException: kafka.admin.RackAwareMode"

This was built with mvn -DskipTests package.

Hadoop version is 2.7.3
Spark version is 2.1.1
Kafka version is 0.10.2 (with Scala 2.11.8)

17/06/22 20:42:48 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
17/06/22 20:42:48 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/06/22 20:42:48 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/06/22 20:42:48 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
17/06/22 20:42:48 INFO cluster.YarnClientSchedulerBackend: Stopped
17/06/22 20:42:48 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/06/22 20:42:48 INFO memory.MemoryStore: MemoryStore cleared
17/06/22 20:42:48 INFO storage.BlockManager: BlockManager stopped
17/06/22 20:42:48 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/06/22 20:42:48 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/06/22 20:42:48 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/admin/RackAwareMode
          at com.cloudera.oryx.lambda.AbstractSparkLayer.buildInputDStream(AbstractSparkLayer.java:179)
          at com.cloudera.oryx.lambda.batch.BatchLayer.start(BatchLayer.java:105)
          at com.cloudera.oryx.batch.Main.main(Main.java:33)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
          at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
          at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.admin.RackAwareMode
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 12 more
17/06/22 20:42:48 INFO util.ShutdownHookManager: Shutdown hook called
17/06/22 20:42:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-927c3058-5cf2-4ddd-9071-236b3256242d

Any suggestions on what I'm doing wrong? Thank you.

@MichaelXucf (By the way this would be better on the mailing list: https://groups.google.com/a/cloudera.org/forum/#!forum/oryx-user )

Because this class was added in 0.10.0.0 (https://issues.apache.org/jira/browse/KAFKA-1215), that leads me to believe that you aren't actually running against Kafka 0.10 classes at runtime. What's your environment like?
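A quick way to check (paths below are just a guess at a typical install, not something Oryx requires; adjust for your environment):

# Which Kafka jar is actually installed where the layers run?
ls /opt/kafka/libs/kafka_2.11-*.jar

# kafka.admin.RackAwareMode only exists from 0.10.0.0 onward, so a genuine
# 0.10.x jar should list it:
unzip -l /opt/kafka/libs/kafka_2.11-0.10.2.0.jar | grep RackAwareMode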

@srowen
My OS version is CentOS 6.5.
Software versions are as follows:
Apache Hadoop 2.7.3 (installed under user hadoop)
Apache Spark 2.1.1 (installed under user hadoop)
Apache Kafka 0.10.2.0 (installed under user kafka)
Oryx 2.4.1 (installed under user oryx)

In user oryx's .bash_profile, I have set the following:

PATH=$PATH:$HOME/bin
export KAFKA_HOME=/opt/kafka
export PATH=$KAFKA_HOME/bin:$PATH
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

export PATH

Oryx is installed in /home/oryx/software/oryx-2.4.1 as follows:

$ ll -th
-rwxr--r-- 1 oryx oryx 1.9K Jun 22 20:36 compute-classpath.sh
-rw-r--r-- 1 oryx oryx 1.9K Jun 22 11:29 oryx.conf
-rwxr--r-- 1 oryx oryx  14K Jun 21 15:43 oryx-run.sh
-rw-r--r-- 1 oryx oryx  30M Jun 16 16:02 oryx-serving-2.4.1.jar
-rw-r--r-- 1 oryx oryx  27M Jun 16 16:02 oryx-speed-2.4.1.jar
-rw-r--r-- 1 oryx oryx  27M Jun 16 16:02 oryx-batch-2.4.1.jar
-rw-r--r-- 1 oryx oryx 1.9K Mar 29 16:59 als-example.conf

Then I run the command:

cd software/oryx-2.4.1
./oryx-run.sh batch

Then I get the exception like this:

17/06/22 20:42:48 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/admin/RackAwareMode
    at com.cloudera.oryx.lambda.AbstractSparkLayer.buildInputDStream(AbstractSparkLayer.java:179) 
    at com.cloudera.oryx.lambda.batch.BatchLayer.start(BatchLayer.java:105) 
    at com.cloudera.oryx.batch.Main.main(Main.java:33) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: kafka.admin.RackAwareMode 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
17/06/22 20:42:48 INFO util.ShutdownHookManager: Shutdown hook called 
17/06/22 20:42:48 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-927c3058-5cf2-4ddd-9071-236b3256242d

Where should I tell Oryx the location of kafka_2.11-0.10.2.0.jar?
The kafka_2.11-0.10.2.0.jar is under /opt/kafka/libs.
I looked at the pom.xml of oryx-lambda; the spark-streaming-kafka dependency scope is provided:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_${scala.minor.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
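Since the scope is provided, those Kafka classes are deliberately not bundled into the layer jar and must come from the runtime classpath. A rough way to confirm that nothing under kafka.admin is packaged (jar name taken from the listing above):

# "provided" dependencies are excluded from the packaged artifact,
# so this should print nothing from the layer jar:
unzip -l oryx-batch-2.4.1.jar | grep 'kafka/admin/' \
  || echo "kafka.admin classes are not bundled in the layer jar"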


@srowen
I put the dependent jars kafka_2.11-0.10.2.0.jar and zkclient-0.10.jar in the directory oryx-2.4.1/libs,
and then executed the following command instead of running "oryx-run.sh batch":

spark-submit \
--master yarn \
--deploy-mode client \
--name OryxBatchLayer-ALSExample \
--class com.cloudera.oryx.batch.Main \
--files oryx.conf \
--driver-memory 1g \
--driver-java-options "-Dconfig.file=oryx.conf" \
--executor-memory 4g \
--executor-cores 8 \
--conf spark.executor.extraJavaOptions="-Dconfig.file=oryx.conf" \
--conf spark.ui.port=4040 \
--conf spark.io.compression.codec=lzf \
--conf spark.logConf=true \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.speculation=true \
--conf spark.ui.showConsoleProgress=false \
--num-executors=4 \
--jars libs/kafka_2.11-0.10.2.0.jar,libs/zkclient-0.10.jar \
oryx-batch-2.4.1.jar

It solved my "java.lang.NoClassDefFoundError: kafka/admin/RackAwareMode" exception,
but I still get other NoClassDefFoundError exceptions, like "Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.Metrics".

So the problem is that oryx-batch-2.4.1.jar doesn't include the jars it depends on.
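For the remaining error: com.yammer.metrics.Metrics lives in the metrics-core jar that ships with the Kafka distribution, so one likely workaround is adding that jar to --jars as well (the version below is an assumption based on a typical Kafka 0.10.2 install, not something Oryx prescribes):

# Find the metrics-core jar bundled with the Kafka install:
ls /opt/kafka/libs/metrics-core-*.jar

# Then extend the --jars list of the spark-submit command above, e.g.:
#   --jars libs/kafka_2.11-0.10.2.0.jar,libs/zkclient-0.10.jar,libs/metrics-core-2.2.0.jar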

(Please, this would be better on the mailing list. I will answer if you summarize the issue there.)

I met the same error. How did you solve the problem? Thank you!!!! @MichaelXucf

I met the same error. How did you solve the problem? Thank you.

I met the same error. How did you solve the problem? Thank you.

Same error.

Please try the build linked in the issue I pointed out, to see if it works. #345