Bug when running the demo of the latest SONA version
lcx517 opened this issue · comments
Hi, I'm running the SONA example and it FAILED with the stdout log below.
Please help!
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-12-26 14:09:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(deepthought); groups with view permissions: Set(); users with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception:
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO ShutdownHookManager:54 - Shutdown hook called
my SONA-example script:
source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
--keytab /home/deepthought/deepthought.keytab \
--principal deepthought \
--queue longyuan.p0 \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=10 \
--conf spark.ps.cores=2 \
--conf spark.ps.memory=6g \
--jars $SONA_SPARK_JARS \
--name "LR-spark-on-angel" \
--files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
--driver-memory 10g \
--num-executors 10 \
--executor-cores 2 \
--executor-memory 4g \
--class org.apache.spark.angel.examples.JsonRunnerExamples \
./../lib/angelml-${SONA_VERSION}.jar \
data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
jsonFile:./lr.json \
lr:0.1
and my spark-on-angel-env.sh:
export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2
...<not changed default content below>...
The class has already been moved, but the docs are outdated!
You need to change "org.apache.spark.angel.examples.JsonRunnerExamples" to "com.tencent.angel.sona.examples.JsonRunnerExamples" in the `--class` option.
Good luck~
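If the docs and the jar disagree again, one way to check which package a class actually lives in is to list the jar's entries and turn the matching path into a fully qualified name. This is only a sketch: the helper names `class_names_from_entries` and `find_class_in_jar` are made up for illustration, and it assumes the JDK's `jar` tool is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: read jar entry paths on stdin and print the fully
# qualified name of every class whose simple name equals $1.
class_names_from_entries() {
  # Keep entries such as a/b/Name.class, then turn the path into a.b.Name
  grep -E "(^|/)$1\.class\$" | sed -e 's/\.class$//' -e 's#/#.#g'
}

# Hypothetical wrapper: $1 is the jar path, $2 is the simple class name.
# `jar tf` prints one entry path per line.
find_class_in_jar() {
  jar tf "$1" | class_names_from_entries "$2"
}

# Example call (path taken from the script in this issue):
#   find_class_in_jar ./../lib/angelml-${SONA_VERSION}.jar JsonRunnerExamples
```

Whatever name this prints is the value to pass to `--class` in the spark-submit command.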