big-data-europe / docker-spark

Apache Spark docker image

java.lang.NumberFormatException: For input string: "tcp://10.153.36.170:8080"

geekyouth opened this issue · comments

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/06/26 00:57:39 INFO Master: Started daemon with process name: 9@spark-master-fdb5b47df-p85lh
21/06/26 00:57:39 INFO SignalUtils: Registering signal handler for TERM
21/06/26 00:57:39 INFO SignalUtils: Registering signal handler for HUP
21/06/26 00:57:39 INFO SignalUtils: Registering signal handler for INT
21/06/26 00:57:40 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.lang.NumberFormatException: For input string: "tcp://10.153.36.170:8080"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.deploy.master.MasterArguments.<init>(MasterArguments.scala:46)
	at org.apache.spark.deploy.master.Master$.main(Master.scala:1208)
	at org.apache.spark.deploy.master.Master.main(Master.scala)
21/06/26 00:57:40 INFO ShutdownHookManager: Shutdown hook called
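
For context on where this blows up: the stack trace ends in MasterArguments, where Spark parses the SPARK_MASTER_PORT environment variable with Scala's String.toInt. A minimal standalone sketch that reproduces the same exception (a hypothetical demo, not Spark's actual source; the injected value is illustrative):

// Hypothetical demo, not Spark code: MasterArguments effectively does
// System.getenv("SPARK_MASTER_PORT").toInt, so a Docker-link-style value
// like the one Kubernetes injects cannot be parsed as a port number.
object PortParseDemo {
  def main(args: Array[String]): Unit = {
    val injected = "tcp://10.153.36.170:8080" // what Kubernetes sets
    val port = injected.toInt                 // throws java.lang.NumberFormatException
    println(s"port = $port")
  }
}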

The docker-compose.yml (from https://raw.githubusercontent.com/big-data-europe/docker-spark/2.4.3-hadoop2.7/docker-compose.yml):

spark-master:
  image: bde2020/spark-master:2.4.3-hadoop2.7
  container_name: spark-master
  ports:
    - "8080:8080"
    - "7077:7077"
  environment:
    - INIT_DAEMON_STEP=setup_spark
    - "constraint:node==<yourmasternode>"
spark-worker-1:
  image: bde2020/spark-worker:2.4.3-hadoop2.7
  container_name: spark-worker-1
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"
    - "constraint:node==<yourworkernode>"
spark-worker-2:
  image: bde2020/spark-worker:2.4.3-hadoop2.7
  container_name: spark-worker-2
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"
    - "constraint:node==<yourworkernode>"  

I am having the exact same problem on deploy. Any tips?

For me, this problem appeared when I deployed these Spark images on Kubernetes. This article explains the cause: https://medium.com/@varunreddydaaram/kubernetes-did-not-work-with-apache-spark-de923ae7ab5c

TL;DR: Kubernetes auto-generates environment variables for every Service, and these clobber environment variables that Spark itself reads, which confuses Spark. On Kubernetes, never name your Spark master service spark-master; name it something else (as a workaround I temporarily named mine death-star and things started working).
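
To make the collision concrete, here is a sketch (service name, IP, and port are illustrative): for any Service, Kubernetes injects Docker-link-style variables named <SERVICE_NAME>_PORT and friends into every pod in the namespace. For a Service named spark-master, one of those is exactly SPARK_MASTER_PORT, which Spark expects to hold a bare port number.

apiVersion: v1
kind: Service
metadata:
  name: spark-master   # this Service name is what produces the colliding variable
spec:
  selector:
    app: spark-master
  ports:
    - port: 8080

# With this Service in place, kubelet injects variables like these
# into the pods (values illustrative):
#   SPARK_MASTER_SERVICE_HOST=10.153.36.170
#   SPARK_MASTER_SERVICE_PORT=8080
#   SPARK_MASTER_PORT=tcp://10.153.36.170:8080   <- Spark parses this as an int

Renaming the Service sidesteps the clash; alternatively, on Kubernetes 1.13+ you can set enableServiceLinks: false in the pod spec so these legacy link variables are not injected at all.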

Minor unrelated comment: why do you have two workers publishing host port 8081? In my setups I put worker 1 on 8081, worker 2 on 8082, and so on (sketch below). Maybe my approach is wrong, but it has been working well for me.
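
For reference, a sketch of that port layout (only the ports sections shown; the worker UI still listens on 8081 inside each container):

spark-worker-1:
  ports:
    - "8081:8081"   # host 8081 -> container 8081
spark-worker-2:
  ports:
    - "8082:8081"   # host 8082 -> container 8081, avoiding the host-port clash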

UnF****ing believable, renaming the service fixed it. Cheers! I can go to bed now. Note that the same applies to workers: after naming my worker service spark-worker I started getting the same errors on the workers, and they went away when I renamed it spark-workerbee.

For the next poor chap, this was the error output from my spark-master:

23/07/22 05:05:56 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.lang.NumberFormatException: For input string: "tcp://10.105.207.146:7077"
	at java.base/java.lang.NumberFormatException.forInputString(Unknown Source)
	at java.base/java.lang.Integer.parseInt(Unknown Source)
	at java.base/java.lang.Integer.parseInt(Unknown Source)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.deploy.master.MasterArguments.<init>(MasterArguments.scala:46)
	at org.apache.spark.deploy.master.Master$.main(Master.scala:1228)
	at org.apache.spark.deploy.master.Master.main(Master.scala)
23/07/22 05:05:56 INFO ShutdownHookManager: Shutdown hook called

From the blog linked by @bdezonia (https://medium.com/@varunreddydaaram/kubernetes-did-not-work-with-apache-spark-de923ae7ab5c):

So, for our service spark-master, Kubernetes would generate an env variable called SPARK_MASTER_PORT=tcp://100.68.168.187:8080, but SPARK_MASTER_PORT is in turn an internal variable for Apache Spark!
It worked for the service spark-hdfs, because Kubernetes would generate an env variable called
SPARK_HDFS_PORT=tcp://100.68.168.187:8080, which… is not referenced by Apache Spark, so it worked!
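
If you want to confirm this on your own cluster, you can dump the injected variables from inside the pod (pod name below is illustrative):

kubectl exec spark-master-fdb5b47df-p85lh -- env | grep SPARK_MASTER

On an affected pod, SPARK_MASTER_PORT will show a tcp://... URL instead of a bare port number, which is exactly the string Spark then fails to parse.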