big-data-europe / docker-spark

Apache Spark docker image

Error? spark-submit didn't work!

honghaolin opened this issue · comments

Hi,

Thanks for providing those awesome docker images. They are very helpful! I am trying to follow the examples to set up my docker-compose.yaml, but it doesn't seem to work.

bash-5.0# ./submit.sh 
Submit application /app/entrypoint.py to Spark master spark://spark-master:7077
Passing arguments 
21/02/16 17:40:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/16 17:40:55 INFO SparkContext: Running Spark version 2.4.5
21/02/16 17:40:55 INFO SparkContext: Submitted application: testing
21/02/16 17:40:55 INFO SecurityManager: Changing view acls to: root
21/02/16 17:40:55 INFO SecurityManager: Changing modify acls to: root
21/02/16 17:40:55 INFO SecurityManager: Changing view acls groups to: 
21/02/16 17:40:55 INFO SecurityManager: Changing modify acls groups to: 
21/02/16 17:40:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/02/16 17:40:56 INFO Utils: Successfully started service 'sparkDriver' on port 46471.
21/02/16 17:40:56 INFO SparkEnv: Registering MapOutputTracker
21/02/16 17:40:56 INFO SparkEnv: Registering BlockManagerMaster
21/02/16 17:40:56 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/02/16 17:40:56 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/02/16 17:40:56 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b86a6147-f336-47c1-91b4-f2cdea03bf81
21/02/16 17:40:56 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/02/16 17:40:56 INFO SparkEnv: Registering OutputCommitCoordinator
21/02/16 17:40:56 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/02/16 17:40:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-submit:4040
21/02/16 17:40:57 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
21/02/16 17:40:57 INFO TransportClientFactory: Successfully created connection to spark-master/172.30.0.2:7077 after 91 ms (0 ms spent in bootstraps)
21/02/16 17:40:57 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20210216174057-0000
21/02/16 17:40:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37859.
21/02/16 17:40:57 INFO NettyBlockTransferService: Server created on spark-submit:37859
21/02/16 17:40:57 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/02/16 17:40:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-submit, 37859, None)
21/02/16 17:40:57 INFO BlockManagerMasterEndpoint: Registering block manager spark-submit:37859 with 366.3 MB RAM, BlockManagerId(driver, spark-submit, 37859, None)
21/02/16 17:40:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-submit, 37859, None)
21/02/16 17:40:57 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-submit, 37859, None)
21/02/16 17:40:58 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
21/02/16 17:40:58 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
21/02/16 17:40:58 INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
21/02/16 17:40:59 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint

I have set up a spark-master, a spark-worker, and a spark-submit container. I use the command tail -F anything to keep the spark-submit container running, then go into it to run submit.sh. The above is the log I get; it keeps running, but I would expect it to print some output.
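
For reference, this is roughly how I run it (a sketch; the exact commands may differ slightly from what I typed):

docker-compose up -d
docker exec -it spark-submit bash
./submit.sh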

I used Ctrl+C to stop it after 5 minutes, and here is the traceback:

Traceback (most recent call last):
  File "/app/entrypoint.py", line 29, in <module>
    main()
  File "/app/entrypoint.py", line 25, in main
    print(nums.map(lambda x: x * x).collect())
  File "/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 816, in collect
  File "/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1255, in __call__
  File "/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
  File "/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1152, in send_command
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/spark/python/lib/pyspark.zip/pyspark/context.py", line 270, in signal_handler
KeyboardInterrupt

Here is my code in entrypoint.py:

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession


def init_spark():
    conf = SparkConf().setAppName("testing")
    conf.setAll(
        {
            "spark.cores.max": "2",
            "spark.driver.memory": "4g",
            "spark.executor.cores": "2",
            "spark.executor.memory": "4g",
            "spark.sql.shuffle.partitions": "2",
        }.items()
    )

    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    return spark


def main():
    spark = init_spark()
    nums = spark.sparkContext.parallelize([1, 2, 3, 4])
    print(nums.map(lambda x: x * x).collect())


if __name__ == "__main__":
    main()

Here is my docker-compose.yaml:

version: "3.0"
services:

  spark-master:
    image: bde2020/spark-master:2.4.5-hadoop2.7
    hostname: spark-master
    container_name: spark-master
    ports:
      - 8080:8080
      - 7077:7077
    environment:
      INIT_DAEMON_STEP: setup_spark

  spark-worker:
    image: bde2020/spark-master:2.4.5-hadoop2.7
    hostname: spark-worker
    container_name: spark-worker
    depends_on:
      - spark-master
    ports:
      - 8081:8081
    environment:
      SPARK_MASTER: spark://spark-master:7077

  spark-submit:
    build: ./streaming
    hostname: spark-submit
    container_name: spark-submit
    depends_on:
      - spark-master
      - spark-worker
    environment:
      SPARK_MASTER_NAME: spark-master
      SPARK_MASTER_PORT: 7077
      ENABLE_INIT_DAEMON: "false"
    command: tail -F anything

I wonder if someone can give me an idea of how to solve this? I must be missing something here.

Thanks!

Hi @honghaolin ,

thanks a lot for your feedback and for sending us this issue.

I tried to reproduce it and didn't get the exact error you are getting.

You mentioned that you tried it with docker-compose and it didn't work? I set it up via docker-compose myself (of course, I removed the tail -F anything command so that the container doesn't try to tail a file that doesn't exist, and instead prints out the results).
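
Concretely, the relevant change to your compose file is dropping the command override from the spark-submit service, so the container runs submit.sh on start and the output lands in the container logs (a rough sketch of how I ran it; the flags are just how I would do it):

# in docker-compose.yaml, under the spark-submit service, remove this line:
#   command: tail -F anything
docker-compose up -d --build
docker logs -f spark-submit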

I reused your example and got this result:

➜  docker-spark git:(master) ✗ docker logs spark-submit
Submit application /app/entrypoint.py to Spark master spark://spark-master:7077
Passing arguments 
21/03/22 22:24:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/22 22:24:08 INFO SparkContext: Running Spark version 3.1.1
21/03/22 22:24:08 INFO ResourceUtils: ==============================================================
21/03/22 22:24:08 INFO ResourceUtils: No custom resources configured for spark.driver.
21/03/22 22:24:08 INFO ResourceUtils: ==============================================================
21/03/22 22:24:08 INFO SparkContext: Submitted application: testing
21/03/22 22:24:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/03/22 22:24:08 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
21/03/22 22:24:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/03/22 22:24:08 INFO SecurityManager: Changing view acls to: root
21/03/22 22:24:08 INFO SecurityManager: Changing modify acls to: root
21/03/22 22:24:08 INFO SecurityManager: Changing view acls groups to: 
21/03/22 22:24:08 INFO SecurityManager: Changing modify acls groups to: 
21/03/22 22:24:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/03/22 22:24:09 INFO Utils: Successfully started service 'sparkDriver' on port 46787.
21/03/22 22:24:09 INFO SparkEnv: Registering MapOutputTracker
21/03/22 22:24:09 INFO SparkEnv: Registering BlockManagerMaster
21/03/22 22:24:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/03/22 22:24:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/03/22 22:24:09 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/03/22 22:24:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e71f1d26-8abd-407c-81f5-02ae2bc8008e
21/03/22 22:24:09 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
21/03/22 22:24:09 INFO SparkEnv: Registering OutputCommitCoordinator
21/03/22 22:24:09 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/03/22 22:24:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-submit:4040
21/03/22 22:24:10 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
21/03/22 22:24:10 INFO TransportClientFactory: Successfully created connection to spark-master/172.19.0.2:7077 after 43 ms (0 ms spent in bootstraps)
21/03/22 22:24:10 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20210322222410-0000
21/03/22 22:24:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43373.
21/03/22 22:24:10 INFO NettyBlockTransferService: Server created on spark-submit:43373
21/03/22 22:24:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/22 22:24:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-submit, 43373, None)
21/03/22 22:24:10 INFO BlockManagerMasterEndpoint: Registering block manager spark-submit:43373 with 366.3 MiB RAM, BlockManagerId(driver, spark-submit, 43373, None)
21/03/22 22:24:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-submit, 43373, None)
21/03/22 22:24:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-submit, 43373, None)
21/03/22 22:24:10 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20210322222410-0000/0 on worker-20210322222407-172.19.0.4-37429 (172.19.0.4:37429) with 2 core(s)
21/03/22 22:24:10 INFO StandaloneSchedulerBackend: Granted executor ID app-20210322222410-0000/0 on hostPort 172.19.0.4:37429 with 2 core(s), 4.0 GiB RAM
21/03/22 22:24:10 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20210322222410-0000/0 is now RUNNING
21/03/22 22:24:10 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
21/03/22 22:24:11 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
21/03/22 22:24:11 INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
[1, 4, 9, 16]

And the result [1, 4, 9, 16] is correct, since x * x is applied in the map function.

I used the latest version of the Spark Docker images, bde2020/spark-python-template:3.1.1-hadoop3.2, to bundle the spark-submit example on Docker.
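
If it helps, switching to that base image is a matter of updating the FROM line in ./streaming/Dockerfile and rebuilding the submit service (a rough sketch; I'm assuming the master and worker would also move to matching 3.1.1-hadoop3.2 tags):

# in ./streaming/Dockerfile, e.g.:
#   FROM bde2020/spark-python-template:3.1.1-hadoop3.2
docker-compose build spark-submit
docker-compose up -d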

Feel free to comment in case you are still facing the same issue.

Best regards,

Hey @honghaolin ,

I'm closing this one for now, but feel free to post in case you are still facing any issues with it.

Best regards,