gettyimages / docker-spark

Docker build for Apache Spark

SSH question?

alexleethinker opened this issue

Hello,
Thanks for sharing this great work. It worked very well on my machine. But as I am new to Docker and Hadoop/Spark, I ran into a confusing question about SSH when reading the Dockerfile.

Traditionally, when setting up a multi-node cluster, we need to set up SSH and the hosts files on the master/slave hosts to enable communication between them. But in the Dockerfile I didn't find anything related to SSH. No SSH service is even installed.

So I am really wondering how the master node controls the slave nodes without SSH.

Sorry to bother you if this question seems stupid, as I am new to this field.

Alex

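Spark doesn't need SSH. Drivers and workers communicate with the scheduler over RPC calls: each worker is started with the master's address and registers itself with the master over the network. SSH is only used by the optional standalone launch scripts that start the daemons on remote machines; in a Docker setup each container can start its own master or worker process directly.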
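To make the "worker connects to the master" idea concrete, here is a toy sketch in Python. It is not Spark's actual RPC protocol: the function names, the line-based message format, and the `REGISTERED` reply are all made up for illustration. It only shows the general pattern, where the master listens on a port and each worker dials in and announces itself, so the master never needs to log in to a worker machine.

```python
import socket
import threading


def run_master(server_sock, registered, expected):
    """Toy master: accept `expected` workers and record each worker's name."""
    for _ in range(expected):
        conn, _addr = server_sock.accept()
        with conn, conn.makefile("r") as reader:
            name = reader.readline().strip()   # worker announces itself
            registered.append(name)
            conn.sendall(b"REGISTERED\n")      # hypothetical acknowledgement


def run_worker(master_host, master_port, name):
    """Toy worker: dial the master, send our name, return the master's reply."""
    with socket.create_connection((master_host, master_port)) as sock:
        sock.sendall(f"{name}\n".encode())
        with sock.makefile("r") as reader:
            return reader.readline().strip()


def demo():
    """Start a master on a free local port and register two workers with it."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))              # port 0: let the OS pick a port
    server.listen()
    port = server.getsockname()[1]

    registered = []
    master = threading.Thread(target=run_master, args=(server, registered, 2))
    master.start()

    # Each worker only needs the master's host and port, nothing else.
    replies = [run_worker("127.0.0.1", port, f"worker-{i}") for i in (1, 2)]

    master.join()
    server.close()
    return registered, replies


if __name__ == "__main__":
    print(demo())
```

In the same spirit, a Spark worker is given a master URL of the form `spark://host:port` when it starts, and that address is all it needs to join the cluster.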

If you would like to ask Spark-specific questions, please join the Spark mailing lists.