dmlc / rabit

Reliable Allreduce and Broadcast Interface for distributed machine learning

rabit - failure no network connection

geoHeil opened this issue

xgboost4j / rabit seems to fail to start the tracker when no network connection is available.

Admittedly, I was running xgboost4j in Spark local mode on my laptop while it was not connected to Wi-Fi. Still, the error is surprising: I expected it to work in local-only mode as well.

tracker started, with env={}
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger: 2016-11-11 19:16:00,643 WARNING gethostbyname(socket.getfqdn()) failed... trying on hostname()
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger: Traceback (most recent call last):
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:   File "/var/folders/zm/bts0g37j0l9637sjxb3b700h0000gn/T/tracker1300409870996753788.py", line 475, in <module>
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:     main()
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:   File "/var/folders/zm/bts0g37j0l9637sjxb3b700h0000gn/T/tracker1300409870996753788.py", line 470, in main
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:     start_rabit_tracker(args)
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:   File "/var/folders/zm/bts0g37j0l9637sjxb3b700h0000gn/T/tracker1300409870996753788.py", line 432, in start_rabit_tracker
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:     rabit = RabitTracker(hostIP=get_host_ip(args.host_ip), nslave=args.num_workers)
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:   File "/var/folders/zm/bts0g37j0l9637sjxb3b700h0000gn/T/tracker1300409870996753788.py", line 396, in get_host_ip
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:     s.connect(('10.255.255.255', 1))
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:   File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger:     return getattr(self._sock,name)(*args)
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger: socket.error: [Errno 51] Network is unreachable
16/11/11 19:16:00 INFO RabitTracker$TrackerProcessLogger: Tracker Process ends with exit code 1
16/11/11 19:16:00 INFO XGBoostSpark: repartitioning training set to 4 partitions
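
The traceback ends in `get_host_ip`, at `s.connect(('10.255.255.255', 1))`, which raises `Network is unreachable` when the machine has no route at all. A minimal sketch of how such a lookup could fall back to the loopback address is shown below. This is not the tracker's actual code: the name `guess_host_ip`, its parameters, and the choice of `127.0.0.1` as fallback are assumptions made for illustration.

```python
import socket


def guess_host_ip(probe_addr=("10.255.255.255", 1), fallback="127.0.0.1"):
    """Hypothetical helper, not the tracker's real get_host_ip.

    Asks the OS which local address it would use to reach probe_addr and
    returns the fallback if no route exists (for example, when offline).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # pick a route, and with it a local source address.
        s.connect(probe_addr)
        return s.getsockname()[0]
    except OSError:
        # Covers "[Errno 51] Network is unreachable" as seen in the log.
        # Assumption: loopback is acceptable for a local-only run.
        return fallback
    finally:
        s.close()


if __name__ == "__main__":
    print(guess_host_ip())
```

With a fallback like this, a single-machine run could bind the tracker to loopback, although workers on other hosts would not be able to reach it, so it only makes sense for local mode.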