dmlc / rabit

Reliable Allreduce and Broadcast Interface for distributed machine learning

Does the source package have any dependencies, such as a specific Python version?

EasonZhaoZ opened this issue

The Python version I am using is:

Python 3.5.0a1 (default, Jul 15 2015, 17:58:06)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux

When I run "python ../tracker/rabit_demo.py -n 2 basic.rabit", I get:

Traceback (most recent call last):
  File "../tracker/rabit_demo.py", line 96, in <module>
    tracker.submit(args.nworker, [], fun_submit = mthread_submit, verbose = args.verbose)
  File "/home/zhaoyin.zy/rabit/tracker/rabit_tracker.py", line 316, in submit
    master.accept_slaves(nslave)
  File "/home/zhaoyin.zy/rabit/tracker/rabit_tracker.py", line 259, in accept_slaves
    s = SlaveEntry(fd, s_addr)
  File "/home/zhaoyin.zy/rabit/tracker/rabit_tracker.py", line 53, in __init__
    magic = slave.recvint()
  File "/home/zhaoyin.zy/rabit/tracker/rabit_tracker.py", line 35, in recvint
    return struct.unpack('@i', self.recvall(4))[0]
  File "/home/zhaoyin.zy/rabit/tracker/rabit_tracker.py", line 33, in recvall
    return ''.join(res)
TypeError: sequence item 0: expected str instance, bytes found
AssertError:ReConnectLink failure 2
Socket RecvAll Error:Connection reset by peer

This looks like a problem on my side: I have not fully tested the tracker on Python 3. In Python 3, the socket's recv returns bytes rather than str, so res is a list of bytes objects and ''.join(res) does not work.
If you are interested, could you send a PR to fix this problem?

Thanks!
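
For anyone hitting the same traceback, here is a minimal sketch of the kind of change that would avoid the TypeError. It is not the tracker's actual code: the class name SocketWrapper, the 1024-byte chunk size, and the RuntimeError are assumptions made for illustration. Only recvall, recvint, and the '@i' format come from the traceback above. The key point is joining the received chunks with a bytes separator (b'') instead of a str separator ('').

```python
import socket
import struct


class SocketWrapper:
    """Hypothetical stand-in for the tracker's socket helper (name is an assumption)."""

    def __init__(self, sock):
        self.sock = sock

    def recvall(self, nbytes):
        """Read exactly nbytes from the socket and return them as bytes."""
        chunks = []
        nread = 0
        while nread < nbytes:
            # In Python 3, recv() returns a bytes object, not a str.
            chunk = self.sock.recv(min(nbytes - nread, 1024))
            if not chunk:
                # An empty result means the peer closed the connection.
                raise RuntimeError("socket closed before all bytes arrived")
            nread += len(chunk)
            chunks.append(chunk)
        # Join with a bytes separator; ''.join(...) raises TypeError on Python 3.
        return b"".join(chunks)

    def recvint(self):
        """Read one native-endian C int, as in the traceback's recvint."""
        return struct.unpack("@i", self.recvall(4))[0]
```

The b'' literal is also accepted by Python 2.6 and later, where it is just a str, so the join itself should behave the same on both major versions. Any code that later treats the received data as text would still need an explicit decode on Python 3.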

Sorry for the late reply.
What does "PR" mean?

P.S. How do I run the job on the YARN platform? I cannot find any instructions. Many thanks for your help.