MicrosoftResearch / Naiad

The Naiad system provides fast incremental and iterative computation for data-parallel workloads

Home Page: http://microsoftresearch.github.io/Naiad/

Deadlock when running on Mono with >= 3 local processes

mrry opened this issue

When running on Mono with 3 or more processes, a Naiad computation will deadlock in the controller initialization phase.

To reproduce:

$ ./Examples.exe connectedcomponents 1000000 2000000 -t 1 -n 3 --local -p 0 &
$ ./Examples.exe connectedcomponents 1000000 2000000 -t 1 -n 3 --local -p 1 &
$ ./Examples.exe connectedcomponents 1000000 2000000 -t 1 -n 3 --local -p 2 &

The same program works for -n 2 -p {0, 1}. Thanks to the folks at ETH Zurich for reporting this issue.

Further instrumentation of the socket connection phase reveals that, although every Connect() call returns, the BeginAccept()/EndAccept() pair in NaiadServer completes only once (for the first connection). This points to inconsistent behavior in the Accept loop of the NaiadServer.

Switching to a synchronous Accept in a separate server thread seems to fix this without causing a regression on .NET/Windows.
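
For illustration only, the shape of that fix (a blocking accept loop running on its own server thread) can be sketched as follows. This is a minimal Python sketch and not Naiad's actual C# code. The names `accept_peers` and `run_demo`, the use of an ephemeral localhost port, and the simulated clients are all assumptions made for the example.

```python
import socket
import threading


def accept_peers(listener, expected, accepted):
    """Blocking accept loop; runs on a dedicated thread until `expected` peers connect."""
    while len(accepted) < expected:
        conn, _addr = listener.accept()  # blocks until the next peer connects
        accepted.append(conn)


def run_demo(peer_count=3):
    """Hypothetical demo: start the accept thread, connect `peer_count` clients, count accepts."""
    # Bind to an ephemeral port on localhost (port 0 lets the OS choose one).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(peer_count)
    port = listener.getsockname()[1]

    accepted = []
    server = threading.Thread(
        target=accept_peers, args=(listener, peer_count, accepted), daemon=True
    )
    server.start()

    # Stand-ins for the other processes: each one opens a connection to the server.
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(peer_count)]

    # Wait for the accept thread to see every connection, then clean up.
    server.join(timeout=5)
    count = len(accepted)
    for sock in clients + accepted + [listener]:
        sock.close()
    return count
```

Because each accept is a plain blocking call made in sequence, every incoming connection is picked up by the same loop, and no callback has to re-arm an asynchronous accept after the first connection completes.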