MicrosoftResearch / Naiad

The Naiad system provides fast incremental and iterative computation for data-parallel workloads

Home Page: http://microsoftresearch.github.io/Naiad/

Serialization issues

utaal opened this issue · comments

Hello Naiad developers,

We're using Naiad for a university project, and we can't figure out how to pass non-primitive types between vertices. As advised in the readme, we're using structs, but it seems that messages are sent and never received.

We're still working with the release_0.4 branch, since release_0.5 does not work with Mono.

Here's the code to reproduce the issue: https://gist.github.com/utaal/c10e450e1227f7ef5462
To run it, drop it in /Examples/Naiad and add examples.Add ("serialization", new Serialization.Serialization ()); to Program::Main. TestVertex2 never receives any messages.

Is there any other known issue with the serialization framework? Are we using it incorrectly?

Thank you.

Hi Andrea,

We'll grab 0.4 and try it out later this afternoon. To be honest, we (Derek and I) are a bit surprised that your gist compiles. We don't think there is a single-element constructor for Message, so we'll need to look a bit closer to understand which code path is actually running.

In the meantime, the way we usually use VertexOutputBuffer (the this.Output variable), and the pattern that shows up in libraries like Lindi and DifferentialDataflow, is:

var output = this.Output.GetBufferForTime(message.time);
for (int i = 0; i < message.length; i++)
  output.Send(new SerializationTest (message.payload [i], message.payload [i] + 1));

But we'll look into what is going on and try to get back to you ASAP. Thanks!

Hi Andrea,

I pulled down 0.4, and couldn't get your example to compile (due to the Message constructor). I tweaked it slightly, and have a version at https://gist.github.com/frankmcsherry/a9f109cd5897344a9e17. There are three important things I did:

  1. I changed the sending to be as suggested above. I'm not sure what you were using (again, it didn't compile for me, so maybe a local edit?), but this is meant to work.
  2. I rigged it so that processes other than zero also advance their inputs (using OnNext with no arguments). It might work fine without this, but there are some race conditions if different processes use different numbers of epochs. Because of some update coalescing, it becomes difficult for other processes to tell the difference between some final k epochs where nothing happens and k empty epochs. This is probably not an issue in this case.
  3. I added a computation.Sync(i); to the main loop. Without this, the main thread forces 10M epochs of data into Naiad without waiting for it to do any work, which puts a lot of pressure on the scheduler. Allowing Naiad to finish an epoch before submitting more data ensures that the internal scheduler state doesn't grow linearly with the number of epochs submitted. Each scheduling decision is essentially linear in the available work, so without the sync the total cost is quadratic in your loop parameter. You can use computation.Sync(i-k); to allow k overlapped epochs, if that is appealing.
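
To make the cost argument in (3) concrete, here is a toy model in plain C#. It does not use any Naiad types and is not Naiad's scheduler. The names EpochWindowSketch and TotalSchedulingCost are hypothetical, and the cost rule is an assumption for illustration: each submitted epoch triggers one scheduling decision costing one unit per outstanding epoch. The unsynced case assumes the worst case, where the main thread outruns the workers and nothing completes while data is being submitted.

```csharp
using System;

// Toy model only: nothing here is part of Naiad.
public static class EpochWindowSketch
{
    // Returns the total modelled scheduling cost of submitting `epochs` epochs.
    // window == null : never wait (no Sync call in the main loop).
    // window == k    : after submitting epoch i, wait for epoch i - k to finish,
    //                  which models calling computation.Sync(i - k).
    public static long TotalSchedulingCost(int epochs, int? window)
    {
        long cost = 0;
        int completedThrough = -1; // highest epoch known to be finished

        for (int i = 0; i < epochs; i++)
        {
            // Epochs submitted but not yet finished, including epoch i itself.
            int outstanding = i - completedThrough;

            // Assumed cost rule: one decision, linear in the outstanding work.
            cost += outstanding;

            // Only sync once epoch i - k actually exists.
            if (window.HasValue && i - window.Value >= 0)
            {
                completedThrough = i - window.Value;
            }
        }

        return cost;
    }
}
```

Under this model, 1000 epochs with no sync cost 500500 units, syncing on every epoch (k = 0) costs 1000, and allowing four overlapped epochs (k = 4) costs 4990. The unsynced total grows with the square of the loop parameter, while a bounded window keeps it linear.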

This seems to work for me on Mono using Naiad 0.4. Let us know if this sorts out the problem you were seeing, and perhaps if the Message constructor you were using was something we can help with.

Frank

Hi Frank,

Thank you so much for having a look at this.
You are correct - the gist only compiles because a constructor was added by a teammate to Message; in fact, the constructor was wrongly setting length to 0, thus effectively dropping the messages.
We apologize for this - I just reverted our local fork of Naiad to the verbatim release_0.4 to make sure we haven't introduced other bugs by intervening on the library.
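
The failure mode described above can be reproduced in isolation with a minimal sketch. ToyMessage and ToyReceiver are hypothetical stand-ins, not Naiad's Message type or any Naiad vertex. The only assumption carried over from the thread is that a receiver reads the first length entries of payload, as the loop in the suggested sending code does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a batched message; NOT Naiad's Message type.
public struct ToyMessage<T>
{
    public T[] payload; // buffer holding the records
    public int length;  // number of valid records in payload

    // Mirrors the bug: the record is stored, but length says the batch is empty.
    public static ToyMessage<T> BuggySingle(T record)
    {
        return new ToyMessage<T> { payload = new T[] { record }, length = 0 };
    }

    // Corrected version: length matches the number of stored records.
    public static ToyMessage<T> FixedSingle(T record)
    {
        return new ToyMessage<T> { payload = new T[] { record }, length = 1 };
    }
}

public static class ToyReceiver
{
    // Reads exactly `length` records, the way a receiving loop would.
    public static List<T> Receive<T>(ToyMessage<T> message)
    {
        var seen = new List<T>();
        for (int i = 0; i < message.length; i++)
        {
            seen.Add(message.payload[i]);
        }
        return seen;
    }
}
```

With the buggy constructor, ToyReceiver.Receive returns an empty list even though the record is sitting in payload. With the fixed constructor, it returns the one record. No error is raised in either case, which is why the messages looked as if they were sent but never received.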

We already knew about (2) - I just forgot to do that in the gist - but (3) was a great insight.

We seem to be having trouble with the scheduler when the number of threads per process (-t) is greater than 1 - however I still need to better isolate the issue. I'll get back to you once I have it figured out.

Are there any other gotchas in the serialization system (apart from the need to use structs) that we need to be aware of? I'm asking because the group that worked on this in the past had encountered issues. Those were fixed later, but we're not sure to what extent.

Naiad is turning out to be a really interesting way to model the computation for the problem we're working on now.
Thank you for building this.
Andrea