MicrosoftResearch / Naiad

The Naiad system provides fast incremental and iterative computation for data-parallel workloads

Home Page: http://microsoftresearch.github.io/Naiad/

Why does Naiad have its own Pair type?

hmansell opened this issue · comments

Just wondering why you have this rather than using Tuple?

Tuple< … > is a class, and each instance ends up heap allocated when you have an array of them. Naiad’s Pair<V1, V2> is a struct, which allows big arrays of them to be a single object on the heap (when V1 and V2 are value types too) and the GC is much happier about this.
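
To make the struct-versus-class point concrete, here is a minimal sketch. `PairSketch` is a hypothetical stand-in written for this example; Naiad's actual `Pair` definition may differ in field names and members.

```csharp
using System;

// Hypothetical stand-in for Naiad's Pair; the real definition may differ.
public struct PairSketch<V1, V2>
{
    public V1 First;
    public V2 Second;

    public PairSketch(V1 first, V2 second)
    {
        First = first;
        Second = second;
    }
}

public static class LayoutDemo
{
    public static void Main()
    {
        // One heap object: the array stores the three pairs inline.
        var pairs = new PairSketch<int, int>[3];
        pairs[0].First = 7; // the element already exists, so assign in place

        // One heap object for the array of references,
        // plus one more for every Tuple that is created.
        var tuples = new Tuple<int, int>[3];
        tuples[0] = Tuple.Create(7, 0);

        Console.WriteLine(typeof(PairSketch<int, int>).IsValueType); // True
        Console.WriteLine(typeof(Tuple<int, int>).IsValueType);      // False
        Console.WriteLine(tuples[1] == null);                        // True
    }
}
```

The unfilled `tuples[1]` slot is `null` because the array holds references only, whereas every slot of `pairs` is already a usable value.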

Fair enough - so what about KeyValuePair, which is a struct. I guess C# developers are pretty used to dealing with them.

KeyValuePair would probably work fine. There are some cases where the pair has things that aren’t keys and values (e.g. treating edges in a graph as a pair of ints, or (@username, #hashtag) pairs for Twitter data). The real answer is that it is something of a historical relic: we wanted to be sure we understood its implementation clearly enough that it wouldn’t be the root of any performance issues, but we may not need our own type anymore. It would be great to identify some standard type in .NET that could replace it.

One of the existing issues with using built-in data structures like KeyValuePair and Tuple is that Naiad generates its own serialization and deserialization code. This is simple for value types (because they have a default constructor) with public non-readonly fields (because they can be assigned), but non-trivial for reference types where we cannot automatically determine how to construct a new instance of the object.

I just pushed an experimental serialization mode in release v0.2.3, which falls back to using .NET serialization for non-value-typed objects, enabling the use of KeyValuePair and Tuple in Naiad programs. This can be enabled using the "--inlineserializer" option on the command line.

KeyValuePair is a value type, so presumably could have been used before.

Thanks

Within a single process, you have always been free to use any .NET type in a Naiad channel: reference types also work, but as Frank noted they lead to GC issues and so are not recommended.

The "problem" with KeyValuePair is that our serialization code generator doesn't (didn't) know how to generate a deserializer for it, because its public interface exposes only read-only properties. Serialization is much more efficient if you use the format Naiad expects — value types with mutable public fields — and thus we extensively use our own Pair type.
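
A rough sketch of the field-based approach described here, using plain reflection rather than generated code. This is not Naiad's serializer: `EdgeSketch`, `ReadFields`, `WriteFields`, and `PublicFields` are hypothetical names made up for the example, and the "wire format" is just an object array.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical message type in the shape the thread describes:
// a value type with mutable public fields.
public struct EdgeSketch
{
    public int Source;
    public int Target;
}

public static class FieldCopyDemo
{
    // Public instance fields, sorted by name because GetFields
    // does not promise any particular order.
    public static FieldInfo[] PublicFields(Type type)
    {
        FieldInfo[] fields = type.GetFields(BindingFlags.Public | BindingFlags.Instance);
        Array.Sort(fields, (a, b) => string.CompareOrdinal(a.Name, b.Name));
        return fields;
    }

    // "Serialize": read every public field into an object array.
    public static object[] ReadFields(object record)
    {
        FieldInfo[] fields = PublicFields(record.GetType());
        var values = new object[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            values[i] = fields[i].GetValue(record);
        return values;
    }

    // "Deserialize": default-construct the value type, then assign each field.
    public static object WriteFields(Type type, object[] values)
    {
        object boxed = Activator.CreateInstance(type); // always possible for value types
        FieldInfo[] fields = PublicFields(type);
        for (int i = 0; i < fields.Length; i++)
            fields[i].SetValue(boxed, values[i]);
        return boxed;
    }

    public static void Main()
    {
        var edge = new EdgeSketch { Source = 1, Target = 2 };
        var copy = (EdgeSketch)WriteFields(typeof(EdgeSketch), ReadFields(edge));
        Console.WriteLine(copy.Source + " -> " + copy.Target); // 1 -> 2

        // KeyValuePair exposes only properties, so there are no public fields to copy.
        Console.WriteLine(PublicFields(typeof(KeyValuePair<int, int>)).Length); // 0
    }
}
```

With `KeyValuePair` the same routine finds zero public fields, so a round trip would silently produce a default value, which is the gap the comment above refers to.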

Going forward, it's useful to get feedback about which data structures would be useful in messages, and we'll add specialized support for things like Tuple — and provide extensibility interfaces — as we learn more.

Call me a functional programming extremist, but I think message types should be required to be immutable. Just doesn't make sense to mutate them. C# has reasonably decent features for constructing immutable objects, so I don't think this requirement should be unreasonable.

We agree (and Naiad does treat its messages as immutable). The problem is that Naiad's automatic serialization is based on reflection, and it works by enumerating and assigning all of the fields in a type. If those fields are readonly (or not public) this fails. Since readonly fields only get set in the constructor we would need to assign them there, but you can't have constructors in interfaces, so we can't enforce that a constructor with the signature we need exists. Mainly, it seems hard to take a bunch of bytes and turn them into something with readonly fields unless you special-case it, which we can do (as required).
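
As an illustration of the constraint, the following sketch checks whether a type has the "every field public and assignable" shape. `AllFieldsAssignable` and both point types are hypothetical, written for this example; this is not a check that Naiad is known to perform.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public struct MutablePoint
{
    public int X;
    public int Y;
}

public struct FrozenPoint
{
    public readonly int X;
    public readonly int Y;

    public FrozenPoint(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class ShapeCheck
{
    // True when every instance field is public and not readonly.
    public static bool AllFieldsAssignable(Type type)
    {
        const BindingFlags all =
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;
        foreach (FieldInfo field in type.GetFields(all))
        {
            // IsInitOnly is how reflection reports a C# readonly field.
            if (!field.IsPublic || field.IsInitOnly)
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllFieldsAssignable(typeof(MutablePoint)));            // True
        Console.WriteLine(AllFieldsAssignable(typeof(FrozenPoint)));             // False
        Console.WriteLine(AllFieldsAssignable(typeof(KeyValuePair<int, int>)));  // False
    }
}
```

An immutable type such as `FrozenPoint` fails the check even though it has an obvious constructor, because nothing in its type tells a generic deserializer which constructor to call or in what argument order.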