Why does Naiad have its own Pair type?

hmansell opened this issue 11 years ago

Just wondering why you have this rather than using Tuple?

Frank McSherry · Answer 1 · Tue Nov 12 2013 07:36:13 GMT+0800 (China Standard Time)

Tuple< … > is a class, so an array of them is really an array of references, and each instance ends up separately heap allocated. Naiad’s Pair<V1, V2> is a struct, which allows a big array of them to be a single object on the heap (when V1 and V2 are value types too), and the GC is much happier about this.
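To make the layout difference concrete, here is a minimal sketch. The Pair below is a hypothetical stand-in, not Naiad's actual source; the field names First and Second are assumptions chosen for illustration.

```csharp
using System;

// Hypothetical stand-in for Naiad's Pair<V1, V2>; the real type may differ.
public struct Pair<V1, V2>
{
    public V1 First;    // assumed field name
    public V2 Second;   // assumed field name

    public Pair(V1 first, V2 second)
    {
        First = first;
        Second = second;
    }
}

public static class PairLayoutDemo
{
    public static void Main()
    {
        // One heap object: the array stores all 1000 (int, int) pairs inline.
        var structs = new Pair<int, int>[1000];
        structs[0].First = 42;   // written in place, no extra allocation

        // 1001 heap objects once filled: the array of references,
        // plus one separately allocated Tuple per element.
        var tuples = new Tuple<int, int>[1000];
        for (int i = 0; i < tuples.Length; i++)
            tuples[i] = Tuple.Create(i, i);

        Console.WriteLine(typeof(Pair<int, int>).IsValueType);    // True
        Console.WriteLine(typeof(Tuple<int, int>).IsValueType);   // False
        Console.WriteLine(structs[0].First);                      // 42
    }
}
```

The GC has to trace every Tuple individually, whereas the struct array contains no references for it to follow when both type arguments are value types.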

Howard Mansell · Answer 2 · Tue Nov 12 2013 07:40:29 GMT+0800 (China Standard Time)

Fair enough. So what about KeyValuePair, which is a struct? I guess C# developers are pretty used to dealing with them.

Frank McSherry · Answer 3 · Tue Nov 12 2013 07:47:23 GMT+0800 (China Standard Time)

KeyValuePair would probably work fine. There are some cases where the pair holds things that aren’t keys and values (e.g. treating edges in a graph as a pair of ints, or (@username, #hashtag) pairs for Twitter data). The real answer is that it is something of a historical relic: we wanted to understand its implementation clearly enough to be sure it wouldn’t be the root of any performance issues, and we may not need that anymore. It would be great to identify some standard type in .NET that could replace it.

Derek Murray · Answer 4 · Wed Nov 20 2013 07:58:55 GMT+0800 (China Standard Time)

One of the existing issues with using built-in data structures like KeyValuePair and Tuple is that Naiad generates its own serialization and deserialization code. This is simple for value types (because they have a default constructor) with public non-readonly fields (because they can be assigned), but non-trivial for reference types where we cannot automatically determine how to construct a new instance of the object.

I just pushed an experimental serialization mode in release v0.2.3, which falls back to using .NET serialization for non-value-typed objects, enabling the use of KeyValuePair and Tuple in Naiad programs. This can be enabled using the "--inlineserializer" option on the command line.

Howard Mansell · Answer 5 · Thu Nov 21 2013 21:34:14 GMT+0800 (China Standard Time)

KeyValuePair is a value type, so presumably it could have been used before.

Thanks

Derek Murray · Answer 6 · Thu Nov 21 2013 21:52:47 GMT+0800 (China Standard Time)

Within a single process, you have always been free to use any .NET type in a Naiad channel: reference types also work, but as Frank noted they lead to GC issues and so are not recommended.

The "problem" with KeyValuePair is that our serialization code generator doesn't (didn't) know how to generate a deserializer for it, because its public interface exposes only read-only properties. Serialization is much more efficient if you use the format Naiad expects — value types with mutable public fields — and thus we extensively use our own Pair type.
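The shape of KeyValuePair's public interface can be checked with plain .NET reflection. The sketch below is only an illustration of what a field-based generator would see; the class and method names are made up here, and it is not Naiad's code generator.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper, not part of Naiad.
public static class KeyValuePairShape
{
    // Number of public instance fields a field-based generator could enumerate.
    public static int PublicFieldCount(Type t)
    {
        return t.GetFields(BindingFlags.Public | BindingFlags.Instance).Length;
    }

    // True if any public instance property has a setter.
    public static bool HasWritableProperty(Type t)
    {
        foreach (PropertyInfo p in t.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (p.CanWrite) return true;
        }
        return false;
    }

    public static void Main()
    {
        Type kvp = typeof(KeyValuePair<int, string>);
        Console.WriteLine(PublicFieldCount(kvp));      // 0
        Console.WriteLine(HasWritableProperty(kvp));   // False
    }
}
```

With no public fields and only getter properties, there is nothing a generated deserializer can assign to, so it would have to call the constructor instead.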

Going forward, it's useful to get feedback about which data structures would be useful in messages, and we'll add specialized support for things like Tuple — and provide extensibility interfaces — as we learn more.

Howard Mansell · Answer 7 · Thu Nov 21 2013 22:04:54 GMT+0800 (China Standard Time)

Call me a functional programming extremist, but I think message types should be required to be immutable. Just doesn't make sense to mutate them. C# has reasonably decent features for constructing immutable objects, so I don't think this requirement should be unreasonable.

Frank McSherry · Answer 8 · Thu Nov 21 2013 22:57:26 GMT+0800 (China Standard Time)

We agree (and Naiad does treat its messages as immutable). The problem is that Naiad's automatic serialization is based on reflection, and it works by enumerating and assigning all of the fields in a type. If those fields are readonly (or not public) this fails. Since readonly fields only get set in the constructor we would need to assign them there, but you can't have constructors in interfaces, so we can't enforce that one of the type we need exists. Mainly, it seems hard to take a bunch of bytes and turn them into something with readonly fields unless you special-case it, which we can do (as required).
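As a rough illustration of the "enumerate and assign all of the fields" approach, here is a minimal sketch, assuming int-only fields for brevity. The Edge type and FieldSerializerSketch class are hypothetical. Naiad generates serialization code rather than calling reflection per message, so this shows the idea, not the implementation. The explicit check for readonly fields mirrors the restriction described above; it is a choice made in this sketch.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical message type: a value type with mutable public fields.
public struct Edge
{
    public int Source;
    public int Target;
}

// Hypothetical sketch, not Naiad's serializer.
public static class FieldSerializerSketch
{
    static FieldInfo[] FieldsOf(Type t)
    {
        FieldInfo[] fields = t.GetFields(BindingFlags.Public | BindingFlags.Instance);
        // Reflection does not guarantee field order, so sort by name for a stable layout.
        Array.Sort(fields, (a, b) => string.CompareOrdinal(a.Name, b.Name));
        foreach (FieldInfo f in fields)
        {
            // This sketch refuses readonly fields and handles only int fields.
            if (f.IsInitOnly || f.FieldType != typeof(int))
                throw new NotSupportedException(t.Name + "." + f.Name);
        }
        return fields;
    }

    public static byte[] Serialize<T>(T value) where T : struct
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (FieldInfo f in FieldsOf(typeof(T)))
                writer.Write((int)f.GetValue(value));
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static T Deserialize<T>(byte[] bytes) where T : struct
    {
        // A value type can always be default-constructed; box it so that
        // SetValue mutates this instance rather than a copy.
        object boxed = default(T);
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            foreach (FieldInfo f in FieldsOf(typeof(T)))
                f.SetValue(boxed, reader.ReadInt32());
        }
        return (T)boxed;
    }

    public static void Main()
    {
        var edge = new Edge { Source = 3, Target = 7 };
        byte[] bytes = Serialize(edge);
        Edge copy = Deserialize<Edge>(bytes);
        Console.WriteLine(bytes.Length);                        // 8
        Console.WriteLine(copy.Source + " -> " + copy.Target);  // 3 -> 7
    }
}
```

The deserializer never needs to know about a constructor: it starts from default(T) and fills in fields. That is the step with no obvious equivalent for a type whose state can only be set through a constructor.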