ReubenBond / DeepCopy

Simple & efficient library for deep copying .NET objects

A few enhancements

Tornhoof opened this issue · comments

Your article is nice and well written :)

As you asked for pull requests, I'll have one ready in a few minutes for more immutable support (e.g. KeyValuePair, Uri, Version, tuples).

A few comments and questions:

  • Your type keys

    var type = original.GetType();
    if (!type.IsValueType)
    {
        // Handle arrays specially.
        var originalArray = original as Array;
        if (originalArray != null) return (T)CopyArray(originalArray, context);
    }
    var typedCopier = CopierGenerator.GetOrCreateCopier<T>(type);
    return typedCopier(original, context);

    This is rather strange: you have a generic method call for T but still pass the type in. You then actually use both types as the key in
    var parameterType = typeof(T);
    var key = (type, parameterType);
    if (!this.copiers.TryGetValue(key, out var untypedCopier))

    I think one type is enough there: either you have the typed method or you pass the type as a parameter.

  • You need to cache the immutable delegate for immutables too:

    public DeepCopyDelegate<T> GetOrCreateCopier<T>(Type type)
    {
        if (this.copyPolicy.IsImmutable(type))
        {
            T ImmutableCopier(T original, CopyContext context) => original;
            return ImmutableCopier;
        }
        // [...]
    }

    You create a new delegate and run the rather expensive immutability check on each and every call.

  • You should use GetOrAdd instead of your own TryGetValue logic here:

    if (!this.copiers.TryGetValue(key, out var untypedCopier))
    {
        untypedCopier = this.CreateCopier<T>(type);
        this.copiers.TryAdd(key, untypedCopier);
    }

I'm not certain that your CachedReadConcurrentDictionary is actually thread-safe with regard to its optimizations. I know it's used in Orleans too, and you even added a few MemoryBarriers that are not in the original source, but they should not affect the items below.
Anyway, my point:

I personally recommend removing that optimization from this project, and you really should go through the original code in Orleans with a very fine-toothed comb. It might produce subtle bugs (e.g. a value not being in the dictionary even though it was added, or vice versa) depending on your usage pattern. Relying on an implementation detail, namely that Dictionary<,> happens to be thread-safe for reads, is a bad idea anyway.

Good point on the dictionary. There are ways to fix it (e.g. a fallback), but it really isn't needed.

Regarding the other points, I'll address them when I'm at my PC - there are some subtleties which might not be immediately clear.

This is rather strange, you have a generic method call for T but still pass the type in. Actually you then use both types as key in

Imagine that the end user has their own method like so:

object MyCopy(object input) => DeepCopy.Copy(input);

The generic type parameter inferred for DeepCopy.Copy will be object. This is fine for the purposes of the method signature, but the runtime type of input will almost never be object: the static type and the runtime type will differ. At the same time, we want to avoid boxing when the user calls DeepCopy.Copy(myStruct). That's why we don't just use object as the type under the hood. You'll notice that the code in Orleans does use object. It's not a severe performance penalty, just something I was having fun with for this project. typeof(T).IsAssignableFrom(type) will always be true, but type.IsAssignableFrom(typeof(T)) will often be false. Is it clearer now why the type is specified in two forms? One specifies the delegate type and the other specifies the runtime type.
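To make the distinction between the static type and the runtime type concrete, here is a minimal sketch. `TypeProbe`, `BaseType` and `SubType` are hypothetical names used only for illustration; they are not part of DeepCopy. The returned pair mirrors the two halves of the cache key discussed above.

```csharp
using System;

public class BaseType { }
public sealed class SubType : BaseType { }

public static class TypeProbe
{
    // Returns the static type parameter T together with the runtime type of the value.
    // Note: GetType() throws on a null argument; a real copier would handle null first.
    public static (Type ParameterType, Type RuntimeType) Describe<T>(T original)
        => (typeof(T), original.GetType());
}
```

Calling `TypeProbe.Describe<object>(new SubType())` yields `(typeof(object), typeof(SubType))`, so `ParameterType.IsAssignableFrom(RuntimeType)` is true while the reverse check is false. Calling `TypeProbe.Describe(42)` infers `T` as `int`, and both halves are `typeof(int)`.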

You create a new delegate and check the rather cost expensive type check for immutability each and every call

Ah, you're right. I broke this while refactoring. I'll submit a fix.

You should use GetOrAdd instead of your own TryGetValue logic here:

The reason for not using GetOrAdd is that it would require allocating a delegate which cannot be effectively cached since we expect T to vary. It's ok for calls to TryAdd to return false and for some work to be occasionally wasted here.
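The optimistic lookup described here can be sketched as follows. This is an illustration under assumptions, not the library's code: `CopierCache` is a hypothetical name, `Func<T, T>` stands in for `DeepCopyDelegate<T>`, and the identity lambda is a placeholder for real copier generation. The point is that a `GetOrAdd` factory lambda calling a generic `Create<T>` would have to capture `this` and the runtime type, so a closure would be allocated on every call, whereas the pattern below only does extra work on a cache miss.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class CopierCache
{
    private readonly ConcurrentDictionary<(Type, Type), Delegate> copiers =
        new ConcurrentDictionary<(Type, Type), Delegate>();

    // Counts how often a copier was actually built (for demonstration only).
    public int CreateCount;

    public Func<T, T> GetOrCreate<T>(Type runtimeType)
    {
        var key = (runtimeType, typeof(T));
        if (!this.copiers.TryGetValue(key, out var untyped))
        {
            untyped = this.Create<T>(runtimeType);
            // If another thread added the same key first, TryAdd returns false.
            // The delegate built here is still valid; the work was merely duplicated.
            this.copiers.TryAdd(key, untyped);
        }

        return (Func<T, T>)untyped;
    }

    private Func<T, T> Create<T>(Type runtimeType)
    {
        Interlocked.Increment(ref this.CreateCount);
        return original => original; // placeholder for the generated copier
    }
}
```

On a single thread, two calls to `GetOrCreate<object>(typeof(string))` return the same delegate instance and build it only once.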

object MyCopy(object input) => DeepCopy.Copy(input)

Ah, yes, this is a common case, but it would still mean only one type (the one from input.GetType()). You should be able to special-case that with a non-typed method (public static object Copy(object original)); then object won't use the typed one anymore.
Your argument regarding IsAssignableFrom is obviously valid, but do you actually have a use case (except for object) where this is necessary? I can't think of any.
Because if you don't, you can get rid of the dictionary altogether for the common typed case.
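As a rough illustration of the special-casing idea, the sketch below shows how C# overload resolution picks between a generic and a non-generic overload. `OverloadProbe` is a hypothetical class, and its methods return a label instead of a copy so the chosen overload is visible.

```csharp
using System;

public static class OverloadProbe
{
    // Chosen when the argument's static type is anything other than object.
    public static string Copy<T>(T original) => "Copy<" + typeof(T).Name + ">";

    // Chosen when the argument's static type is exactly object:
    // both overloads match equally well, and the non-generic one wins the tie.
    public static string Copy(object original) => "Copy(object)";
}
```

With these overloads, `OverloadProbe.Copy("text")` returns `"Copy<String>"`, `OverloadProbe.Copy(42)` returns `"Copy<Int32>"` without boxing the argument, and `OverloadProbe.Copy((object)"text")` returns `"Copy(object)"`. A variable of static type BaseType holding a SubType would still go to the generic overload, which is the case raised later in the thread.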

The reason for not using GetOrAdd is that it would require allocating a delegate which cannot be effectively cached since we expect T to vary

Ah yes, you're right, your CreateCopier method is generic. You can still get around that by making CreateCopier untyped and static and passing your ValueTuple into it. This obviously needs a bit of refactoring, as you need to make the fields in CopierGenerator static too. But then there should be no hidden captures or delegate allocations, e.g.

public DeepCopyDelegate<T> GetOrCreateCopier<T>(Type type)
{
    // [...]
    return (DeepCopyDelegate<T>) this.copiers.GetOrAdd(key, k => CreateCopierUntyped(k));
}
private static Delegate CreateCopierUntyped((Type type, Type parameterType) key)
{
    // create delegate here
}

The same thing can happen for any non-sealed class: T might be BaseType and type might be SubType. It's a common case in my experience.

Because if you don't, you can get rid of the dictionary altogether for the common typed case.

I don't understand - how can the dictionary be avoided altogether?

You can still get around that, by making CreateCopier untyped and static and passing your valuetuple into it

Good point! On the other hand, GetOrAdd takes a lock and in this case we can be optimistic instead, so I'm not fussed either way.

The same thing can happen for any non-sealed class: T might be BaseType and type might be SubType.

Hmm, OK, yeah. I admit I haven't seen that for my clone util in my codebase, so I'm not sure whether you can get around it in most other cases with something like Copy<TIn, TOut> and appropriate constraints.

I don't understand - how can the dictionary be avoided altogether?

I mean the static dictionary type "trick" (a generic static class acting as a per-type cache), which works if you have the real type in the generic type argument, e.g.

internal static class Copier<T>
{
    public static DeepCopyDelegate<T> Delegate { get; } = InitDelegate();

    private static DeepCopyDelegate<T> InitDelegate()
    {
        return (original, context) => original; // create the real delegate here
    }
}

and then use
var typedCopier = Copier<T>.Delegate;
instead of
var typedCopier = CopierGenerator.GetOrCreateCopier<T>(type);

In any case, these changes are already deep into micro-optimization territory and probably need some micro-benchmarks to verify. My personal guess is that using the static dictionary trick plus a fallback concurrent dictionary for object etc. (as previously said, I'm not certain about <TIn, TOut>) should result in a performance increase for the common cases.
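One way the combination suggested here might look is sketched below. It is an assumption-laden illustration, not what was eventually implemented: `FastCopier<T>` and `HybridCopy` are hypothetical names, `Func<T, T>` stands in for `DeepCopyDelegate<T>`, and the identity lambdas are placeholders for generated copiers. When the runtime type equals `T`, the static generic class is used and no dictionary lookup happens; otherwise the code falls back to a concurrent dictionary keyed on both types.

```csharp
using System;
using System.Collections.Concurrent;

public static class FastCopier<T>
{
    // Initialized once per closed generic type by the runtime.
    public static readonly Func<T, T> Delegate = Build();

    private static Func<T, T> Build() => original => original; // placeholder
}

public static class HybridCopy
{
    private static readonly ConcurrentDictionary<(Type, Type), Delegate> Fallback =
        new ConcurrentDictionary<(Type, Type), Delegate>();

    // Number of entries in the fallback dictionary (for demonstration only).
    public static int FallbackCount => Fallback.Count;

    public static T Copy<T>(T original)
    {
        // Value types cannot have a different runtime type, so skip GetType() and its boxing.
        if (typeof(T).IsValueType) return FastCopier<T>.Delegate(original);
        if (original == null) return original;

        var runtimeType = original.GetType();
        if (runtimeType == typeof(T)) return FastCopier<T>.Delegate(original);

        // Static and runtime types differ (e.g. T is object or a base class).
        var key = (runtimeType, typeof(T));
        if (!Fallback.TryGetValue(key, out var untyped))
        {
            Func<T, T> created = o => o; // placeholder for a copier built for runtimeType
            untyped = created;
            Fallback.TryAdd(key, untyped);
        }

        return ((Func<T, T>)untyped)(original);
    }
}
```

Whether this is actually faster than a single dictionary lookup would need the micro-benchmarks mentioned above.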

I see what you mean now - good idea.

I partially implemented this and some other optimizations. Thanks for your input.

Closing this, as everything from the original post was discussed in detail.