[API Proposal]: Soft References

qwertie opened this issue 3 years ago

Background and motivation

MemoryCache is the "standard" way to cache things in .NET, but its behavior is unintuitive and it does not guarantee that it will evict cache entries quickly enough to prevent OutOfMemoryExceptions, among other issues such as badly bloated cache entries. Plus, it apparently relies on some kind of black magic (whose documentation I have not been able to locate) to determine the size in bytes of cached objects, so using it correctly is difficult; even if I am using it correctly, it's hard to be confident of that!

I would like to be able to put objects in a cache that contain two kinds of references: (1) references to "owned" subobjects that should be counted as part of the parent object, and (2) references to (large) shared objects that can never be evicted. I can't imagine how anything except the garbage collector could detect that (1) is reachable only via the cache and so should be counted for eviction purposes, while (2) cannot be GC'd.

Finally, if the goal is to prevent memory exhaustion, MemoryCache is problematic because multiple cache instances can exist that do not coordinate with one another.

Weak references tend to be collected far too quickly to be used in caches. Soft references would solve this problem. Soft references are like weak references, but garbage-collected much less aggressively.

API Proposal

An obvious interface would be to replicate WeakReference<T>:

namespace System;

public sealed class SoftReference<T> : IWeakReference<T> where T : class?
{
    public SoftReference(T target);
    ~SoftReference();

    public void SetTarget(T target);
    public bool TryGetTarget([MaybeNullWhen(false)][NotNullWhen(true)] out T target);
}

// An interface implemented by WeakReference<T> and SoftReference<T>
// (same `class?` constraint as the existing WeakReference<T>)
public interface IWeakReference<T> where T : class?
{
    bool TryGetTarget([MaybeNullWhen(false)][NotNullWhen(true)] out T target);
    void SetTarget(T target);
}

API Usage

SoftReference<MyEntity>? _entity;

void Cache(MyEntity entity) => _entity = new SoftReference<MyEntity>(entity);

// Later on...
if (_entity != null && _entity.TryGetTarget(out MyEntity e))
  Console.WriteLine("We've still got it!");
else
  Console.WriteLine("Ain't got it!");

Alternative Designs

Another obvious design is to define SoftReference<T> as a derived class of WeakReference<T>. A third possibility is to add "softness" as a feature of the existing WeakReference class.

WeakReferences have a "track resurrection" feature. It's not clear to me that this would add value to a soft reference, but IMO this feature should be supported if it does not add significant complexity to the GC.

It should be kept in mind that after this feature is introduced, many applications could be dominated by soft references (i.e. at any given time, most objects are reachable only through soft references). Therefore, perhaps there should be a property of GC to control the preferred total memory usage of the process, which would affect the aggressiveness of soft-reference collection.

    // Sets a limit for memory usage that the GC should attempt to enforce 
    // by collecting more aggressively near the limit. This particularly affects 
    // the degree to which objects referenced via soft references are collected.
    public static long SoftMemoryLimit { get; set; } // GC is a static class, so this would be static

(I would very much like a hard memory limit too, but that's another story.)

Ideally, the GC would not collect all soft-referenced objects when memory pressure is encountered, but would instead decide which objects to collect first according to some kind of "priority". I believe the most commonly desired way to prioritize would be by recency: first get rid of objects that have not been used recently. To that end there could be a LastUsed property:

    // Controls garbage collection priority; objects reachable only through soft
    // references and with lower values for LastUsed tend to be collected first.
    // - To artificially delay GC for a soft reference, increase it (eg add 24 hours)
    // - To artificially encourage GC for a soft reference, decrease it (eg subtract 
    //   24 hours; or use DateTime.MinValue to treat soft ref like WeakReference)
    // - Setter could convert all dates to UTC so that the GC can directly compare 
    //   LastUsed.Ticks of different soft references.
    public DateTime LastUsed { get; set; }

    // Variant of TryGetTarget that sets LastUsed = DateTime.UtcNow if target is alive
    public bool TryGetTargetAndSetLastUsed([MaybeNullWhen(false)][NotNullWhen(true)] out T target) {
        if (TryGetTarget(out target)) {
            LastUsed = DateTime.UtcNow;
            return true;
        }
        return false;
    }

Alternatively, rather than adding a bool updateLastUsed parameter to TryGetTarget, updating LastUsed could be controlled by a separate boolean property, so that calls made through IWeakReference<T>.TryGetTarget() can also update LastUsed.
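The recency-based priority can be prototyped in managed code today. The following is a minimal sketch, assuming the cache itself tracks each entry's LastUsed timestamp and an estimated size in bytes (the GC offers no such hook); SoftEntry and RecencyPolicy are hypothetical names. It releases the least recently used entries first until the total fits a byte budget.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry: a key, an estimated size in bytes, and a LastUsed timestamp (UTC).
public sealed record SoftEntry(string Key, long SizeBytes, DateTime LastUsed);

public static class RecencyPolicy
{
    // Returns the keys to release, least recently used first, until the
    // remaining total size is at or below budgetBytes. Entries are not mutated.
    public static List<string> SelectForRelease(IEnumerable<SoftEntry> entries, long budgetBytes)
    {
        var ordered = entries.OrderBy(e => e.LastUsed).ToList();
        long total = ordered.Sum(e => e.SizeBytes);
        var released = new List<string>();
        foreach (var entry in ordered)
        {
            if (total <= budgetBytes)
                break;
            released.Add(entry.Key);
            total -= entry.SizeBytes;
        }
        return released;
    }
}
```

An entry whose LastUsed is DateTime.MinValue sorts first, which matches the idea that MinValue should make a soft reference behave like a WeakReference.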

It is possible that there are multiple soft references to the same object. It is tempting to put the last-used timestamp on the object itself so that there can only be a single timestamp:

// Any class could implement this interface in order to control GC priority
public interface IGCSoftReferencePriority
{
    DateTime LastUsed { get; set; }
}

However, this approach would have major disadvantages:

- Users may certainly wish to hold soft references to objects that don't implement IGCSoftReferencePriority.
- The very act of checking whether IGCSoftReferencePriority is implemented might be too expensive inside the GC.
- I expect that user-defined code cannot be called during the GC's stop-the-world phase, and the property could do strange things like allocate memory, loop indefinitely, or return a different value each time it is called.

Risks

I have no idea how difficult it would be to implement this in the GC.

msftbot · Answer 1 · Fri Dec 24 2021 11:48:34 GMT+0800 (China Standard Time)

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	qwertie
Assignees:	-
Labels:	`api-suggestion`, `area-GC-coreclr`, `untriaged`
Milestone:	-

Huo Yaoyuan · Answer 2 · Fri Dec 24 2021 14:47:43 GMT+0800 (China Standard Time)

Should it be a variant of WeakReference, with an option to control it?

Jan Kotas · Answer 3 · Fri Dec 24 2021 15:08:37 GMT+0800 (China Standard Time)

Soft references are like weak references, but garbage-collected much less aggressively.

How do you propose to implement "much less aggressively"?

Theodore Tsirpanis · Answer 4 · Sat Dec 25 2021 04:41:18 GMT+0800 (China Standard Time)

Soft references could be treated as strong references until memory pressure gets high, at which point they would be converted to weak references. Under the hood they could (maybe I'm wrong; I have no idea) be implemented as a new type of GCHandle stored in a separate table that would be scanned by the GC's mark phase only when memory pressure is too high. The transition from a strong to a weak reference would be one-way to keep things simple.
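A rough managed approximation of this one-way strong-to-weak transition, as a sketch only: PressureSensitiveReference<T> and its Release method are hypothetical, and deciding when to call Release (the "memory pressure gets high" question) is left to the caller.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical: behaves like a strong reference until Release() is called,
// then like a weak reference. The transition is one-way.
public sealed class PressureSensitiveReference<T> where T : class
{
    private T? strong;                      // keeps the target alive until released
    private readonly WeakReference<T> weak; // still finds the target if someone else holds it

    public PressureSensitiveReference(T target)
    {
        strong = target;
        weak = new WeakReference<T>(target);
    }

    public bool IsStrong => strong != null;

    // Called by the owner when its own memory-pressure signal fires.
    public void Release() => strong = null;

    public bool TryGetTarget([NotNullWhen(true)] out T? target)
    {
        target = strong; // read the strong field once
        if (target != null)
            return true;
        return weak.TryGetTarget(out target);
    }
}
```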

There are two major questions. The first is what does "memory pressure gets high" mean, and it's kind of solved as evidenced in the shared ArrayPool (still would need tweaking though). The second question is how many and which soft references to release on high memory pressure. All? The earliest N(%) created? The top N(%) by object size?

In the meantime, using constructs like the Gen2GcCallback and checking memory pressure could help implement something similar to what I proposed in managed code, albeit with a bigger overhead.
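For illustration, a minimal sketch of a Gen2GcCallback-style notifier. The runtime's own Gen2GcCallback is internal to System.Private.CoreLib, so this hypothetical GcNotifier only imitates the idea with public APIs: an unreachable object whose finalizer re-registers itself. The callback runs on the finalizer thread, and it also fires on gen 0/1 collections until the notifier object has been promoted to gen 2.

```csharp
using System;

// Hypothetical managed imitation of the internal Gen2GcCallback idea.
public sealed class GcNotifier
{
    private readonly Func<bool> callback;

    private GcNotifier(Func<bool> callback) => this.callback = callback;

    // The callback is invoked after GCs until it returns false.
    // The notifier is deliberately not stored anywhere, so it is immediately unreachable.
    public static void Register(Func<bool> callback) => _ = new GcNotifier(callback);

    ~GcNotifier()
    {
        bool keepGoing;
        try
        {
            keepGoing = callback();
        }
        catch
        {
            keepGoing = false; // never let an exception escape a finalizer
        }
        if (keepGoing)
            GC.ReRegisterForFinalize(this); // resurrect for the next collection
    }
}
```

Inside the callback, GC.GetGCMemoryInfo() can be consulted (for example, comparing MemoryLoadBytes with HighMemoryLoadThresholdBytes) to decide whether to drop cached strong references, which is roughly the managed approach described in this comment.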

Jan Kotas · Answer 5 · Sat Dec 25 2021 04:53:51 GMT+0800 (China Standard Time)

The first is what does "memory pressure gets high" mean, and it's kind of solved as evidenced in the shared ArrayPool

ArrayPool uses a number of strategies. In addition to checking for memory pressure, it also uses a timestamp to track when the pooled array was last used, and releases the pooled array once it has not been used for a while.
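A minimal sketch of those two strategies combined, loosely modeled on the description above rather than on the actual ArrayPool code; TimestampedCache and maxIdleMs are hypothetical, and timestamps are passed in (for example from Environment.TickCount64) so the policy is easy to test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: each entry records when it was last used; Trim drops entries idle
// longer than maxIdleMs, or all entries when the caller reports high memory pressure.
public sealed class TimestampedCache<TValue> where TValue : class
{
    private readonly Dictionary<string, (TValue Value, long LastUsedMs)> entries = new();
    private readonly long maxIdleMs;

    public TimestampedCache(long maxIdleMs) => this.maxIdleMs = maxIdleMs;

    public int Count => entries.Count;

    public void Set(string key, TValue value, long nowMs) => entries[key] = (value, nowMs);

    public bool TryGet(string key, long nowMs, out TValue? value)
    {
        if (entries.TryGetValue(key, out var entry))
        {
            entries[key] = (entry.Value, nowMs); // refresh the last-used timestamp
            value = entry.Value;
            return true;
        }
        value = null;
        return false;
    }

    // Returns the number of entries removed.
    public int Trim(long nowMs, bool highMemoryPressure)
    {
        var stale = new List<string>();
        foreach (var (key, entry) in entries)
        {
            if (highMemoryPressure || nowMs - entry.LastUsedMs > maxIdleMs)
                stale.Add(key);
        }
        foreach (var key in stale)
            entries.Remove(key);
        return stale.Count;
    }
}
```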

checking memory pressure could help implement something similar to what I proposed in managed code, albeit with a bigger overhead.

It is not obvious to me that soft references have lower overhead than checking memory pressure. For example, I expect that we would see a regression in ArrayPool if it were switched to use the soft references proposed here.

#53895 has a lot of additional discussion of this problem space.

David Piepgrass · Answer 6 · Sat Dec 25 2021 05:28:27 GMT+0800 (China Standard Time)

@jkotas I'm no GC expert and don't even know how weak references work, so I'd leave it up to the GC team to choose an implementation. I like how @teo-tsirpanis frames it, as soft refs should act like strong refs until some memory pressure threshold is crossed (such as the SoftMemoryLimit I suggested, or OS-specific heuristics).

Stephen A. Imhoff · Answer 7 · Sat Dec 25 2021 13:52:13 GMT+0800 (China Standard Time)

... you're asking for some sort of SoftReference, but your actual problem seems to be caching things. If you just want to cache things, it's a good bet there are third-party libraries with slimmer (or tuned in other ways) cache implementations.

it does not guarantee that it will evict cache entries quickly enough to prevent OutOfMemoryExceptions

It doesn't guarantee that cache entries will ever be evicted (i.e., they can all be held).

Finally, if the goal is to prevent memory exhaustion, MemoryCache is problematic because multiple cache instances can exist that do not coordinate with one another.

Why would they? Being independent is usually a benefit.

(1) references to "owned" subobjects that should be counted as part of the parent object, and (2) references to (large) shared objects that can never be evicted.

It sounds like you want two caches tuned differently, or a regular cache and a dictionary. If necessary, implement a wrapper that divides them appropriately for you.
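Such a wrapper could look roughly like this: a minimal sketch assuming a count-based capacity (a real cache would more likely bound bytes), with the type and member names (TwoTierStore, Pin, Put) being hypothetical. Pinned shared objects live in a plain dictionary and are never evicted; everything else goes into a small LRU cache.

```csharp
using System;
using System.Collections.Generic;

public sealed class TwoTierStore<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> pinned = new();
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> lruIndex = new();
    private readonly LinkedList<(TKey Key, TValue Value)> lruOrder = new(); // most recent first
    private readonly int lruCapacity;

    public TwoTierStore(int lruCapacity) => this.lruCapacity = lruCapacity;

    // Shared objects that must never be evicted.
    public void Pin(TKey key, TValue value) => pinned[key] = value;

    // Evictable entries; the least recently used one is dropped when over capacity.
    public void Put(TKey key, TValue value)
    {
        if (lruIndex.TryGetValue(key, out var existing))
            lruOrder.Remove(existing);
        lruIndex[key] = lruOrder.AddFirst((key, value));
        if (lruOrder.Count > lruCapacity)
        {
            var last = lruOrder.Last!;
            lruOrder.RemoveLast();
            lruIndex.Remove(last.Value.Key);
        }
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (pinned.TryGetValue(key, out value))
            return true;
        if (lruIndex.TryGetValue(key, out var node))
        {
            lruOrder.Remove(node);
            lruOrder.AddFirst(node); // mark as most recently used
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }
}
```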

David Piepgrass · Answer 8 · Mon Dec 27 2021 23:37:06 GMT+0800 (China Standard Time)

@Clockwork-Muse I want a cache that uses as much memory as possible with (for all practical purposes) no chance of OutOfMemoryException (and also avoiding using swap space, if available). If you can tell me how to do that without soft references, please do.

Stephen A. Imhoff · Answer 9 · Mon Dec 27 2021 23:58:01 GMT+0800 (China Standard Time)

What I was getting at was: are you planning to write one, or would a different implementation of an in-memory cache be sufficient (provided by a third-party library, or through changes to MemoryCache itself)?

and also avoiding using swap space, if available

Side note: this may be outside application notice/control, depending on OS/settings.

David Piepgrass · Answer 10 · Tue Dec 28 2021 22:04:54 GMT+0800 (China Standard Time)

I do not want to write another cache, and I do not know how a cache could (even in principle) have the characteristics that soft references would provide.

this may be outside application notice/control,

Yes, but to the extent it is possible, it would involve OS-specific metrics/mechanisms that maybe the CLR already uses to help choose the GC interval. I expect it is possible to ask the OS how much physical memory there is, and how much memory is being used by other apps and by the current app; this seems like all that is needed to choose a soft cap.
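In .NET, GC.GetGCMemoryInfo() already exposes much of this as of the last GC (TotalAvailableMemoryBytes, MemoryLoadBytes, HeapSizeBytes). A minimal sketch of turning those numbers into a soft cap, assuming a hypothetical policy of keeping a fixed fraction of available memory as headroom; SoftCap is a made-up name, and treating MemoryLoadBytes minus this process's GC heap as "memory used by others" is only an approximation, since it also counts this process's non-GC memory.

```csharp
using System;

public static class SoftCap
{
    // Returns a heap budget (never negative) for this process under the assumed policy.
    public static long Compute(long totalAvailableBytes, long machineMemoryLoadBytes,
                               long currentHeapBytes, double headroomFraction)
    {
        long target = (long)(totalAvailableBytes * (1.0 - headroomFraction));
        long usedByOthers = Math.Max(0, machineMemoryLoadBytes - currentHeapBytes);
        return Math.Max(0, target - usedByOthers);
    }

    // Same computation, fed from the runtime's view of the last GC.
    public static long FromLastGc(double headroomFraction = 0.1)
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return Compute(info.TotalAvailableMemoryBytes, info.MemoryLoadBytes,
                       info.HeapSizeBytes, headroomFraction);
    }
}
```

For example, with 16 GB available, 10 GB in use machine-wide (2 GB of it this process's heap) and 25% headroom, the target is 12 GB, others use 8 GB, and the budget is 4 GB.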

SRV · Answer 11 · Mon Jan 03 2022 18:04:46 GMT+0800 (China Standard Time)

Soft reference is actively used in JVM for caching, but personally I'm not a fan of this mechanism because SoftMemoryLimit is very speculative. It's probably better to use generations here and add a new configuration option to WeakReference and GCHandle. For instance, we could allow the object to be collected if it is not reachable through any strong reference and is located in Gen2. This analysis can be done during a full/background GC. I think this behavior is more predictable. Many users will not be able to calculate SoftMemoryLimit correctly. Normally, Gen2 should not grow constantly in well-behaved applications, so this policy has the same effect as the memory limit.

Stephen A. Imhoff · Answer 12 · Mon Jan 03 2022 23:54:17 GMT+0800 (China Standard Time)

Soft reference is actively used in JVM for caching

... When I did some cursory research for this issue earlier, I found documentation that for at least one JVM implementation, SoftReference is/was only an alias for WeakReference, but I can't find it now. So that might not work the way you expect.

SRV · Answer 13 · Tue Jan 04 2022 01:54:45 GMT+0800 (China Standard Time)

@Clockwork-Muse, the client JVM treats SoftReferences as WeakReferences. The server JVM does not.

SRV · Answer 14 · Tue Jan 04 2022 01:59:55 GMT+0800 (China Standard Time)

The Java HotSpot Server VM uses the maximum possible heap size (as set with the -Xmx option) to calculate free space remaining. The Java Hotspot Client VM uses the current heap size to calculate the free space.

That information may be outdated, because HotSpot uses GraalVM under the hood. Anyway, the JVM's memory management strategy differs from that of the .NET CLR, which doesn't try to occupy the maximum possible heap size. Given that, a generation-based approach may be better and much more understandable.
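For comparison with the LastUsed idea in the proposal, the HotSpot behavior quoted above corresponds to an LRU rule: a softly reachable object is kept while the time since its last access is at most (free heap in MB) × SoftRefLRUPolicyMSPerMB, where that flag defaults to 1000 ms, i.e. one second of idle time per free megabyte. Only the rule and the flag come from HotSpot; the C# function below is a hypothetical restatement.

```csharp
public static class HotSpotSoftRefPolicy
{
    // Keep the softly reachable object while it has been idle for no longer than
    // freeHeapMb * msPerMb milliseconds (HotSpot's default msPerMb is 1000).
    // "Free heap" is computed from the current heap size (client VM) or from -Xmx (server VM).
    public static bool ShouldKeep(long msSinceLastAccess, long freeHeapMb, long msPerMb = 1000)
        => msSinceLastAccess <= freeHeapMb * msPerMb;
}
```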

David Piepgrass · Answer 15 · Wed Jan 05 2022 09:24:07 GMT+0800 (China Standard Time)

@sakno My suggestion is that SoftMemoryLimit is an upper bound, so it is safe to set it too high and the default value can be long.MaxValue.

Edit: "upper bound" is the wrong term, as it is possible for the amount of allocated memory to exceed it. What I meant was that the GC would choose its own target memory usage during GCs, and SoftMemoryLimit could decrease that target but not increase it. Also, SoftMemoryLimit could affect the timing of when to initiate GCs, but I imagine the GC treating it as a way of pretending that the machine has less physical memory than the OS reports, so again it would serve in an advisory role, and it is not crucial that the user sizes it "correctly". Still, on second thought, I guess its type should be long? (nullable) with a default value of null, so that the GC would be allowed to behave differently when the programmer has selected a limit than when ze hasn't.
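The advisory semantics described here (a nullable limit that can lower the GC's own target but never raise it) fit in one expression. In this sketch SoftLimitSemantics, EffectiveTarget and gcChosenTargetBytes are hypothetical names; the GC's internal target is not something user code can read today.

```csharp
using System;

public static class SoftLimitSemantics
{
    // No limit: the GC keeps its own target. With a limit: the target can only go down.
    public static long EffectiveTarget(long gcChosenTargetBytes, long? softMemoryLimitBytes)
        => softMemoryLimitBytes is long limit
            ? Math.Min(gcChosenTargetBytes, limit)
            : gcChosenTargetBytes;
}
```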

David Piepgrass · Answer 16 · Wed Jan 05 2022 09:52:03 GMT+0800 (China Standard Time)

Having said all that, if a soft reference is merely a weak reference that can only be collected in Gen2, that sounds like a feature that the team could do quickly and easily, and if that means the feature could be available in .NET 7 I would be very happy.

SRV · Answer 17 · Wed Jan 05 2022 21:17:39 GMT+0800 (China Standard Time)

@qwertie, a reference that can only be collected in Gen2 can be implemented without runtime support using existing weak references. The main idea is to use a finalizer as a callback from the GC to track the object's generation:

using System;

public readonly struct SoftReference<T>
    where T : class
{
    private sealed class Tracker
    {
        internal readonly T Target;
        private readonly WeakReference parent;

        internal Tracker(T target, WeakReference parent)
        {
            Target = target;
            this.parent = parent;
        }

        ~Tracker()
        {
            if (GC.GetGeneration(Target) < GC.MaxGeneration)
            {
                // Target is not in Gen2 yet: resurrect the tracker (and the target) until the next GC.
                GC.ReRegisterForFinalize(this);
            }
            else
            {
                try
                {
                    parent.Target = Target; // downgrade from soft to weak reference
                }
                catch (InvalidOperationException)
                {
                    // Setting Target can throw if the WeakReference itself was already finalized.
                }
            }
        }
    }

    private readonly WeakReference? reference;

    public SoftReference(T? target)
    {
        if (target is null)
        {
            reference = null;
        }
        else if (GC.GetGeneration(target) == GC.MaxGeneration)
        {
            reference = new(target, trackResurrection: false);
        }
        else
        {
            // Create the weak reference first so that the tracker can point back to it.
            reference = new(null, trackResurrection: true);
            reference.Target = new Tracker(target, reference);
        }
    }

    public void Clear()
    {
        if (reference is null)
            return;
        if (reference.Target is Tracker tracker)
            GC.SuppressFinalize(tracker);
        reference.Target = null; // also clears an already-downgraded target
    }

    public bool IsAlive => reference?.IsAlive ?? false;

    public T? Target => reference?.Target switch
    {
        Tracker tracker => tracker.Target,
        T target => target,
        _ => null
    };
}

The code inside the finalizer can also check your memory limit using the GC.GetGCMemoryInfo method.

Edit: The soft reference must keep a reference to the object even after the object reaches Gen2, if it remains alive because of strong references elsewhere. In this case we need to "downgrade" the reference from soft to weak (see the else branch in the finalizer).

David Piepgrass · Answer 18 · Wed Jan 12 2022 15:34:26 GMT+0800 (China Standard Time)

Thanks, I'll give that a try when I have time.

Tim Cassell · Answer 19 · Wed Jan 12 2022 17:00:24 GMT+0800 (China Standard Time)

@sakno I found an issue with your implementation that would require allocating a new Tracker object if you wanted to set a new Target value and it was already downgraded to a WeakReference. So I rewrote it with that in mind and adjusted the API to be more like WeakReference<T>. Do you see any issues with my implementation?

using System;

public class SoftReference<T> where T : class
{
    private sealed class Tracker
    {
        private T? target; // strong reference; cleared once the target reaches gen 2
        private readonly WeakReference<T?> reference;

        internal Tracker(T? target, bool trackResurrection)
        {
            this.target = target;
            reference = new(target, trackResurrection);
        }

        internal void SetTarget(T? target)
        {
            this.target = target;
            reference.SetTarget(target);
        }

        internal bool TryGetTarget(out T? target)
        {
            target = this.target;
            return target != null || reference.TryGetTarget(out target);
        }

        ~Tracker()
        {
            if (target != null && GC.GetGeneration(target) == GC.MaxGeneration)
            {
                target = null; // downgrade from soft to weak reference
            }
            GC.ReRegisterForFinalize(this);
        }
    }

    // The WeakReference lets the tracker's finalizer run on every GC; the tracker
    // always resurrects itself until this SoftReference is finalized.
    private readonly WeakReference<Tracker> reference;

    public SoftReference(T? target, bool trackResurrection)
    {
        var tracker = new Tracker(target, trackResurrection);
        reference = new(tracker, trackResurrection: true);
        GC.KeepAlive(tracker);
    }

    ~SoftReference()
    {
        // The tracker may already be gone if the weak reference was finalized first.
        if (reference.TryGetTarget(out Tracker? tracker))
            GC.SuppressFinalize(tracker);
    }

    public void SetTarget(T? target)
    {
        if (reference.TryGetTarget(out Tracker? tracker))
            tracker.SetTarget(target);
    }

    public bool TryGetTarget(out T? target)
    {
        target = null;
        return reference.TryGetTarget(out Tracker? tracker) && tracker.TryGetTarget(out target);
    }
}

Also, is there any reason for the runtime not to have this type? Maybe it's not quite in the spirit of SoftReference, which is expected to be collected only in low-memory situations?

[Edit] I imagine this behavior of only collecting on a certain GC generation could easily be added to WeakReference(<T>) with much less overhead than this implementation (this has 3 internal object allocations!).

SRV · Answer 20 · Wed Jan 12 2022 17:40:38 GMT+0800 (China Standard Time)

@timcassell, you can eliminate one internal allocation. Here is the code: https://github.com/dotnet/dotNext/blob/develop/src/DotNext/Runtime/SoftReference.cs. The code in your example doesn't handle some specific situations like setting Target of WeakReference after finalization (that may fail). I expect to include it in the next version of the library. Anyway, feel free to use or change it as you want.

P.S.: The linked implementation also includes an option to control memory pressure in Gen2.

Tim Cassell · Answer 21 · Wed Jan 12 2022 17:53:00 GMT+0800 (China Standard Time)

@sakno Your implementation does not support overwriting the Target, which WeakReference supports, nor trackResurrection.

Tim Cassell · Answer 22 · Wed Jan 12 2022 18:01:44 GMT+0800 (China Standard Time)

The code in your example doesn't handle some specific situations like setting Target of WeakReference after finalization (that may fail).

Why would the Target need to be set if the SoftReference itself is already finalized? All internals of the SoftReference would then be eligible for GC, including the WeakReferences.

SRV · Answer 23 · Wed Jan 12 2022 19:09:16 GMT+0800 (China Standard Time)

Assume that you have a target object with two references:

- A strong reference located somewhere in the code
- A soft reference

The soft reference downgrades to a weak reference when its Tracker is finalized. However, the target is still referenced elsewhere through a strong reference. In that case, the target should be accessible via the soft reference as well. That's why the soft reference keeps a weak reference to the target after downgrading.

In my implementation, SoftReference itself has a finalizer that cleans the internal reference to the target. Without this, the tracker will keep the target alive even if the soft reference itself is dead already.

Tim Cassell · Answer 24 · Wed Jan 12 2022 19:28:11 GMT+0800 (China Standard Time)

@sakno Once the SoftReference is finalized, all of its internals are GC eligible, so it will not keep the target alive. [Edit] The SuppressFinalize(tracker) should stop it from keeping the target alive.

Also, it is true that a reference located elsewhere in the code will keep the target alive in the WeakReference. That is always accessible via the Tracker.reference in my implementation. In my implementation, the SoftReference.reference's Target is also always alive as long as the SoftReference has not been finalized, thanks to the trackResurrection and the fact that it always resurrects itself. Or am I misunderstanding how trackResurrection works?

SRV · Answer 25 · Wed Jan 12 2022 21:32:55 GMT+0800 (China Standard Time)

Oh, I got it. Your implementation should work fine as well. My implementation saves one internal allocation. Also, in my view, the tracker is needed only to keep the strong reference for as long as necessary. When the reference is downgraded, no need to keep the reference to the tracker itself because it is no longer useful.

Tim Cassell · Answer 26 · Wed Jan 12 2022 21:34:04 GMT+0800 (China Standard Time)

When the reference is downgraded, no need to keep the reference to the tracker itself because it is no longer useful.

That's only true because you don't support overwriting the target. It is necessary to keep it alive to overwrite the target without extra allocations.

Zhenwei Wu · Answer 27 · Thu Jan 13 2022 11:46:10 GMT+0800 (China Standard Time)

I have used a similar system to track liveness, but I had an issue where the referenced object (and all objects referenced only by it) would be finalized and resurrected. This is a problem if those objects do cleanup work in their finalizers, because C# doesn't have a mechanism for an object to detect resurrection; after resurrection, the object's state will be invalid. From the discussion and code in this thread, I am not clear whether this problem is addressed. Could anyone briefly explain if I missed something?

Tim Cassell · Answer 28 · Thu Jan 13 2022 15:01:13 GMT+0800 (China Standard Time)

@acaly Thanks for bringing that oddity to attention. I have adjusted my implementation to fix that issue.

public class SoftReference<T> where T : class
{
    private sealed class Tracker
    {
        private readonly SoftReference<T> parent;

        internal Tracker(SoftReference<T> parent)
        {
            this.parent = parent;
        }

        ~Tracker()
        {
            parent.OnGC();
            GC.ReRegisterForFinalize(this);
        }

        internal void StopTracking()
        {
            GC.SuppressFinalize(this);
        }
    }

    private T? target;
    private readonly WeakReference<T?> targetReference;
    private readonly WeakReference<Tracker> callbackReference; // WeakReference allows finalizer to run, but it always resurrects itself until this is finalized.

    public SoftReference(T? target, bool trackResurrection)
    {
        this.target = target;
        targetReference = new(target, trackResurrection);
        var tracker = new Tracker(this);
        callbackReference = new(tracker, trackResurrection: true);
        GC.KeepAlive(tracker);
    }

    ~SoftReference()
    {
        // The tracker may already be gone if the weak reference was finalized first.
        if (callbackReference.TryGetTarget(out Tracker? tracker))
            tracker.StopTracking();
    }

    private void OnGC()
    {
        if (target != null && GC.GetGeneration(target) == GC.MaxGeneration)
        {
            target = null; // downgrade from soft to weak reference
        }
    }

    public void SetTarget(T? target)
    {
        this.target = target;
        targetReference.SetTarget(target);
    }

    public bool TryGetTarget(out T? target)
    {
        target = this.target;
        return target != null || targetReference.TryGetTarget(out target);
    }
}

SRV · Answer 29 · Thu Jan 13 2022 16:08:42 GMT+0800 (China Standard Time)

@timcassell , you need to suppress finalization of the target object to avoid the problem mentioned by @acaly . In OnGC method, you need to re-register finalizer for the target object.

Tim Cassell · Answer 30 · Thu Jan 13 2022 16:26:48 GMT+0800 (China Standard Time)

@timcassell , you need to suppress finalization of the target object to avoid the problem mentioned by @acaly . In OnGC method, you need to re-register finalizer for the target object.

No, that doesn't make sense at all. Moving the target and weak reference out of the tracker resolves the issue. Also, since my implementation supports tracking resurrection, we absolutely do not want to override what the user expects (and even if we don't support tracking resurrection, we still don't want to force a re-register finalization on an object we don't own).

[Edit] Also, suppressing finalization of the target does not suppress finalization of the objects it references, so the target could still be left in an invalid state when we resurrect it. That's why it must be moved out of the tracker: that way it is never finalized at all.

Tim Cassell · Answer 31 · Thu Jan 13 2022 16:30:28 GMT+0800 (China Standard Time)

Btw, this implementation does not guarantee that the target will live until a gen 2 collection; it only guarantees that it will live until it is promoted to gen 2. Guaranteeing survival until a gen 2 collection would require internal APIs. I believe there is an internal callback of some sort for this (Gen2GcCallback).

[Edit] Do resurrected objects get promoted to higher generations? If so, we could check the generation of the tracker object in its finalizer before calling parent.OnGC() to guarantee survival until a gen 2 collection.

[Edit2] I just reread the GC documentation, and it seems I was incorrect here. Objects that have been promoted to gen 2 are only collected during a gen 2 collection, even if they become unreachable before an intervening gen 0 or gen 1 collection. So once the strong reference is dropped, the target still survives until the next gen 2 collection.
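The idea of checking the tracker's own generation in its finalizer can be sketched as a standalone helper. This is a minimal sketch, close in spirit to the runtime-internal callback mentioned above but not its actual code. `Gen2Notifier` and `Register` are hypothetical names, not a public .NET API, and the sketch assumes that an object kept alive by the finalization queue is promoted like any other survivor.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper (not a public .NET API): calls back from the finalizer
// thread, but only after the notifier itself has reached the oldest generation.
// Assumption: an object kept alive by the finalization queue is promoted like any
// other survivor, so after a couple of collections it sits in gen 2 and is then
// found unreachable (and finalized) only during gen 2 collections.
public sealed class Gen2Notifier
{
    private readonly Func<bool> callback; // returns false to stop notifications

    private Gen2Notifier(Func<bool> callback) => this.callback = callback;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Register(Func<bool> callback)
    {
        // Keep no reference: the finalizer resurrects the object after each GC.
        _ = new Gen2Notifier(callback);
    }

    ~Gen2Notifier()
    {
        if (GC.GetGeneration(this) == GC.MaxGeneration)
        {
            bool keepGoing;
            try
            {
                keepGoing = callback();
            }
            catch
            {
                keepGoing = false; // an exception escaping a finalizer would crash the process
            }

            if (!keepGoing)
            {
                return; // not re-registered, so the object is simply collected later
            }
        }

        // Still in a younger generation, or the callback asked to continue.
        GC.ReRegisterForFinalize(this);
    }
}
```

Returning `false` from the callback is the only way to stop the resurrection cycle, which mirrors the concern raised later in the thread about trackers that resurrect themselves forever.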

SRV · Answer 32 · Thu Jan 13 2022 18:11:48 GMT+0800 (China Standard Time)

There is another problem: the implementation is not thread-safe.

Tim Cassell · Answer 33 · Thu Jan 13 2022 18:15:43 GMT+0800 (China Standard Time)


Which part? SetTarget and TryGetTarget are as thread-safe as WeakReference is. [Edit] Actually, I take that back: TryGetTarget should cache the target in a local before returning instead of overwriting the out variable.

I thought about thread safety for OnGC, but I wasn't sure whether it really matters. Isn't the GC usually stop-the-world and single-threaded? Is a concurrent GC really a concern here?

Tim Cassell · Answer 34 · Thu Jan 13 2022 18:25:16 GMT+0800 (China Standard Time)

OK, here's a more thread-safe version. I think there's no need to synchronize SetTarget, because if the user is calling it on two separate threads, that's a race condition where you couldn't tell which value sticks anyway.

I also don't believe WeakReference is thread-safe anyway, so this may be an exercise in futility (there's no point in making SoftReference thread-safe if WeakReference isn't).

using System;
using System.Threading;

public class SoftReference<T> where T : class
{
    private sealed class Tracker
    {
        private readonly SoftReference<T> parent;

        internal Tracker(SoftReference<T> parent)
        {
            this.parent = parent;
        }

        ~Tracker()
        {
            parent.OnGC();
            GC.ReRegisterForFinalize(this);
        }

        internal void StopTracking()
        {
            GC.SuppressFinalize(this);
        }
    }

    private volatile T? target;
    private readonly WeakReference<T?> targetReference;
    private readonly WeakReference<Tracker> callbackReference; // WeakReference allows finalizer to run, but it always resurrects itself until this is finalized.

    public SoftReference(T? target, bool trackResurrection)
    {
        this.target = target;
        targetReference = new(target, trackResurrection);
        var tracker = new Tracker(this);
        callbackReference = new(tracker, trackResurrection: true);
        GC.KeepAlive(tracker);
    }

    ~SoftReference()
    {
        // Defensive: never throw from a finalizer if the tracker cannot be retrieved.
        if (callbackReference.TryGetTarget(out Tracker? tracker))
        {
            tracker.StopTracking();
        }
    }

    private void OnGC()
    {
        T? _target = target;
        if (_target != null && GC.GetGeneration(_target) == GC.MaxGeneration)
        {
            Interlocked.CompareExchange(ref target, null, _target); // downgrade from soft to weak reference
        }
    }

    public void SetTarget(T? target)
    {
        this.target = target;
        targetReference.SetTarget(target);
    }

    public bool TryGetTarget(out T? target)
    {
        return targetReference.TryGetTarget(out target);
    }
}

But if you really wanted, you could just lock (targetReference).
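A minimal sketch of the `lock (targetReference)` alternative, assuming the same two-field design as the version above (a strong field that the GC callback may clear, plus a WeakReference<T> that always holds the target). `LockedSoftReference` is a hypothetical name, and the Tracker is omitted so that only the locking is shown.

```csharp
#nullable enable
using System;

// Hypothetical variant: SetTarget and the GC-driven downgrade are serialized with
// lock (targetReference) instead of volatile/Interlocked. The Tracker from the
// version above is omitted; OnGC is meant to be called from its finalizer.
public sealed class LockedSoftReference<T> where T : class
{
    private T? target;                                  // strong reference, cleared by OnGC
    private readonly WeakReference<T?> targetReference; // always holds the current target

    public LockedSoftReference(T? target)
    {
        this.target = target;
        targetReference = new WeakReference<T?>(target);
    }

    public void SetTarget(T? target)
    {
        lock (targetReference)
        {
            this.target = target;
            targetReference.SetTarget(target);
        }
    }

    public bool TryGetTarget(out T? target)
    {
        // No lock needed: while the strong field is set the target is reachable,
        // so the weak reference cannot have been cleared.
        return targetReference.TryGetTarget(out target);
    }

    // In the tracker design this runs on the finalizer thread, so it may briefly
    // wait for a concurrent SetTarget call to release the lock.
    internal void OnGC()
    {
        lock (targetReference)
        {
            if (target != null && GC.GetGeneration(target) == GC.MaxGeneration)
            {
                target = null; // downgrade from soft to weak reference
            }
        }
    }
}
```

The trade-off is that the finalizer thread can now block on user code holding the lock, which the Interlocked version avoids.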

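[Edit] I removed the strong-reference read in TryGetTarget because I realized it's unnecessary: while the strong field holds the target, the target is reachable, so the WeakReference cannot have been cleared. Always reading from the WeakReference is perfectly fine.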

Michael Telford · Answer 35 · Fri Jun 16 2023 21:41:31 GMT+0800 (China Standard Time)

One bit of feedback on this and on WeakReference, from someone much like the proposer who uses WeakReference extensively for caching/GC control applications.
Please provide a common interface type, IReference, implemented by WeakReference, SoftReference, and a hypothetical but obvious HardReference (it's just an object!). Currently I'm forced either to juggle different lists and coordinate different locks when promoting/demoting, or (as I've done) to wrap WeakReference in another class and accept the cost of an extra object. If WeakReference implemented any interface, I could have avoided several annoying things.

Zhenwei Wu · Answer 36 · Fri Jun 16 2023 23:09:49 GMT+0800 (China Standard Time)

@timcassell, one lesson I learned from playing with resurrection is to never use it.

One issue with your code is that when the SoftReference object is no longer needed, the execution order of the finalizers of SoftReference and its tracker is unknown, meaning that StopTracking() may be called first and ~Tracker() after it, which resurrects everything again, forever.

Also, I don't want to assume that checking the referenced object's generation reflects the overall memory pressure. As jkotas said, it's probably much easier to check memory usage explicitly. At worst, you can add a separate background thread that checks periodically. Even so, that will still be better, because the GC no longer needs to process those weak references and resurrections repeatedly, which matters more as the number of tracked objects grows.
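A minimal sketch of the explicit-check approach, assuming .NET Core 3.0 or later for `GC.GetGCMemoryInfo`. `PressureAwareCache` is a hypothetical name. Values are held strongly, and a thread-pool timer (standing in for the background thread) demotes them to weak-only once the memory load recorded by the last GC reaches the GC's own high-load threshold.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical cache: values are held strongly until the GC reports high memory
// load, then only weakly, so the GC may reclaim them. Dead weak entries are not
// pruned in this sketch.
public sealed class PressureAwareCache<TKey, TValue> : IDisposable
    where TKey : notnull
    where TValue : class
{
    private readonly ConcurrentDictionary<TKey, TValue> strong = new();
    private readonly ConcurrentDictionary<TKey, WeakReference<TValue>> weak = new();
    private readonly Timer timer;

    public PressureAwareCache(TimeSpan pollInterval)
    {
        // Periodic polling on the thread pool instead of a dedicated thread.
        timer = new Timer(_ => TrimIfUnderPressure(), null, pollInterval, pollInterval);
    }

    public void Set(TKey key, TValue value)
    {
        weak[key] = new WeakReference<TValue>(value);
        strong[key] = value;
    }

    public bool TryGet(TKey key, out TValue? value)
    {
        if (strong.TryGetValue(key, out value))
        {
            return true;
        }
        if (weak.TryGetValue(key, out WeakReference<TValue>? wr) && wr.TryGetTarget(out value))
        {
            return true;
        }
        value = null;
        return false;
    }

    // Reads statistics recorded by the most recent GC; it does not trigger a GC.
    public void TrimIfUnderPressure()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        // Guard against an empty result (e.g. if no GC statistics are available yet).
        if (info.HighMemoryLoadThresholdBytes > 0 &&
            info.MemoryLoadBytes >= info.HighMemoryLoadThresholdBytes)
        {
            strong.Clear(); // weak entries survive until their targets are collected
        }
    }

    public void Dispose() => timer.Dispose();
}
```

Unlike the resurrection-based tracker, the GC here handles only one weak reference per entry and no finalizers, at the cost of reacting only as often as the timer fires.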

Zhenwei Wu · Answer 37 · Fri Jun 16 2023 23:12:21 GMT+0800 (China Standard Time)

> Please provide a common Interface type IReference that both WeakReference, SoftReference and a hypothetical but obvious implementation HardReference

You can implement weak references and strong references yourself and add whatever interfaces you like to them. The standard WeakReference<T> uses GCHandle internally. I am not sure about soft references, though.
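A minimal sketch of that do-it-yourself route. `IReference<T>`, `HardReference<T>`, and `HandleWeakReference<T>` are hypothetical types, not part of .NET; `IReference` simply borrows the name requested in the comment above. The weak variant allocates a `GCHandle` with `GCHandleType.Weak` directly instead of wrapping WeakReference<T>, so it must be disposed (or finalized) to free the handle.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;

// Hypothetical common interface for references of different strengths.
public interface IReference<T> where T : class
{
    bool TryGetTarget(out T? target);
    void SetTarget(T? target);
}

// Strong reference: the target stays alive as long as this object does.
public sealed class HardReference<T> : IReference<T> where T : class
{
    private T? target;
    public HardReference(T? target) => this.target = target;
    public bool TryGetTarget(out T? target) => (target = this.target) != null;
    public void SetTarget(T? target) => this.target = target;
}

// Weak reference built directly on a GCHandle. Not safe against a Dispose call
// racing with TryGetTarget/SetTarget on another thread.
public sealed class HandleWeakReference<T> : IReference<T>, IDisposable where T : class
{
    private GCHandle handle;

    public HandleWeakReference(T? target) => handle = GCHandle.Alloc(target, GCHandleType.Weak);

    public bool TryGetTarget(out T? target)
    {
        target = handle.IsAllocated ? handle.Target as T : null;
        return target != null;
    }

    public void SetTarget(T? target)
    {
        if (!handle.IsAllocated) throw new ObjectDisposedException(GetType().Name);
        handle.Target = target;
    }

    public void Dispose()
    {
        if (handle.IsAllocated) handle.Free();
        GC.SuppressFinalize(this);
    }

    ~HandleWeakReference()
    {
        if (handle.IsAllocated) handle.Free(); // release the handle if Dispose was not called
    }
}
```

With a shared interface, a cache can keep one collection of `IReference<T>` and promote or demote an entry by swapping the wrapper, instead of juggling separate lists.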

Tim Cassell · Answer 38 · Sat Jun 17 2023 02:27:36 GMT+0800 (China Standard Time)

> @timcassell One lesson I learned previously from playing with resurrection is to never use it.
>
> One issue of your code is, when the SoftReference object is no longer needed, execution order of the finalizers of SoftReference and its tracker will be unknown, meaning that you may have StopTracking() called first, and ~Tracker() called after it, which resurrects everything again forever.

Does GC.SuppressFinalize not remove it from the finalizer queue? That would be surprising behavior.

According to the documentation, the finalizer will not be called:

> This method sets a bit in the object header of obj, which the runtime checks when calling finalizers.

> Also I don't want to assume that checking the referenced object's generation can reflect the overall memory pressure.

I agree. I don't particularly like this approach; I was just piggybacking off @sakno's idea.