kkokosa / UpsilonGC

Custom Garbage Collectors for .NET Core

Understanding the GCDesc structure

kevingosse opened this issue

The code to retrieve the references from the GCDesc has been rewritten in C# in ClrMD. It's slightly easier to understand than the C++ version: https://github.com/microsoft/clrmd/blob/master/src/Microsoft.Diagnostics.Runtime/src/Common/GCDesc.cs

From what I understand, in the GCDesc, for a given object, there are multiple series of references. It seems that each series is a group of contiguous references in the object. For instance, take an object that looks like:

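using System.Runtime.InteropServices;

// FieldOffset is only valid with LayoutKind.Explicit.
[StructLayout(LayoutKind.Explicit)]
public class SomeObject
{
	[FieldOffset(0)]
	public object ref1;
	[FieldOffset(8)]
	public object ref2;
	// Offset 16 is deliberately left unused.
	[FieldOffset(24)]
	public object ref3;
}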

(Throughout this message I'm assuming a 64-bit process. Adjust everything accordingly if running in 32 bits.)
In the layout of the object, the references ref1 and ref2 are contiguous, then there is some unused space, then ref3. In the GCDesc you will have two series: one for the first two references and one for the last. When you think about it, series work a bit like a free-list.
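
To make the grouping concrete, here is a small sketch, not taken from the runtime or from ClrMD, that collapses a sorted list of reference offsets into runs of contiguous pointer-sized slots. GroupIntoSeries is a hypothetical helper, and it reports each run as an (offset, count) pair, which is the simplified model used in this message rather than the exact encoding stored in the GCDesc.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collapses sorted reference-field offsets into runs of
// contiguous pointer-sized slots. 64-bit is assumed, so a slot is 8 bytes.
static List<(long Offset, int Count)> GroupIntoSeries(IReadOnlyList<long> sortedOffsets)
{
    const long PointerSize = 8;
    var series = new List<(long Offset, int Count)>();

    foreach (long offset in sortedOffsets)
    {
        if (series.Count > 0)
        {
            var last = series[series.Count - 1];
            // The field directly follows the previous one: extend the current run.
            if (last.Offset + last.Count * PointerSize == offset)
            {
                series[series.Count - 1] = (last.Offset, last.Count + 1);
                continue;
            }
        }

        // Gap (or first field): start a new run.
        series.Add((offset, 1));
    }

    return series;
}

// SomeObject has references at offsets 0, 8 and 24.
foreach (var (offset, count) in GroupIntoSeries(new long[] { 0, 8, 24 }))
    Console.WriteLine($"offset={offset}, count={count}");
```

For SomeObject this prints one run of two references at offset 0 and one run of a single reference at offset 24.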

How to read those series? The number of series is stored in the last long (once again I'm assuming 64 bits) of the GCDesc. All series are two IntPtr wide, and packed at the end of the GCDesc. In short, the GCDesc looks like:

[Some stuff][Series][Series][Series][Number of series]

So all those series are located between [End of GCDesc - sizeof(long) - numSeries * sizeof(long) * 2] and [End of GCDesc - sizeof(long)]
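
The arithmetic above can be sketched as follows. GetSeriesRange and its gcDescEnd parameter are hypothetical names, and the address 0x1000 is made up; the sketch only covers the positive case and assumes 64 bits.

```csharp
using System;

// Hypothetical helper: given the address just past the end of the GCDesc and the
// (positive) number of series, returns the [start, end) address range of the series.
static (ulong Start, ulong End) GetSeriesRange(ulong gcDescEnd, long numSeries)
{
    const ulong WordSize = sizeof(long);     // 8 bytes; 64-bit assumed
    const ulong SeriesSize = 2UL * WordSize; // each series is two words wide

    if (numSeries <= 0)
        throw new ArgumentOutOfRangeException(nameof(numSeries), "Only the positive case is handled here.");

    ulong end = gcDescEnd - WordSize;        // skip the trailing "number of series" word
    ulong start = end - (ulong)numSeries * SeriesSize;
    return (start, end);
}

// Made-up example: a GCDesc ending at 0x1000 that holds two series.
var (start, end) = GetSeriesRange(gcDescEnd: 0x1000, numSeries: 2);
Console.WriteLine($"series occupy [0x{start:X}, 0x{end:X})");
```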

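For each of those series, the first long is the size of the series and the second long is the offset of the first reference in the object. In the SomeObject example, the first series would have a size of 2 and an offset of 0, and the second series a size of 1 and an offset of 24. (These numbers are simplified to reference counts and field offsets. In the ClrMD code linked above, the stored size is a byte count from which the object's size has been subtracted, and the offset is measured from the object's address, so it includes the method table pointer.)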
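
As an illustration, the sketch below decodes the series from a GCDesc represented as an array of 64-bit words. ReadSeries is a hypothetical helper, the word values are made up for SomeObject using the simplified sizes and offsets from this message, and the order in which the two series appear in the array is arbitrary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decodes the series from a GCDesc given as an array of
// 64-bit words, laid out as [some stuff][series]...[series][number of series].
static List<(long Size, long Offset)> ReadSeries(long[] gcDescWords)
{
    long numSeries = gcDescWords[gcDescWords.Length - 1];    // last word
    if (numSeries <= 0)
        throw new NotSupportedException("Only the positive case is handled here.");

    var result = new List<(long Size, long Offset)>();
    int first = gcDescWords.Length - 1 - (int)numSeries * 2; // index of the first series word

    for (int i = first; i < gcDescWords.Length - 1; i += 2)
    {
        // First word: size of the series. Second word: offset of its first reference.
        result.Add((gcDescWords[i], gcDescWords[i + 1]));
    }

    return result;
}

// Simplified GCDesc for SomeObject: one word of unrelated data, two series, then the count.
long[] words = { 0x1234, 2, 0, 1, 24, 2 };
foreach (var (size, offset) in ReadSeries(words))
    Console.WriteLine($"size={size}, offset={offset}");
```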

From there, you have everything you need to read the references. Assuming addr is the address of the object you want to browse, you enumerate the series from the corresponding GCDesc, and for each series you read [size of the series] addresses starting from addr + [offset of the series].
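
Here is a minimal sketch of that loop, run against a fake memory map instead of a real process. EnumerateReferences, the readPointer callback and all the addresses are assumptions made for the example. Sizes are treated as reference counts, as in the simplified description above, so this is not a drop-in replacement for the ClrMD code. Skipping null references is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: yields every non-null reference held by the object at addr.
// Each series is (Count, Offset), following the simplified description in this message.
static IEnumerable<ulong> EnumerateReferences(
    ulong addr,
    IEnumerable<(long Count, long Offset)> series,
    Func<ulong, ulong> readPointer)
{
    const ulong PointerSize = 8; // 64-bit assumed

    foreach (var (count, offset) in series)
    {
        ulong slot = addr + (ulong)offset;
        for (long i = 0; i < count; i++, slot += PointerSize)
        {
            ulong reference = readPointer(slot);
            if (reference != 0) // skip null references
                yield return reference;
        }
    }
}

// Fake memory for a SomeObject at 0x1000: ref1 and ref3 are set, ref2 is null.
var memory = new Dictionary<ulong, ulong>
{
    [0x1000] = 0xA000,
    [0x1008] = 0,
    [0x1018] = 0xB000,
};
var someObjectSeries = new[] { (2L, 0L), (1L, 24L) };

foreach (ulong reference in EnumerateReferences(0x1000, someObjectSeries, a => memory[a]))
    Console.WriteLine($"0x{reference:X}");
```

With the fake memory above, the loop reports the two non-null references, 0xA000 and 0xB000.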

Note that there are two cases, depending on whether "number of series" is positive or negative. I've only dug into the positive case, but the negative case looks very similar.