Question: Software write watch and Background GC?

Question

Question: Software write watch and Background GC?

RalfKornmannEnvision opened this issue 4 years ago · comments

RalfKornmannEnvision commented 4 years ago

Maybe I am trying to open a box that I should better left closed. In this case just tell me that this is currently way beyond the scope.

I noticed that the runtime code for the ARM64 write barrier has changed since I originally ported it. Now it contains support for FEATURE_MANUALLY_MANAGED_CARD_BUNDLES and FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP. Therefore I was porting these changes over and tried to make use of them.

As there was some code missing in StompWriteBarrier I copied the parts over from the runtime, too.
This got me to the point were the GC is starting the background thread and is trying to do a background collection. Unfortunately it runs in an assert in a validate method.

The question is if it even make sense at this point to try getting it working or is there just too much stuff missing?

And in any case what parts of my changes do you want to have if any at all?

PS The FEATURE_MANUALLY_MANAGED_CARD_BUNDLES part alone seems to work. At least the code in the writer barrier is executed without crashing. But I am not sure id there is a real test case to ensure it works correctly.

Jan Kotas · Answer 1 · Tue Sep 01 2020 00:42:23 GMT+0800 (China Standard Time)

The only thing missing for background GC should be the changes in the write barrier helper.

You may also need to add this to InlinedBulkWriteBarrier in C++ runtime code. It should match InlinedSetCardsAfterBulkCopyHelper in CoreCLR.

The rest should just work. It should be pretty doable to turn this on.

RalfKornmannEnvision · Answer 2 · Tue Sep 01 2020 13:39:22 GMT+0800 (China Standard Time)

Will update the other places, too.

In the meantime I found the issue why the background GC gives me an assert. The GC expect that virtual commit behaves in a specific way. All committed memory should be cleared but only the first time a page is committed. When the memory for the background GC is committed it might commit the same page twice as this memory is part of a large block with multiple sections. The sections them self are not aligned to page boundaries. Unfortunately the virtual memory I have doesn't behave that well and I end up with either uninitialized memory or the page is cleared again the second time it is committed.

From a performances point of view the best solution would properly to add a new feature to the GC were is just clear the sections it needs to be zeroed after a commit. But as this would be only need in some special cases were CoreRT needs to run on a system were the virtual memory manager doesn't behave that well I am unsure if you would accept such a pull request. I expect to see this mostly in bare metal situations were the memory is not shared between different processes and therefore there is no risk of leaking information.

If not I need to add some code in the GC to OS interface that makes sure that the same memory is not cleared again when committed multiple times.

Jan Kotas · Answer 3 · Tue Sep 01 2020 14:04:55 GMT+0800 (China Standard Time)

The GC should have a good idea about the sections that are committed already. Can this be fixed by fixing the ranges that the GC is passing into the virtual alloc calls?

RalfKornmannEnvision · Answer 4 · Tue Sep 01 2020 14:39:02 GMT+0800 (China Standard Time)

Maybe. I haven't stepped through the GC code enough to be sure.

The overlapping commit is caused at gc_heap::commit_mark_array_by_range (line 25929)
gc.cpp

    size_t beg_word = mark_word_of (begin);
    size_t end_word = mark_word_of (align_on_mark_word (end));
    uint8_t* commit_start = align_lower_page ((uint8_t*)&mark_array_addr[beg_word]);
    uint8_t* commit_end = align_on_page ((uint8_t*)&mark_array_addr[end_word]);
    size_t size = (size_t)(commit_end - commit_start);

during gc_heap::commit_mark_array_bgc_init for multiple segments. (line 26002)

The code knows the exact range it needs to be committed but align it to the next page boundaries to make sure the region is completely included. It might be that the first page is always already committed but I don't know enough about the GC to be sure about this.

RalfKornmannEnvision · Answer 5 · Tue Sep 01 2020 17:00:15 GMT+0800 (China Standard Time)

Found an issue related to this that keep the Background GC from working, too: #8292

RalfKornmannEnvision · Answer 6 · Tue Sep 01 2020 22:13:23 GMT+0800 (China Standard Time)

Beside of this double commit issue the background GC is now working. I did some more debugging and it looks like that this memory block contains different management data for a segment. While the whole block is reserved in one go the actual commit is split in two parts. Directly after the reserve anything beside the mark array at the end of the block. The commit of the mark array is postponed until the background GC thread is started. Based on this it looks save for me to move the start of the commit for the mark array to the next page if it starts somewhere in the page that is already committed.

Jan Kotas · Answer 7 · Tue Sep 01 2020 22:15:30 GMT+0800 (China Standard Time)

It is best to submit GC fixes to dotnet/runtime repo, and get better testing and review from GC experts there.

RalfKornmannEnvision · Answer 8 · Tue Sep 01 2020 22:22:56 GMT+0800 (China Standard Time)

Well it's not really a bug as any of the currently officially supported OS have a virtual memory manager that behaves nicely. I will open a issue over there and ask what solution they prefer