CoreRT slower than regular .NET
kant2002 opened this issue
I was thinking about checking how CoreRT performs for wavelets and decided to use https://github.com/codeprof/TurboWavelets.Net as a starting point.
I migrated the project to the new SDK format and added BenchmarkDotNet using the samples provided.
To my disappointment, regular .NET seems to be faster than CoreRT.
// * Summary *
BenchmarkDotNet=v0.12.1.1420-nightly, OS=Windows 10.0.18363.1082 (1909/November2019Update/19H2)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.100-rc.1.20452.10
[Host] : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
.NET 5.0 : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
CoreRt 5.0 : .NET 5.0.29330.02 @BuiltBy: dlab14-DDVSOWINAGE075 @Branch: master @Commit: 145402e00724acbc9e7636739140fb84f7d64845, X64 AOT
| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD |
|---------------------- |----------- |----------- |----------:|---------:|---------:|------:|--------:|
| Waveletimageupscaling | .NET 5.0 | .NET 5.0 | 155.72 ms | 3.267 ms | 9.426 ms | 1.00 | 0.00 |
| Waveletimageupscaling | CoreRt 5.0 | CoreRt 5.0 | 167.68 ms | 3.303 ms | 9.478 ms | 1.08 | 0.09 |
| | | | | | | | |
| AdaptiveDeadzone | .NET 5.0 | .NET 5.0 | 30.40 ms | 0.588 ms | 0.764 ms | 1.00 | 0.00 |
| AdaptiveDeadzone | CoreRt 5.0 | CoreRt 5.0 | 33.79 ms | 0.683 ms | 1.763 ms | 1.14 | 0.08 |
So I have some general questions.
- Are these results expected for CPU-bound workloads?
- What can I do to look more closely at this particular case?
For compute-heavy workloads that don't use things like HW intrinsics, I would expect both to be pretty much on par, since the codegen is the same.
I would run both under PerfView and check:
- GC Stats - does the GC do more work in one of them?
- Look at CPU samples - are the same methods hot? Is there something that stands out? If so, I would check disassembly on both and compare if we got worse codegen somewhere.
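Before reaching for PerfView, a rough first answer to the GC question can be had in-process. The sketch below is only an illustration: `GcProbe`, `GcSnapshot` and the placeholder workload are hypothetical names, not part of TurboWavelets.Net or BenchmarkDotNet, and it assumes `GC.GetTotalAllocatedBytes` behaves the same way on the CoreRT build as it does on regular .NET. Run the same binary logic on both runtimes and compare the numbers.

```csharp
using System;
using System.Diagnostics;

// Hypothetical result type: what the GC did while the workload ran.
public readonly struct GcSnapshot
{
    public GcSnapshot(long allocatedBytes, int[] collections, TimeSpan elapsed)
    {
        AllocatedBytes = allocatedBytes;
        Collections = collections;
        Elapsed = elapsed;
    }

    public long AllocatedBytes { get; }   // bytes allocated during the measured loop
    public int[] Collections { get; }     // collections per generation (index = generation)
    public TimeSpan Elapsed { get; }      // wall-clock time of the measured loop
}

// Hypothetical helper: counts allocations and collections around a workload.
public static class GcProbe
{
    public static GcSnapshot Measure(Action workload, int iterations = 10)
    {
        if (workload is null) throw new ArgumentNullException(nameof(workload));

        workload(); // warm-up, so one-time initialization is not counted

        int generations = GC.MaxGeneration + 1;
        var before = new int[generations];
        for (int g = 0; g < generations; g++)
            before[g] = GC.CollectionCount(g);

        long bytesBefore = GC.GetTotalAllocatedBytes(precise: true);
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
            workload();

        stopwatch.Stop();
        long bytesAfter = GC.GetTotalAllocatedBytes(precise: true);

        var delta = new int[generations];
        for (int g = 0; g < generations; g++)
            delta[g] = GC.CollectionCount(g) - before[g];

        return new GcSnapshot(bytesAfter - bytesBefore, delta, stopwatch.Elapsed);
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder workload: substitute the body of the benchmark under test.
        GcSnapshot snapshot = GcProbe.Measure(() =>
        {
            var buffer = new float[512 * 512];
            buffer[0] = 1f;
        });

        Console.WriteLine($"Allocated: {snapshot.AllocatedBytes} bytes");
        for (int g = 0; g < snapshot.Collections.Length; g++)
            Console.WriteLine($"Gen {g} collections: {snapshot.Collections[g]}");
        Console.WriteLine($"Elapsed: {snapshot.Elapsed.TotalMilliseconds} ms");
    }
}
```

If the allocation and collection counts come out roughly the same on both runtimes, the GC is probably not the source of the gap, and the CPU samples and disassembly are the next place to look. This does not replace the PerfView GC Stats view, which also reports pause times.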
It is not unusual for the performance of CPU-bound microbenchmarks to be sensitive to memory alignment, code alignment or other factors that result in trends like this: dotnet/runtime#39031 (comment). This can be one of these bi-modal cases, and you may just be hitting the lucky/unlucky spots on the spectrum.
Another potential source of the difference is that the RyuJIT in dotnet/corert is several months old at this point. It is possible that the RyuJIT shipping in .NET 5 has bug fixes that make a difference for this micro-benchmark. This will get fixed once we migrate the project to dotnet/runtimelab and pick up an up-to-date RyuJIT.
> What can I do to look more closely at this particular case?
Michal's advice in #8354 (comment) is spot on.
@jkotas Thanks for the explanation of the potential root causes. I thought this might be related to the fact that this is a micro-benchmark, but I did not think it might be due to changes in the runtime.
@MichalStrehovsky I will try to take a look. Since my priority was to find an interesting use case where CoreRT would be better than regular .NET, I will have to scratch my head a bit to find one.
@RUSshy you can see my benchmarks here: https://github.com/kant2002/TurboWavelets.Net/tree/kant/benchmarks. These are pretty trivial microbenchmarks. This is not the actual project, where I may see some gains.