PlutusCore.Default.Builtins takes ~8 seconds to reload on a powerful machine
L-as opened this issue · comments
Summary
Doing :r in GHCi on this module takes ~8 seconds on a powerful Hetzner Cloud machine.
I assume this is due to the heavy type class magic in makeBuiltinMeaning.
Steps to reproduce the behavior
- Open file
- Save
- Reload
Actual Result
~8 seconds to reload.
Expected Result
~1 second at most.
Describe the approach you would take to fix this
Removing the magic from makeBuiltinMeaning would likely fix the issue. A solution might be to make the arity explicit through e.g. NP from sop-core rather than inferring the arity, removing implicit conversions, etc.
Do note I do not fully understand the module, so I might be wrong.
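To make the "explicit arity" idea concrete, here is a minimal sketch. It is not the actual makeBuiltinMeaning API: Args is a hand-rolled stand-in for NP I from sop-core (so the snippet only needs base), and Builtin, addDenotation, addInteger and applyBuiltin are hypothetical names invented for illustration. The point is only that the argument list appears in the type, so nothing has to be inferred from the shape of a Haskell function.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}
module ExplicitArity where

import Data.Kind (Type)

-- Heterogeneous argument list indexed by the types of its elements.
-- Assumption: a simplified stand-in for 'NP I' from sop-core.
data Args (as :: [Type]) where
  Nil  :: Args '[]
  (:*) :: a -> Args as -> Args (a ': as)
infixr 5 :*

-- Hypothetical builtin description: the arity and argument types are
-- spelled out in the type-level list 'as' instead of being inferred.
data Builtin (as :: [Type]) (r :: Type) = Builtin
  { builtinName       :: String
  , builtinDenotation :: Args as -> r
  }

-- The denotation takes exactly two Integer arguments, as stated in its type.
addDenotation :: Args '[Integer, Integer] -> Integer
addDenotation (x :* y :* Nil) = x + y

addInteger :: Builtin '[Integer, Integer] Integer
addInteger = Builtin "addInteger" addDenotation

-- Applying a builtin is plain function application on the argument list.
applyBuiltin :: Builtin as r -> Args as -> r
applyBuiltin = builtinDenotation
```

For example, applyBuiltin addInteger (2 :* 3 :* Nil) evaluates to 5. The trade-off is that every builtin author has to write the argument list by hand.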
System info
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD EPYC Processor
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
BogoMIPS: 4890.80
(snippet from lscpu)
Just checked, reloads in 4 seconds on my machine.
Removing the magic from makeBuiltinMeaning would likely fix the issue.
Yes. Specifically, it's probably elaboration stuff. Even more specifically, it's likely the logic of merging list of referenced variables using Merge as type families are notoriously inefficient. Not sure what to do about it, maybe making the user specify type variables themselves would work, somehow...
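For readers unfamiliar with the kind of type-level list merging meant here, a rough sketch follows. This is not the Merge definition from the Plutus code base; Elem, this version of Merge, and mergedIsExpected are hypothetical and only show the general shape: a closed type family that walks one list and checks membership in the other, which GHC has to reduce step by step during type checking.

```haskell
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE PolyKinds            #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE UndecidableInstances #-}
module MergeSketch where

import Data.Proxy     (Proxy (..))
import Data.Type.Bool (If)

-- Type-level membership test (hypothetical helper).
type family Elem (x :: k) (xs :: [k]) :: Bool where
  Elem x '[]       = 'False
  Elem x (x ': xs) = 'True
  Elem x (y ': xs) = Elem x xs

-- Hypothetical merge: prepend the elements of the first list that do not
-- already occur in the second one. Each element costs a linear scan.
type family Merge (xs :: [k]) (ys :: [k]) :: [k] where
  Merge '[]       ys = ys
  Merge (x ': xs) ys = If (Elem x ys) (Merge xs ys) (x ': Merge xs ys)

-- Compile-time check: this only type checks if the family reduces
-- to the list written on the right-hand side.
mergedIsExpected :: Proxy (Merge '["a", "b"] '["b", "c"])
mergedIsExpected = Proxy :: Proxy '["a", "b", "c"]
```

Whether reductions like these dominate the reload time is exactly the open question in this thread, so treat the sketch as background only.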
Anyways, I don't think we should mark it as bug, it's not really such a high priority issue.
I'll look into it some time, thank you for bringing it up.
The issue templates had "feature request" and "bug". This felt like a bug more than a feature request, albeit a minor one.
The issue templates had "feature request" and "bug".
@zliu41 would it make sense to replace "feature request" with "enhancement"?
Yes. Specifically, it's probably elaboration stuff. Even more specifically, it's likely the logic of merging list of referenced variables using Merge as type families are notoriously inefficient. Not sure what to do about it, maybe making the user specify type variables themselves would work, somehow...
I've investigated it a bit and it's definitely not Merge.
Then I remembered that we generate a crapload of Core for this module and sure enough it's 30k lines of Core (with -dsuppress-all -dno-suppress-type-signatures!), of which 8k belong to the Generic instance of DefaultFun and at the very least 16k belong to the ToBuiltinMeaning instance. I don't know if we need the former to be so huge, but we certainly need the latter as it's crucially important for performance for all BuiltinRuntimes to get inlined and optimized properly. Now granted a lot of that doesn't happen in GHCi, but it still has to do a lot of work.
I do regularly optimize the amount of Core generated for this module, because I stare at that Core in a downstream module a lot and I need to be able to read it, so my conclusion is that we can't do anything much here, apart from maybe investigating why the Generic instance is so huge.
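For anyone who wants to look at the size of a derived Generic instance in isolation, a small experiment along these lines might help. ToyFun is a hypothetical three-constructor stand-in for DefaultFun, and -ddump-simpl and -ddump-to-file are assumptions on my part (only the two suppression flags are mentioned above). Compiling the module should write the simplified Core to a dump file next to the build output, which can then be measured with a line count.

```haskell
{-# LANGUAGE DeriveGeneric #-}
-- Dump simplified Core to a file, using the suppression flags quoted above.
{-# OPTIONS_GHC -ddump-simpl -ddump-to-file -dsuppress-all -dno-suppress-type-signatures #-}
module CoreSize where

import GHC.Generics (Generic)

-- Hypothetical miniature of an enumeration of builtin functions.
-- Adding constructors and recompiling shows how the Core for the
-- derived Generic instance grows with the size of the type.
data ToyFun
  = AddInteger
  | SubtractInteger
  | MultiplyInteger
  deriving (Show, Eq, Generic)
```

Comparing the dump with and without Generic in the deriving clause gives a rough idea of how much of the Core is due to that instance.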
Overall, I'm not going to lose sleep over a few modules taking single-digit seconds to reload/compile.
I've looked into it in more detail out of curiosity. It's definitely the Elaboration stuff: if I comment it out together with all the polymorphic builtins that rely on it, I get the module to type check in ~1 second instead of ~4 seconds on my machine. Which naturally prompts the following question: what if we only do elaboration for builtins that actually need it and not for all the builtins? That got me to ~3 seconds. I.e. I won a second at the cost of complicating the API. Not worth it, I believe.
I find the elaboration machinery useful and don't want to make people do elaboration manually, particularly given that they'd have to learn the difference between the Type and the Rep context and it's just horrible. I have no ideas on how to speed it up, nor do I want to touch it as it's extremely tiresome to debug type-level code, plus the evaluation order is particularly crazy there and we rely on it to a certain extent.
My conclusion therefore is that there isn't much I can do here, so I'm going to simply close the issue as "won't do", given the relatively low impact it has on the development process. If any bright idea comes to my tired head, I'll try it out, and do feel free to reopen the issue if you believe it should stay open.
Oh yeah, and thanks a lot for reporting! It would be great to have it fixed, if there was a cheap way to do that.