PlutusCore.Default.Builtins takes ~8 seconds to reload on a powerful machine

Question

L-as opened this issue a year ago

Summary

Doing :r in GHCi on this module takes ~8 seconds on a powerful Hetzner Cloud machine.
I assume this is due to the heavy type class magic on makeBuiltinMeaning.

Steps to reproduce the behavior

Open file
Save
Reload

Actual Result

~8 seconds to reload.

Expected Result

~1 second at most.

Describe the approach you would take to fix this

Removing the magic from makeBuiltinMeaning would likely fix the issue. One option might be to make the arity explicit, e.g. through NP from sop-core, rather than inferring it, and to remove implicit conversions, etc.
Do note I do not fully understand the module, so I might be wrong.
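
To make the "explicit arity" idea concrete, here is a minimal sketch. It does not use the real makeBuiltinMeaning API or sop-core itself: Args is a local, simplified stand-in for sop-core's NP, and Fun, apply and arity are hypothetical names introduced only for illustration. The point is that the argument list is spelled out as a type-level list, so nothing about the arity has to be inferred from the function type.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

-- A simplified stand-in for sop-core's NP (hypothetical, for illustration):
-- a heterogeneous list whose element types are listed in 'xs'.
data Args (xs :: [Type]) where
  Nil  :: Args '[]
  (:*) :: x -> Args xs -> Args (x ': xs)

infixr 5 :*

-- The curried function type that takes the arguments 'xs' and returns 'r'.
type family Fun (xs :: [Type]) (r :: Type) :: Type where
  Fun '[]       r = r
  Fun (x ': xs) r = x -> Fun xs r

-- Apply a curried function to an explicit argument list.
apply :: Fun xs r -> Args xs -> r
apply f Nil       = f
apply f (x :* xs) = apply (f x) xs

-- The arity is read off the explicit list rather than inferred.
arity :: Args xs -> Int
arity Nil       = 0
arity (_ :* xs) = 1 + arity xs

main :: IO ()
main = do
  let args = (2 :: Integer) :* (3 :: Integer) :* Nil
  print (arity args)                                        -- prints 2
  print (apply ((+) :: Integer -> Integer -> Integer) args) -- prints 5
```

Whether this would actually reduce reload time for the real module is untested; it only shows the shape such an API could take.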

System info

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  1
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4890.80

(snippet from lscpu)

effectfully · Answer 1 · Fri Mar 10 2023 21:58:32 GMT+0800 (China Standard Time)

Just checked, reloads in 4 seconds on my machine.

> Removing the magic from makeBuiltinMeaning would likely fix the issue.

Yes. Specifically, it's probably elaboration stuff. Even more specifically, it's likely the logic of merging lists of referenced variables using Merge, as type families are notoriously inefficient. Not sure what to do about it, maybe making the user specify type variables themselves would work, somehow...
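
For readers unfamiliar with this style of type-level programming, the following is a rough sketch of what merging two type-level lists with a type family can look like. It is not the actual Merge from the Plutus codebase: Elem, Merge and mergeExample here are hypothetical definitions written only to show the general technique, where every step of the merge is work done by the type checker.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Type.Bool (If)
import Data.Type.Equality ((:~:) (Refl))

-- Type-level membership test (hypothetical helper).
type family Elem (x :: k) (xs :: [k]) :: Bool where
  Elem x '[]       = 'False
  Elem x (x ': xs) = 'True
  Elem x (y ': xs) = Elem x xs

-- Type-level merge that skips elements already present in the second list.
-- Each Elem check walks 'ys', so the type checker does quadratic work.
type family Merge (xs :: [k]) (ys :: [k]) :: [k] where
  Merge '[]       ys = ys
  Merge (x ': xs) ys = If (Elem x ys) (Merge xs ys) (x ': Merge xs ys)

-- This only type checks if GHC reduces the family to the expected list.
mergeExample :: Merge '[Int, Bool] '[Bool, Char] :~: '[Int, Bool, Char]
mergeExample = Refl

main :: IO ()
main = case mergeExample of
  Refl -> putStrLn "Merge reduced as expected"
```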

Anyways, I don't think we should mark it as a bug, it's not really such a high priority issue.

I'll look into it some time, thank you for bringing it up.

Las Safin · Answer 2 · Sat Mar 11 2023 07:19:22 GMT+0800 (China Standard Time)

The issue templates had "feature request" and "bug". This felt like a bug more than a feature request, albeit a minor one.

effectfully · Answer 3 · Sun Mar 12 2023 02:56:10 GMT+0800 (China Standard Time)

> The issue templates had "feature request" and "bug".

@zliu41 would it make sense to replace "feature request" with "enhancement"?

> Yes. Specifically, it's probably elaboration stuff. Even more specifically, it's likely the logic of merging lists of referenced variables using Merge, as type families are notoriously inefficient. Not sure what to do about it, maybe making the user specify type variables themselves would work, somehow...

I've investigated it a bit and it's definitely not Merge.

Then I remembered that we generate a crapload of Core for this module and sure enough it's 30k lines of Core (with -dsuppress-all -dno-suppress-type-signatures!), of which 8k belong to the Generic instance of DefaultFun and at the very least 16k belong to the ToBuiltinMeaning instance. I don't know if we need the former to be so huge, but we certainly need the latter as it's crucially important for performance for all BuiltinRuntimes to get inlined and optimized properly. Now granted a lot of that doesn't happen in GHCi, but it still has to do a lot of work.
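
For anyone wanting to look at the amount of Core for a module of their own, a minimal sketch follows. It assumes the dump was produced with -ddump-simpl (the comment above only names the two suppression flags), and ToyFun is a hypothetical stand-in for a large enumeration such as DefaultFun, not the real type.

```haskell
{-# LANGUAGE DeriveGeneric #-}
-- Assumption: -ddump-simpl is the dump being discussed; -ddump-to-file
-- writes it next to the build output instead of printing it to stdout.
{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dno-suppress-type-signatures -ddump-to-file #-}

import GHC.Generics (Generic)

-- A toy enumeration; the derived Generic instance shows up in the Core dump.
data ToyFun
  = AddInteger
  | SubtractInteger
  | MultiplyInteger
  | EqualsInteger
  deriving (Show, Eq, Enum, Bounded, Generic)

allToyFuns :: [ToyFun]
allToyFuns = [minBound .. maxBound]

main :: IO ()
main = print (length allToyFuns) -- prints 4
```

Counting the lines of the resulting dump file with and without the Generic deriving gives a rough idea of how much Core each instance contributes.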

I do regularly optimize the amount of Core generated for this module, because I stare at that Core in a downstream module a lot and I need to be able to read it, so my conclusion is that we can't do anything much here, apart from maybe investigating why the Generic instance is so huge.

Overall, I'm not going to lose sleep over a few modules taking single-digit seconds to reload/compile.

effectfully · Answer 4 · Sun Mar 12 2023 12:37:49 GMT+0800 (China Standard Time)

I've looked into it in more detail out of curiosity. It's definitely the Elaboration stuff: if I comment it out together with all the polymorphic builtins that rely on it, I get the module to type check in ~1 second instead of ~4 seconds on my machine. Which naturally prompts the following question: what if we only do elaboration for the builtins that actually need it and not for all the builtins? That got me to ~3 seconds. I.e. I won a second at the cost of complicating the API. Not worth it, I believe.

I find the elaboration machinery useful and don't want to make people do elaboration manually, particularly given that they'd have to learn the difference between the Type and the Rep context and it's just horrible. I have no ideas on how to speed it up, nor do I want to touch it as it's extremely tiresome to debug type-level code, plus the evaluation order is particularly crazy there and we rely on it to a certain extent.

My conclusion therefore is that there isn't much I can do here, so I'm going to simply close the issue as "won't do", given the relatively low impact it has on the development process. If any bright idea comes to my tired head, I'll try it out, and do feel free to reopen the issue if you believe it should stay open.

effectfully · Answer 5 · Sun Mar 12 2023 12:40:39 GMT+0800 (China Standard Time)

Oh yeah, and thanks a lot for reporting! It would be great to have it fixed, if there was a cheap way to do that.