natalie-lang / natalie

a work-in-progress Ruby compiler, written in Ruby and C++

Home Page: https://natalie-lang.org

Bignums are SLOW in the self-hosted compiler

seven1m opened this issue

time bin/natalie -e "p 2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
bin/natalie -e   0.65s user 0.16s system 99% cpu 0.809 total

time bin/nat -e "p 2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
bin/nat -e   26.19s user 4.23s system 99% cpu 30.426 total

I added a printf to the BigInt(const TM::String &, const base = 10) constructor, and it:

  • is called once with bin/natalie
  • is called 5,068,273 times with bin/nat 😱 (5,030,321 times after #2041)

I'll keep digging on this as I have time...

That constructor is called 37,952 times with an empty program, so I suppose our compiler has some bignums in it, which is surprising to me!

Turns out those 37k calls to the BigInt constructor were from the left shift operator I fixed in #2041. Down the rabbit hole I go...

OK, last update for today... I set a breakpoint and tracked down the first half-dozen calls to the BigInt constructor. This line in Prism keeps popping up:

https://github.com/ruby/prism/blob/695de7d3bf8cfa6c2dda90b401e474f3cadc73d2/templates/lib/prism/serialize.rb.erb#L211

      def load_integer
        negative = io.getbyte != 0
        length = load_varuint

        value = 0
        length.times { |index| value |= (load_varuint << (index * 32)) } # <-------- this line

        value = -value if negative
        value
      end
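To make it clearer what that loop is doing, here is a minimal standalone sketch. `split_words` and `join_words` are hypothetical helpers made up for illustration, not part of Prism, and they work on plain arrays of 32-bit words rather than on Prism's varuint-encoded stream. The point is only that every word after the first gets shifted left by a multiple of 32 and OR'd into the accumulator:

```ruby
# Hypothetical helpers for illustration only; not part of Prism.
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

# Split a non-negative integer into 32-bit words, least significant first.
def split_words(value)
  words = []
  loop do
    words << (value & WORD_MASK)
    value >>= WORD_BITS
    break if value.zero?
  end
  words
end

# Reassemble the words with the same shift-and-OR pattern as load_integer.
def join_words(words)
  value = 0
  words.each_with_index { |word, index| value |= (word << (index * WORD_BITS)) }
  value
end

big = 2**100 + 5
p split_words(big).length
p join_words(split_words(big)) == big
```

A value that fits in 32 bits yields a single word, so the only shift performed is by 0.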

I was wondering why we were serializing the parse tree at all, since skipping that seemed like an easy win, but it looks like that's how Prism gets the AST from C to Ruby:

Prism provides the ability to serialize the AST and its related metadata into a binary format. This format is designed to be portable to different languages and runtimes so that you only need to make one FFI call in order to parse Ruby code

In the case of the bignum usage, what is the value of length? I would guess the vast majority of numbers in the Ruby code would fit in 32 bits, so we would effectively be running uint32_t << 0, which should not create a BigInt. But maybe Prism uses a lot of big numbers internally.

Maybe as an experiment we could skip the loop when the number fits in a single 32-bit word:

value = 0
if length == 1
  value = load_varuint
else
  length.times { |index| value |= (load_varuint << (index * 32)) } # <-------- this line
end
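One way to try that idea outside of Prism is a rough sketch like the one below. `FakeLoader` is a hypothetical stand-in, not Prism's loader: its `load_varuint` just pops pre-decoded words off an array, and it counts how many shifts run so the fast path is observable.

```ruby
# Hypothetical stand-in for Prism's loader, for experimentation only.
class FakeLoader
  attr_reader :shifts

  def initialize(words)
    @words = words.dup
    @shifts = 0
  end

  # Assumption: the real method decodes a varuint from the stream;
  # here it simply returns the next pre-decoded 32-bit word.
  def load_varuint
    @words.shift
  end

  # Magnitude part of load_integer, with the single-word fast path.
  def load_magnitude(length)
    return load_varuint if length == 1

    value = 0
    length.times do |index|
      @shifts += 1
      value |= (load_varuint << (index * 32))
    end
    value
  end
end

small = FakeLoader.new([42])
p small.load_magnitude(1)
p small.shifts
```

With a single word the shift count stays at zero, so if the shift operator is what triggers the BigInt constructor, this path should avoid it entirely.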

#2049 helps a lot here! 🎉

→ time bin/natalie -e "p 2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
bin/natalie -e   0.63s user 0.20s system 99% cpu 0.826 total

→ time bin/nat -e "p 2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
bin/nat -e   0.96s user 0.12s system 99% cpu 1.080 total

I didn't dig much into how the Prism serialization code works... there might be some perf to squeeze out of that, but I'm happy with this result.

Oh, and with the new library, BigInt is only instantiated 1520 times with the above code, and 21 times with an empty program. Most of the 5 million calls to BigInt documented in this issue were from our bigint library itself. Whoops.