Idea: Optimization of "struct" and "fields" objects

Question

Idea: Optimization of "struct" and "fields" objects

romix opened this issue 11 years ago · comments

This is just an idea.

"struct" and "fields" allow for defining types that have a fixed structure. May be this information can be used for more efficient access to these objects both in terms of memory usage and performance?

The following optimizations could be potentially possible:

More efficient layout of objects in memory. In theory, fields could be mapped to equivalent of a C-struct with corresponding fields.

Or at least to an Lua-table which has only array elements. Each field name gets mapped to an array-index. obj.fieldX becomes obj[idX]. This could be more memory efficient than hash-table based storage of those fields.

(1) is also somewhat similar to Python's named tuples: https://docs.python.org/2/library/collections.html#collections.namedtuple
They may use less memory and be accessed more efficiently.
May be LuaJIT/compiler may make use of the fact that the structure of objects does not change during run-time? This information could be used e.g. for copy propagation and some other optimizations... I'm not sure though that LuaJIT is able on its own to understand that structure of objects(i.e. tables) is fixed (this needs to be tested yet). Eventually, it may require some help, e.g. some hints ( a-la PyPy JIT hints).

Note: Overall, it would be interesting to understand if a compiler or runtime of Shine may somehow provide hints to LuaJIT to enable better optimizations. I understand that currently LuaJIT does not provide support for this. But if it would, what could be done? This kind of input could interesting for Mile Pall to convince him that supporting some sort of hints could be very valuable.

Since Lua is very dynamic, I can imagine that Shine compiler could potentially only statically make use of the fixed layout only in the scope where a variable was declared. But when it gets passed to a different function (or another scope, in general), then this statically known information is most likely lost (unless some sort of very clever escape analysis is used). I don't know if LuaJIT can do any better using traces and propagating this information along traces.

As I said, these are just some ideas. I'm not sure how feasible they are and if something like this is possible at all.

What do you think?

Richard Hundt · Answer 1 · Mon Apr 07 2014 16:18:03 GMT+0800 (China Standard Time)

On 4/6/14 11:19 PM, romix wrote:

This is just an idea.

"struct" and "fields" allow for defining types that have a fixed
structure. May be this information can be used for more efficient access
to these objects both in terms of memory usage and performance?

The following optimizations could be potentially possible:

More efficient layout of objects in memory. In theory, fields could
be mapped to equivalent of a C-struct with corresponding fields.

Well, if the values are not cdata types then that wouldn't work.

Or at least to an Lua-table which has only array elements. Each field
name gets mapped to an array-index. obj.fieldX becomes obj[idX]. This
could be more memory efficient than hash-table based storage of those
fields.

I've experimented with this kind of thing. So you do save on space.
However it also means that these structs respond to both obj[1] as
well as obj.fieldX, and saying the latter is an indirection: you need
a reverse mapping from fieldX to the slot index. So you can't really
get rid of the hash lookup for fieldX - it either references the
member value directly, or it references the slot which references the value.

What's more, there's also no guarantee that the members are going to be
actually stored in the "array part" of the table. That depends on the
order in which the fields are initialized. If fieldX was slot 1 and
fieldY slot 2, then setting fieldY first will make the underlying
table use the hash part anyway, so you don't gain anything.

The thing is that Lua's tables are fast because the don't re-hash the
string every time. They use pointer hashing - which is essentially
integer hashing. Strings are interned (well except for short strings,
but that's an implementation detail).

I've had cases where using LuaJIT's tables are faster than C structs. It
does some badass optimizations.

(1) is also somewhat similar to Python's named tuples:
https://docs.python.org/2/library/collections.html#collections.namedtuple
They may use less memory and be accessed more efficiently.

May be LuaJIT/compiler may make use of the fact that the structure of
objects does not change during run-time? This information could be used
e.g. for copy propagation and some other optimizations... I'm not sure
though that LuaJIT is able on its own to understand that structure of
objects(i.e. tables) is fixed (this needs to be tested yet). Eventually,
it may require some help, e.g. some hints ( a-la PyPy JIT hints).

LuaJIT has a TDUP instruction. What the compiler does is store the
constant parts tables in the bytecode, and then duplicate these instead
of reconstructing them. So if you had

t = { answer = 42, greet = function() print("Hi") end }

Then the {answer = 42} part is constant, whereas the function is a GC
object so needs to be attached after the TDUP. This definitely speeds
things up, but has limited use for our case, because struct definitions
have other tables as their field values (i.e. the type).

It's not like this stuff is actually slow, right?

Note: Overall, it would be interesting to understand if a compiler or
runtime of Shine may somehow provide hints to LuaJIT to enable better
optimizations. I understand that currently LuaJIT does not provide
support for this. But if it would, what could be done? This kind of
input could interesting for Mile Pall to convince him that supporting
some sort of hints could be very valuable.

It's been suggested before on the mailing list to make LuaJIT more
friendly as a compilation target for other languages. I can't remember
Mike's response exactly, but it didn't seem to be a goal of his. It's
actually hard to erase the underlying Lua semantics. For instance, in
many languages 0 and the empty string evaluate to false. Not so in
Lua. So if you wanted that, you'd have to wrap every boolean context
evaluation in your own function call.

If your language has mutable strings, then you'd need to replace the
native string type and half the VM instructions become useless to you.
Then there's the GC. If you wanted to have real low level classes and
objects (not based on tables, but language primitives as in Java), then
you'd pretty much need to do your own tracing and garbage collection
because you don't have mark and sweep hooks.

So it's actually a fairly non-trivial job to make the LuaJIT VM generic
in this way.

Since Lua is very dynamic, I can imagine that Shine compiler could
potentially only statically make use of the fixed layout only in the
scope where a variable was declared. But when it gets passed to a
different function (or another scope, in general), then this statically
known information is most likely lost (unless some sort of very clever
escape analysis is used). I don't know if LuaJIT can do any better using
traces and propagating this information along traces.

Then struct declarations would need to be added to the language, and
not be part of a library (they're just functions). Right now the
compiler just sees a function call.

An interesting idea would definitely be to integrate the FFI more into
the language so that C declarations are parsed directly, but with Shine
syntax. I've toyed with this too:

struct Buffer
   size is size_t
   data is char*
end

You'd then get the ffi.cdef stuff done for you behind the scenes.

This gets hairy though, because what does char* mean in the rest of
the language? Can I call it? What if I want to cast it? If casts have
their own syntax, what if I try to cast a table? Tricky stuff.

Right now the fact that you need to pass strings to ffi.cdef and do
all the operations via a library makes it clear that you're dealing with
stuff that's external to the language (i.e. the foreign in ffi).

As I said, these are just some ideas. I'm not sure how feasible they are
and if something like this is possible at all.

What do you think?

Sure, all good ideas. For now though, I feel that Shine is squeezing
about as much out of LuaJIT as a sugar language can hope to, without
going down a complexity rabbit hole.

romix · Answer 2 · Mon Apr 07 2014 16:52:44 GMT+0800 (China Standard Time)

Sure, all good ideas. For now though, I feel that Shine is squeezing about as much out of LuaJIT
as a sugar language can hope to, without going down a complexity rabbit hole.

Absolutely! It was not proposed as a high-priority feature for a near future.
I just wanted to have it mentioned for later, should we ever detect serious performance problems with classes/structs.

And of course I'm interested in understanding how to use LuaJIT as a target for other languages. In particular for things that have a semantic diverging from Lua's semantics. And for things that may be CPU-intensive, i.e. where certain optimizations (or lack of them) start playing an important role... But this is of course out of scope for Shine, at least in its current development stage. Concentrating on optimizations makes sense (if at all), once the language has stabilized and performance becomes an issue...

It's been suggested before on the mailing list to make LuaJIT more friendly as a compilation target
for other languages. I can't remember Mike's response exactly, but it didn't seem to be a goal of
his.

It was actually me who asked for it ;-) I still think that LuaJIT could be a wonderful target for many languages if it would be a bit more friendly... I haven't given up on this idea yet. But this is another story ...

Richard Hundt · Answer 3 · Mon Apr 07 2014 17:19:43 GMT+0800 (China Standard Time)

On 4/7/14 10:52 AM, romix wrote:

It was actually me who asked for it ;-) I still think that LuaJIT could
be a wonderful target for many languages if it would be a bit more
friendly... I haven't given up on this idea yet. But this is another
story ...

If you really wanted to diverge, then you actually could get pretty
far. I would start by implementing my own tagged value types, so the
language has its own idea of strings, numbers, booleans, arrays, hashes,
classes, namespaces etc., and it would use functions as "basic blocks".

Then I'd move all my VM "instructions" into a C library and use only
branching and CALL instructions which call into said C library using the
FFI.

So a = b + c becomes: a = op.add(b, c) where a, b and c are
all your own language's tagged values, and a = foo.bar() would become
a = op.callmeth(foo, 'bar'), and so on.

With this scheme you're still using the stack slots and GC roots, but
you get control over everything else. Dispatching to a C function via
the FFI on my system has about a 2 nanosecond overhead. Half a billion
ops/second is probably fast enough (plain LuaJIT will still blow it out
of the water though).

Then the only problem you have is for non-scalar types: arrays, hashes
and so forth. For mallocd data, __gc hook set via ffi.gc doesn't
actually free anything (obviously), but it can be used to implement a
simple reference counting scheme. You'd only be notified if the
reference is no longer anchored. Then, if it is an array, for example,
you'd traverse it yourself and decrement the refs on its elements
recursively, freeing those which hit zero. The added bonus for this is
that if you use an mmap based malloc (jemalloc, for instance), then you
don't hit the 1GB limitation on x64.

Coroutines could get tricky, but it's still doable.

One of the main reasons I didn't go this route is because I wanted to be
able to leverage existing Lua libraries. Starting an ecosystem around an
entirely new language is hard if there aren't any libraries you can use
(of course, you'd still have access to C libraries via the FFI).