onyx-lang / onyx

✨ The compiler and developer toolchain for Onyx

Home Page: https://onyxlang.io

Smaller modules

jb55 opened this issue

I noticed the entire runtime gets linked in even if we don't use anything from it. It would be nice if onyx was smart enough to only include things it uses in the wasm module. With AssemblyScript, modules can compile down to a few hundred bytes, importing only what is needed.

Here's an example from an AssemblyScript-compiled binary I'm using for an embedded scripting language in an app I'm working on.

bool_setting.wasm:	file format wasm 0x1

Section Details:

Type[4]:
 - type[0] (i32, i32, i32, i32) -> nil
 - type[1] (i32) -> i32
 - type[2] (i32, i32, i32) -> i32
 - type[3] () -> nil
Import[2]:
 - func[0] sig=0 <env.abort> <- env.abort
 - func[1] sig=2 <nostr.nostr_set_bool> <- nostr.nostr_set_bool
Function[2]:
 - func[2] sig=1
 - func[3] sig=3
Memory[1]:
 - memory[0] pages: initial=1
Global[2]:
 - global[0] i32 mutable=1 - init i32=1088
 - global[1] i32 mutable=1 - init i32=0
Export[1]:
 - memory[0] -> "memory"
Start:
 - start function: 3
DataCount:
 - data count: 10
Code[2]:
 - func[2] size=200
 - func[3] size=197
Data[10]:
 - segment[0] memory=0 size=1 - init i32=1036
  - 000040c: 1c                                       .
 - segment[1] memory=0 size=19 - init i32=1048
  - 0000418: 0200 0000 0c00 0000 6d00 6e00 7900 6000  ........m.n.y.`.
  - 0000428: 6f00 72                                  o.r
 - segment[2] memory=0 size=1 - init i32=1068
  - 000042c: 1c                                       .
 - segment[3] memory=0 size=1 - init i32=1080
  - 0000438: 02                                       .
 - segment[4] memory=0 size=1 - init i32=1100
  - 000044c: 3c                                       <
 - segment[5] memory=0 size=47 - init i32=1112
  - 0000458: 0200 0000 2800 0000 4100 6c00 6c00 6f00  ....(...A.l.l.o.
  - 0000468: 6300 6100 7400 6900 6f00 6e00 2000 7400  c.a.t.i.o.n. .t.
  - 0000478: 6f00 6f00 2000 6c00 6100 7200 6700 65    o.o. .l.a.r.g.e
 - segment[6] memory=0 size=1 - init i32=1164
  - 000048c: 3c                                       <
 - segment[7] memory=0 size=37 - init i32=1176
  - 0000498: 0200 0000 1e00 0000 7e00 6c00 6900 6200  ........~.l.i.b.
  - 00004a8: 2f00 7200 7400 2f00 7300 7400 7500 6200  /.r.t./.s.t.u.b.
  - 00004b8: 2e00 7400 73                             ..t.s
 - segment[8] memory=0 size=1 - init i32=1228
  - 00004cc: 1c                                       .
 - segment[9] memory=0 size=19 - init i32=1240
  - 00004d8: 0200 0000 0c00 0000 7300 6800 6d00 6f00  ........s.h.m.o.
  - 00004e8: 7200 67  

Using a "simpler" Onyx program:

use core { iter }

main :: () {
    for i: 1 .. 10 {
        fact := factorial(i);
    }
}

factorial :: (n: i32) -> i32 {
    return iter.as_iter(1 .. n)
        |> iter.fold(1, (x, y) => x * y);
}

The output is very large, and can't be used for embedded use cases:

https://cdn.jb55.com/s/92a683de3fdcdf89.txt

I'm sure you're thinking about this, but thought I'd open an issue for it.

This is an issue I have thought about. A large part of the binary is currently the type information used for reflection. Things like printf use it heavily, but it adds a large overhead to the executable size. If you run the compiler with -V, you can see the type table size. For me, it is usually around 170 KB, depending on how many types are in the program.

If you know your project does not need type information or reflection, you can pass the flag --no-type-info to onyx build or onyx run to omit that info.

Also, one optimization I have not done yet is proper tree-shaking, or removing all the functions that are impossible to reach. Once I implement that, the binary size should decrease dramatically.
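For readers unfamiliar with the term, tree-shaking boils down to a reachability pass over the call graph: start from the entry points, follow every call, and drop whatever was never visited. The following is a minimal Python sketch of that idea only, not the Onyx compiler's implementation. The call graph, the function names in it, and the helpers `reachable_functions` and `tree_shake` are all hypothetical and chosen for illustration.

```python
from collections import deque


def reachable_functions(call_graph, roots):
    """Return the set of functions reachable from `roots` by following calls."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def tree_shake(call_graph, roots):
    """Keep only the call-graph entries that are reachable from `roots`."""
    keep = reachable_functions(call_graph, roots)
    return {fn: callees for fn, callees in call_graph.items() if fn in keep}


# Hypothetical call graph: each function maps to the functions it calls.
# The names are illustrative and do not reflect Onyx's real runtime layout.
CALL_GRAPH = {
    "main": ["factorial"],
    "factorial": ["iter.as_iter", "iter.fold"],
    "iter.as_iter": [],
    "iter.fold": [],
    "printf": ["format_any"],   # never called from main
    "format_any": ["heap_alloc"],
    "heap_alloc": [],
}

if __name__ == "__main__":
    kept = tree_shake(CALL_GRAPH, ["main"])
    for name in sorted(kept):
        print(name)
```

In this toy graph, `printf` and everything only it depends on is dropped because nothing reachable from `main` calls it. A real compiler also has to treat exports, function-table entries, and data referenced by kept functions as roots, which is where most of the difficulty lies.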

Thanks for trying out Onyx and I hope this helps!

While it is far from perfect, Onyx now performs tree-shaking before outputting the final binary, at least to the best of its abilities. This drastically reduces the size of the output binaries, especially in terms of the number of function and data elements.

It is nowhere near as minimal as AssemblyScript though, because a couple of features in the standard library are quite large and depend on each other internally, such as the heap implementation and formatted printing. Because of this, the binary for the above code example with --no-type-info provided is still around 37 KB right now, which is far from ideal.
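To see which sections account for the remaining bytes, it can help to total the section sizes of the output module. The sketch below assumes only the standard WebAssembly binary layout: an 8-byte header, then sections that each start with an id byte and an unsigned LEB128 payload size. The helper names and the hand-assembled demo module are illustrative and not part of Onyx or any existing tool.

```python
WASM_MAGIC = b"\x00asm"

# Section ids as defined by the WebAssembly binary format.
SECTION_NAMES = {
    0: "Custom", 1: "Type", 2: "Import", 3: "Function", 4: "Table",
    5: "Memory", 6: "Global", 7: "Export", 8: "Start", 9: "Element",
    10: "Code", 11: "Data", 12: "DataCount",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def section_sizes(data):
    """Return a list of (section_name, payload_size_in_bytes) in file order."""
    if data[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        name = SECTION_NAMES.get(section_id, f"Unknown({section_id})")
        sections.append((name, size))
        pos += size  # jump over the payload to the next section header
    return sections


# Hand-assembled demo module: one function type and one memory.
TINY_MODULE = (
    b"\x00asm\x01\x00\x00\x00"   # magic + version 1
    b"\x01\x04\x01\x60\x00\x00"  # Type section: one type, () -> ()
    b"\x05\x03\x01\x00\x01"      # Memory section: one memory, 1 initial page
)

if __name__ == "__main__":
    # For a real module, read the bytes with open(path, "rb").read() instead.
    for name, size in section_sizes(TINY_MODULE):
        print(f"{name:<10} {size} bytes")
```

Running this over a build with and without --no-type-info should show how much of the difference lands in the Data section versus the Code section.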

Just wanted to provide an update as there has been progress on this issue!

See #95 for more details.