ballercat / walt

:zap: Walt is a JavaScript-like syntax for WebAssembly text format :zap:

Home Page: https://ballercat.github.io/walt/

Data section offsets

ballercat opened this issue · comments

Problem

There are currently two methods of encoding a Data Section entry into the binary. These are:

  • const hello: i32 = 'Hello World!'; - static strings
  • const array: i32[] = [1, 2, 3, 4]; - static array/raw data regions

In both cases the data is encoded into the data section such that it'll take up the first available offset in memory.

The data sections in the WASM spec allow for an explicit offset. An example of this can be seen in the reference spec tests for data section here.
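
For illustration, a minimal sketch of what happens today (the concrete offsets are assumptions, picked only to show the layout):

const hello: i32 = 'Hello World!';   // lands at the first free data offset, e.g. 0
const array: i32[] = [1, 2, 3, 4];   // packed right after the string
// There is currently no way to request "place this entry at offset 1024".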

Goal

Define and implement a syntax for allowing an explicit offset to be defined when defining data sections in Walt.

Possible syntax:

A pseudo function call

const memory: Memory = { initial: 1 };
const string: i32 = memory.data(1024 /* offset */, 'Hello World!' /* value */);

This would require altering the grammar to allow top-level function calls, as well as some guards against calling anything other than memory.data() at the top level.

A static object

const string: i32 = { offset: 1024, value: 'Hello World!' };

This would work great for strings but not so well for static arrays. Also, having an object property (offset) that isn't actually part of the object's data is very odd.
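
For comparison, the same pattern applied to a static array might look like this (hypothetical, not implemented):

const array: i32[] = { offset: 2048, value: [1, 2, 3, 4] };
// offset reads like part of the data, but it is only placement metadata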

???

Maybe there is another way to define this, but having the memory involved in some way does seem to make the most sense, especially since in the future it will be possible to have N > 1 memories in a single binary.

How about just continuing the pointer-like concept?

const string: i32 = 1024;
string[] = 'Hello World!';

const array: i32[] = 2024;
array[] = [1, 2, 3];

Regarding multiple memories, how about just a reference?

const string: i32 = &someMemory[1024];
string[] = 'Hello World!';

Here's a suggestion that involves changing the parser:

const hello1: i32 = 'Hello World!'@1024#1; // static string at position 1024 of memory # 1
const hello1b: i32 = 'Hello World!'@1024; // static string at position 1024 of memory # 1 as well
const array: i32[] = [1, 2, 3, 4]@4096#7; // static array/raw data region of memory # 7

Instead, maybe it's better to have offset info near the type declaration, like this:

const hello1: i32@1024#1;
hello1 = 'Hello World!'; // static string at position 1024 of memory # 1

const hello1b: i32@1024;
hello1b = 'Hello World!'; // static string at position 1024 of memory # 1 as well

const array: i32[]@4096#7;
array = [1, 2, 3, 4]; // static array/raw data region of memory # 7

And here's another suggestion...

const memory: Memory = Memory.allocate({ initial: 1, number: 1 });
const hello32: i32[] = memory.view({offset: 40}); // starts at position 40
const array64: i64[] = memory.view(); // starts at 0

hello32 = 'Hello!';

array64 = [1, 2, 3, 4];

This project takes a low-level approach, but for strings and arrays I think this C-style practice of "we'll track the offset as an integer, you're responsible for the length" is just a tiny bit too low-level and is going to elicit a WTF reaction among the project's intended users, who aren't used to the C approach. Tracking lengths manually is a burden familiar to C programmers, but alien to JS. There should be sugar for this, or JS programmers are going to reinvent zero-terminated strings.

Maybe there should be five special higher-level sugary types available for use: slice_i32, slice_i64, slice_f32, slice_f64, and string. (Or some other names like array_i32, vector_i32, list_i32, etc.) These could basically be implemented as a standardized set of structs, with first-class syntax support, which programmers would be encouraged to use instead of hacking up their own structs. The only difference between slice_i32 and string would be that string supports string literal syntax like i32 currently does; otherwise they could be implemented identically in WASM.

Each of these types would be a struct holding a pair of scalars: an i64 length and an i32[], i64[], f32[], or f64[] pointer. It doesn't need to enforce an index range (what would it do on a violation anyway?), so a programmer should be able to ignore the length if they want. But the length should at least be readable, whether or not it's mutable, or people will get upset.
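
As a rough sketch, each of these types could desugar to a struct along these lines (field names are placeholders, not part of any existing API):

type slice_i32 = { length: i64, data: i32[] };
type string    = { length: i64, data: i32[] }; // identical layout, plus string literal support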

So it would look like this:

// Offset is a byte offset, length is number of elements,
// in keeping with ArrayBuffer/TypedArray syntax
const memory: Memory = Memory.allocate({ initial: 1, number: 1});
const mySlice: slice_i32 = memory.view({offset: 1024, length: 10}); // elifarly suggested syntax
const length: i64 = mySlice.length; 
log(length); // prints 10
const primitiveArray: i32[] = mySlice.offset; // explicit syntax
// OR
const primitiveArray: i32[] = mySlice; // coercive syntax?
// THEN... mySlice can support bracket-syntax?
mySlice[0] = 42;
primitiveArray[1] = 666;
log(primitiveArray[0]); // prints 42;
log(mySlice[1]); // prints 666;

I like it, but how about some sugar where both mean the same thing

const pointer: slice_i32 = memory.view({offset: 1024, length: 1});
const pointer: i32* = &memory[1024];

The first one is a pointer to a struct having a pointer and a length. It basically maps to this:

type Slice32Type = { 'array': i32[], 'length': i32 };  // "sugarless" type
const foo: Slice32Type = memory.view32({array: 1024, length: 1});

(Instead of memory.view(), two methods called memory.view32() and memory.view64() would indicate the element width.)
A sugarless type wouldn't be equipped to handle bracket notation, but the compiler could support it with a sugary type. So you could do foo[3] instead of having to say foo.array[3].
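
To make that concrete, a hedged sketch of the two forms (view32() and slice_i32 are the proposals above, not an existing API):

const raw: Slice32Type = memory.view32({ array: 1024, length: 4 });  // sugarless struct
raw.array[3] = 7;    // explicit field access, always available
const nice: slice_i32 = memory.view32({ array: 1024, length: 4 });   // sugary type
nice[3] = 7;         // the compiler would rewrite this to nice.array[3]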

This is how Go does it. There's an array type in Go but they don't want you to pay attention to it. They want you to use their slice type everywhere instead. It's similar to ArrayBuffer and TypedArrays in JS.

In the second, there is no struct or length information, just a pointer. It would map to this, I think:

const pointer: i32[] = 1024;

The ampersand might be useful if it can be prepended to things other than memory. But dereference and address operators are not in JS.

I've been thinking about this a bunch, mostly about the fact that we would need a sugary type for strings/slices. The way things are shaping up, it seems like the existing syntax + types are not expressive enough.

I don't think I want to keep adding sugary types without allowing the user to do something similar (without compiler changes), even though adding a new type to the compiler directly would be trivial. I think (and this idea isn't fully baked yet) I'll have to pivot on the types a bit and expose operator overloading (among other things) to the user, so that things like string or slice could be implemented in user-land. Better primitives would also make the topic of this issue easier to implement, IMO.

In some ways it would be better (less magic in the compiler), in some other ways it would be worse since the syntax would be even farther from JS/flow, at least when dealing with types. Then again, most of this stuff can be built-in/std-lib-ed so that most users don't need to worry about it.

For example, indexing into a string could be done via better primitives like so:

type String = ({ length: i32 }, i32); // can be used as an i32 or object with field .length
// syntax TBD
// " -> { " denotes a Block not a function, kind of important distinction
operator String[] = (target : String, index : i32) -> {
    // do a target.length sanity check perhaps
    return i32.load(target + ((1 + index) << 2));
};

Current array logic would also be "implemented" in the same way, technically it already is, just inside the compiler not in user-land.
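
For instance, the plain i32[] subscript that the compiler handles internally today could be written with the same primitives, roughly like this (the operator spelling is a guess, since the syntax above is marked TBD):

operator i32[] = (target: i32[], index: i32) -> {
    return i32.load(target + (index << 2)); // address = base + index * 4, no length header
};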

Along these lines, the goal at the end of the day is to let the user use wasm however they need to. And as much as I'd like to avoid creating a new language/type system, there seems to be no way around it without limiting usability.

I'll likely open up a new discussion on the topic as it's a bigger issue than "how to set data sections".

As you can see from my comment history, I never really cared about the "seems like JS" feature. I think C is close enough, and this is why I proposed C-like solutions. We have a C-like memory management problem here!

As for your operator/block idea, you can simplify the syntax by not introducing an operator keyword, but instead predefining operators as syntax sugar that maps to named functions.

For example,

function array_accessor(target: String, index: i32): i32 {
    // do a target.length sanity check perhaps
    return i32.load(target + ((1 + index) << 2));
}
const foo: String = 'bar';
// foo[2] is then just sugar for array_accessor(foo, 2)

But this would imply you need to implement function overloading or generics, e.g. function array_accessor<T>(target: T, index: i32) { ... }, yikes.

Hacky idea: preprocessor directives at the top that include type definitions:

#import 'stdlib.walt';
#import 'memory-manager.walt';

Or you could go really nuts with a module system for the compiler that makes it like Babel:

const walt = require('walt-compiler');
const unsigned = require('walt-unsigned-types');
const pointerSyntax = require('walt-pointer-syntax');
walt.compile(`
      export function test(n: u32): void {
        const array: u32[] = &n;
        array[0] = 42;
      }
`, [unsigned, pointerSyntax]);

Yup, the compiler already supports extensions. All of the current features are written as internal (enabled by default) language extensions and grammar. There is a reference implementation of closures as a plugin to demonstrate how a complex extension could be made and injected into a compiler.

I'll probably make a package (similar to Babel presets) for all experimental-* features, for these types of changes.
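
Something along these lines, reusing the hypothetical compile() shape from the snippet above (the preset package name is made up):

const walt = require('walt-compiler');
const experimentalPreset = require('walt-preset-experimental'); // would export an array of extensions
walt.compile(source, experimentalPreset);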