ballercat / walt

:zap: Walt is a JavaScript-like syntax for WebAssembly text format :zap:

Home Page: https://ballercat.github.io/walt/

Syntactic sugar: char literal that automatically converts the unicode value to a u32

JobLeonard opened this issue · comments

Feature Request

Overview

Using a full-blown string in WASM is overkill if you just want to compare a single character to another value. A char literal would help.

Impact

Medium. It's very convenient - without it one would have to resort to manually inserting unicode values.

Due Date

How hard is it to implement? I don't know the Walt compiler stack. But if it's in JavaScript, then it should boil down to identifying a single-character, calling codePointAt(0) to get the integer value, and ensuring it is assigned to an i32, no?

How hard is it to implement? I don't know the Walt compiler stack. But if it's in JavaScript, then it should boil down to identifying a single-character, calling codePointAt(0) to get the integer value, and ensuring it is assigned to an i32, no?

Yes, this is literally what it would take. The AST node would be an i32 Constant.

To fully answer the question, it wouldn't be hard at all. String literals are already parsed, but they aren't generated into anything. They used to be used for object properties and can still be used as array subscripts into an object. Literals with a single character would need to be converted to an i32.
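The conversion described above could be sketched roughly as follows. This is plain JavaScript and not Walt's actual compiler code: the function name `charLiteralToConstant` and the node shape (`Type`, `type`, `value`) are hypothetical, chosen only for illustration.

```javascript
// Hypothetical sketch: turn a parsed single-character string literal
// into an i32 constant node. The node shape is an assumption,
// not Walt's real AST.
function charLiteralToConstant(literal) {
  // Spreading a string iterates by code point, so a character outside
  // the Basic Multilingual Plane still counts as one character.
  const characters = [...literal];
  if (characters.length !== 1) {
    return null; // not a char literal; leave it to the string handling
  }
  return { Type: "Constant", type: "i32", value: literal.codePointAt(0) };
}

console.log(charLiteralToConstant("a").value); // 97
console.log(charLiteralToConstant("foo")); // null
```

Returning `null` for longer literals keeps the sketch neutral about how strings would eventually be handled.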

Cool. So you are considering accepting the feature request? :)

Actually, I thought of a similar low-level approach for introducing strings too: as syntactic sugar for i32[] arrays. But I started overthinking it so I opened up a separate issue for that: #104

Still, the discussion of strings is intertwined with that of chars: assuming strings will be introduced, I think that it would be better to be explicit about distinguishing a char from a single-character string, for a few reasons:

  • people coming from JavaScript might not be aware of the concept of single chars vs full Strings, which could lead to confusion
  • it might not be immediately obvious that a variable is an i32 (for example, when reading someone else's code and the variable was declared much earlier), which makes it hard to tell whether a single-char string or a char is being assigned (assuming strings are introduced with similar syntax)
  • continuing the assumption of strings introducing the same syntax: this would introduce ambiguity once type inference is added: is let s = 'x' a string or a char?

One option for distinguishing them is the "single-quote vs double quote" convention from C, but I think sticking to JavaScript conventions is better: as I understand, Walt is less about recruiting C programmers to JavaScript (besides, they could just use any of the big compiles-to-WASM toolchains), and more about letting people with mainly a JavaScript background do low-level WASM programming more easily. So accommodating the latter's expectations has a higher priority.

Instead I figured: why not a tagged template?

let char: i32 = ch`x`;
// with type inference
let char2 = ch`y`;

And here is all of the JS code required for that tagged template to return the code point:

function ch(strings) {
  // a tag function receives the literal's string parts as its first argument
  return strings[0].codePointAt(0);
}
// examples:
ch`H`; // 72
ch`e`; // 101 
ch`l`; // 108
ch`l`; // 108
ch`o`; // 111
ch`,`; // 44
ch` `; // 32
ch`世`; // 19990
ch`界`; // 30028

Pretty straightforward, no?

I like it! I commented on the other issue that I think a template literal is the right approach to bridge the gap between JS <> WASM. I would probably call the template literal char though 👍

Maybe I spoke too soon.

If the right-hand side of this assignment is an i32 and not an i32[] then we could infer that it's a single char instead of a string.

If the right-hand side of this assignment is an i32 and not an i32[] then we could infer that it's a single char instead of a string.

I'm not sure I follow? Are you saying, in pseudocode, type = string length === 1 ? char : string? That was always the case, yes, but my issue is this:

function oops(): i32 {
  let a = 'a';
  let b = 'foo';
  functionExpectingTwoStrings(a, b);
}
function functionExpectingTwoStrings(str1: i32[], str2: i32[]): i32 {
  // something. Point is that this is called with str1 = i32
}

My worry is that this goes against expectations, since JavaScript doesn't have real chars. Also someone might actually want a single-character string for some purpose. I'm not sure what purpose, but it will happen!

The edge case you point out is something that may happen! It gets a bit trickier with type inference, because it adds ambiguity.

What I'm talking about is based on these rules

assignment -> identifier : type = string | char;

where identifier: type is the LHS and string | char is the possible RHS. An i32 LHS type means the RHS is encoded as a character, whereas an i32[] LHS type means it is encoded as a string. This is similar to the logic currently used when assigning to a number type.

The above example would raise a warning and the code would not compile. The 'a' literal would cause the identifier a to be an i32 type, which would not match the parameter types of functionExpectingTwoStrings.
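As a rough illustration of that rule, here is a sketch in plain JavaScript. It is not taken from the Walt compiler: the function name `resolveLiteral`, the returned `{ type, value }` shape, and the fallback used when no type is declared are all assumptions made for the example.

```javascript
// Hypothetical sketch: the declared type decides how a literal is encoded.
// When no type is declared, a one-character literal is inferred as i32
// and anything else as i32[] (an assumption, matching the example above).
function resolveLiteral(literal, declaredType) {
  // Array.from iterates by code point, so each entry is one character.
  const codePoints = Array.from(literal, (c) => c.codePointAt(0));
  const type = declaredType ?? (codePoints.length === 1 ? "i32" : "i32[]");

  if (type === "i32") {
    if (codePoints.length !== 1) {
      throw new TypeError(`expected a single character, got "${literal}"`);
    }
    return { type, value: codePoints[0] }; // character encoding
  }
  if (type === "i32[]") {
    return { type, value: codePoints }; // string encoding
  }
  throw new TypeError(`cannot assign a literal to type ${type}`);
}

console.log(resolveLiteral("a").type); // i32
console.log(resolveLiteral("a", "i32[]").value); // [ 97 ]
```

Under this sketch, `let a = 'a'` would give `a` the type i32, which is why passing it to a function expecting i32[] would be rejected, while an explicit i32[] annotation would keep it a one-character string.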

My worry is that this goes against expectations, since JavaScript doesn't have real chars. Also someone might actually want a single-character string for some purpose. I'm not sure what purpose, but it will happen!

True, but the developer could fix it with an i32[] type.

Oh right, I forgot that Walt has static type support :P

I prefer the syntax. JavaScript reserves the keyword char and it makes more sense as a type:

let char : char = "a";

Where:

let char : char = "ab";

Will throw a SyntaxError

Will throw a SyntaxError

It's more a TypeError, the syntax is fine.

I had this confusion as well. Actually this is a syntax error because the syntax for char cannot be two chars.

It's maybe an implementation detail but it's simpler IMO to add a rule in the type inference.

According to the definition above,

assignment -> identifier : type = string | char;

The type is independent of what's on the right side, so that would require a condition in the parser and you wouldn't be able to catch the error in:

function f(c: char) {}

f("ab")
f("a")

I wouldn't call "catching more errors" an implementation detail

I wouldn't call "catching more errors" an implementation detail

Yes, I meant that it's a type inference concern (once you have the AST basically) instead of a grammar one
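To make the distinction concrete, here is a small sketch of such a check running after parsing rather than in the grammar. Everything in it is hypothetical: the `signatures` table, the `checkCall` helper, and the assumption that call arguments are available as plain string literals.

```javascript
// Hypothetical signature table: function name -> parameter types.
const signatures = { f: ["char"] };

// Check one call whose arguments are string literals. The grammar accepts
// any string literal; this later pass decides whether it fits a char.
function checkCall(name, args) {
  const params = signatures[name];
  const errors = [];
  args.forEach((arg, i) => {
    // [...arg] counts code points rather than UTF-16 code units.
    if (params[i] === "char" && [...arg].length !== 1) {
      errors.push(`${name}: argument ${i + 1} must be a single character, got "${arg}"`);
    }
  });
  return errors;
}

console.log(checkCall("f", ["ab"]).length); // 1
console.log(checkCall("f", ["a"]).length); // 0
```

Because the check looks at the parameter type rather than the literal's position in the grammar, it covers both declarations and call sites such as f("ab").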

Got it.

How about starting from desired behaviour and working our way back to the correct "implementation detail"?

A char should ideally:

  • catch the error you mentioned at compile time
  • resolve the ambiguity issue I mentioned, in an obvious way for the programmer
  • play nicely with type inference
  • be easy to concatenate with strings
  • be easy to manipulate the numerical value of (since that is part of the point of low level programming)

So for example:

let a : char = 'a';
// let's assume we end up with a special string type as well
let b : u16[] = 0;
b = 'beeee';

let x : i32 = 42;
let y : i32[] = 5; 

// assuming type inference, what happens?
// should this concatenate?
let c = b + a;
let d = b + x;
// should this be u16 "a**", char "µ", i32 181?
let e = a + x + x;
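For reference, the three readings of that last expression can be worked out in plain JavaScript. This only illustrates the arithmetic and says nothing about what Walt should do; the variable names are made up for the example.

```javascript
const a = "a".codePointAt(0); // 97
const x = 42;

// Reading 1: plain integer addition.
const asI32 = a + x + x;

// Reading 2: the same sum, interpreted as a character.
const asChar = String.fromCodePoint(asI32);

// Reading 3: string concatenation, treating each x as the character '*'.
const asString = "a" + String.fromCodePoint(x) + String.fromCodePoint(x);

console.log(asI32, asChar, asString); // 181 µ a**
```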

By the way, a tagged literal for chars might still be useful on the JS side, as a convenient JS -> WASM conversion. Still, if there is a situation where calling a WASM function to deal with a char is worth it, the extra function call and array look-up might become an issue:

    // (where "foo" is a WASM function that expects two chars)
    // two function calls, one array look-up per parameter
    foo(char`a`, char`b`);
    // one function call per parameter
    foo('a'.codePointAt(0), 'b'.codePointAt(0));

I think that at this point, this issue overlaps too much with #104 - whatever will be decided there for how to handle strings will more or less determine how to handle chars. Should I close this?

While using single quotes around a single character might be confusing for newcomers, I think I'm going to go with that for at least version zero of character literals. Complete type support, with string and char, could help with that aspect in the future. I've been experimenting with the idea and I'm not a huge fan of a template literal for a single character. It's also extra effort to parse, while single-character literals could just be transpiled already 😂

But chars are not part of Javascript... :-/

Yes, they are not. But this is

someChar == 'a'

without them, we have to write this instead

someChar == 97
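A deliberately naive way to picture that convenience is a source-level rewrite that swaps each single-quoted, single-character literal for its code point. This is only a sketch, not how Walt implements character literals: the helper name `inlineCharLiterals` is hypothetical, escape sequences are skipped, and a plain regex like this would also rewrite matches inside comments or double-quoted strings.

```javascript
// Hypothetical sketch: replace 'x' with the code point of x.
function inlineCharLiterals(source) {
  // With the u flag, [^'\\] matches exactly one full code point.
  return source.replace(/'([^'\\])'/gu, (_, ch) => String(ch.codePointAt(0)));
}

console.log(inlineCharLiterals("someChar == 'a'")); // someChar == 97
console.log(inlineCharLiterals("name == 'foo'")); // name == 'foo'
```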

Doing the manual conversion to a number here is fine, but it gets old quickly. Even C had character literals way back when. There are tradeoffs here, but it seems like not that many compared to the added benefit.

Yeah, I find it easier, but just only wanted to stick to pure Javascript... It's OK for me if it's just a development aid or an internal optimization, as long as it's kept as an implementation detail and not exposed to user space (or exposed behind a special flag for those who want to play with it).

but just only wanted to stick to pure Javascript

Could you tell me more about this? What do you mean by pure JavaScript, I think this is where I'm missing your point.

i32 type is not pure JavaScript either. Types should be environment specific. The browser defines global types - there's no reason Walt won't define some itself.

What do you mean by pure JavaScript, I think this is where I'm missing your point.

As I remember, the project was originally about translating as much JavaScript syntax as possible to WebAssembly, to be able to do low-level programming with plain JavaScript. Typing is OK and makes sense in that context, but char and similar things fall outside the concept of Walt being a subset of JavaScript, the same way asm.js was...

The browser defines global types - there's no reason Walt won't define some itself.

I think you are confusing what is JavaScript with what are web APIs... What is possible is that we do an implementation of ECMAScript (like JavaScript, ActionScript, JScript...), but in that case Walt would not be JavaScript or a subset of JavaScript but just "another thing". That's not bad either, as long as things are clear...

Character literals have been available for a while now.