domenic / proposal-blocks

Former home of a proposal for a new syntactic construct for serializable blocks of JavaScript code

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Handling globals

domenic opened this issue · comments

The proposal says that

the implementation checks that the code does not reference any bindings from outside the blöck

But the very simplest example violates this:

const result = await worker({|
  const res = await fetch("people.json"); // !!!!
  const json = await res.json();

  return json[2].firstName;
|});

The marked line references the fetch binding, which happens to resolve to a global variable.

The intent here is really that we use the worker's global fetch(). But I'm not sure how to make this rigorous, while still keeping the desirable property that non-globals need to be explicitly captured, and if not will cause parse failures.

The simplest fix I can think of is to abandon the parse errors on bad bindings :(. But that gets rid of one of the main advantages.

A more complicated fix would be that maybe worker can declare a list of allowed implicit bindings, and then we'd set it to basically all the globals present in worker global scopes? Seems rough.

Not sure what to do here.

Yeah, I suspect what you want to do is whitelist the list of built-in globals that are expected to be present across all the contexts blöcks will be passable to.

(This would imply, however, that either (a) fetch can't be on that list, or (b) you can't ever use a blöck in a worklet...)

Doing that, however, might be an issue with globals that have been overwritten/deleted/polyfilled/added (babelHelpers, shims, etc)

It's an "issue" in that I assume the blöck would automatically reference the primordial version, and pay no attention to mutations.

But yeah, then you can't polyfill in new functionality.

commented

explicitly overriding globals could be done if you want to polyfill and i imagine with the current staticness of the syntax babel could pretty easily understand what needs to be changed to inject its own stuff in.

@devsnek it looks like it'd also require the wrapper function to be changed, and that might live in another file, so i think it'd be outside of babel's scope.

There should be some way to check if block has global references to manually resolve them before transfer. Each block should be packed on sender side and unpacked on a receiver side. It will make block behavior predictable, controllable and testable. And it will be possible to send blocks by network to execute in distributed environments without redistributing common code.

You could require fetch to be passed in as an argument, but that would get messy fast.

Alternatively, there could be some kind of import syntax for the global:

const result = await worker({|
  import { fetch } from 'global';

  const res = await fetch("people.json");
  // …
|});

reify could throw if the imports fail for that particular global.

I think something that requires explicitly passing in the globals loses ergonomics compared to the stringify-a-function-body versions. So I lean toward either removing the parsing errors or providing a way for the function (worker() in this example) to state what bindings are always available.

if there was a way to combine the block with the function such that it’d be an error to create it using bindings that weren’t always available inside the worker, that might be a nice balance?

Not sure what you mean by "combine the block with the function", but let me be a bit more concrete with what I was thinking with regard to stating what bindings are available.

Notably, it would no longer have parse-time errors in any sense; they would be runtime errors. But, they would be runtime errors at the time of the call to worker(), and not at the time of the call to reify(), which is a bit nicer...

function worker(...) { /* as in readme */ }
worker.allowedBindings = ["fetch"];

// (1)
worker{| return fetch("foo.json"); |}

// (2)
worker{| return document.createElement("div"); |};

After seeing this block of code the parser has assembled information saying that the block at (1) has unbound fetch and the block at (2) has unbound document.

When line (1) executes, the first thing the runtime semantics for tagged blocks does is compare its saved list of unbound bindings with worker.allowedBindings. It finds that the set is a subset, so it's good to go.

When line (2) executes, the same thing happens. But this time document is not found in allowedBindings so the call immediately throws without ever creating a blöck.


I'm not sure building this check in to the language is quite worth it. Especially since it's not a static check that tooling can detect in a safe way. (Tooling could use heuristics like "I know worker{||} usually means the browser's built-in function, which has these allowed bindings". But tooling could also do that without us building this check into the language.)

I do think it's worthwhile to catch these sorts of errors as early as possible, and finding them at time of "calling worker()" is better than at time of reify(), or worse, at time of "undefined is not a function".

Another option: be strict about the "no captured variables" thing, including globals, but define this in the context of the blöck to be the global object of the worker. So you'd write this.fetch() instead of fetch.

It's still less ergonomic, but you'd get static errors telling you if you did it wrong.

I think that's my favorite version yet. Probably not using "this" since that's too easy to shadow and I have a long-standing dislike for using it in non-method contexts, but just asking people to qualify their globals seems really solid.

The nice things about this are that 1) it is already the global object in scripts, so there's precedent, and 2) it's already a binding which changes value depending on context, so there's less risk of it looking like it means something else (which would be a problem if we did something like let global = { fetch: ()=>{} }; await worker({| return global.fetch('foo'); |});).

But other names could be made to work too, I expect.

You'll probably end up destructuring the global every time, but I guess that's ok:

await worker({|
  let { fetch, Uint8Array, Map, Set, ReadableStream } = self;
  // …
|});

However, doesn't that mean every imported library will have to do the same? I guess you'd need a build step to rewrite every global reference to prepend self. or add the appropriate destructuring.

I quite like the destructuring though because it is clearer about what parts of the worker's API are being accessed. I don't necessarily think that could be tedious. It really is up to the developer how complex a blöck can/should get and how many parts of the worker API a certain blöck uses anyway.

I'm not really worried about it being tedious, I'm more worried about the ease of using existing libraries.

Eg, if a library I import uses new Uint8Array(500), it'll fail. I'll have to modify the library to use new self.Uint8Array(500), or let { Uint8Array } = self at the start of its module.

Imported libraries don't have their bindings checked; only the code syntactically inside the blöck is checked. So imported libraries would not need to change.

I guess that's kinda obvious now you say it 🙃

Is it essential to statically catch these errors as early errors? Maybe if the Worker is run in a mode that will forward the ReferenceError back up to the spawn-er in a visible way, it would be enough. Also, type systems and maybe advanced linters could catch these errors.

I think removing the early errors is an option, but it decreases the utility.

It's 2019 now, and the obvious thing to do would be to only allow globalThis as an implicit global variable.

await worker({|
  let { fetch, Uint8Array, Map, Set, ReadableStream } = globalThis;
  // …
|});

That seems fine.

In another issue on this repo I just suggested an alternate syntax:

const result = await worker({endpoint} => {
  const res = await globalThis.fetch(endpoint);
  const json = await res.json();

  return json[2].firstName;
});

Yeah. That feels about right.