tc39 / ecma262

Status, process, and documents for ECMA-262

Home page: https://tc39.es/ecma262/

Built-in Modules

bterlson opened this issue

Built-in modules come up repeatedly around the various proposals. I am making this issue to centralize the discussion such that champions of the eventual proposal have a good central location for information. I will keep this up-to-date if there is further discussion in this issue.

Existing Discussion

Syntax Options

Naming convention inside module specifier

This option entails establishing a naming convention for built-in modules. It has to be compatible with existing ecosystems, e.g. we cannot clobber an npm or standard node package name.

Strawman: *-sigil

import "*SIMD";
import {float32x4} from "*SIMD";
import SIMD from "*SIMD";

Strawman: URL scheme

import { float32x4 } from "std:SIMD";

Distinct syntax for built-in module imports

This option necessitates additional reflective capabilities in the Loader API to request built-in modules by name as opposed to going through the normal module resolution process.

Strawman: IdentifierName instead of StringLiteral

import SIMD;
import {float32x4} from SIMD;
import SIMD from SIMD;

Semantics Requirements

Must defer to loader to resolve built-in modules (important for polyfillability). Loader may see either a special string or possibly an options record indicating that a built-in module is requested.
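A minimal sketch of what "the loader may see a special string or an options record" could look like, assuming the "std:" string convention discussed below. `classifySpecifier`, `resolve`, and the `{ builtin, name }` record shape are hypothetical illustrations, not part of any Loader API.

```javascript
// Hypothetical prefix for TC39 built-in modules (one of the strawmen in this thread).
const BUILTIN_PREFIX = "std:";

// Turn a module specifier into a request record a loader hook could inspect.
// Built-ins are only flagged here, not resolved, so the loader (and therefore
// a polyfill) still gets the final say.
function classifySpecifier(specifier) {
  if (specifier.startsWith(BUILTIN_PREFIX)) {
    return { builtin: true, name: specifier.slice(BUILTIN_PREFIX.length) };
  }
  return { builtin: false, name: specifier };
}

// Hypothetical resolution step: always defer to the supplied loader hook.
function resolve(specifier, loaderHook) {
  return loaderHook(classifySpecifier(specifier));
}
```

For example, `resolve("std:SIMD", (req) => req)` hands the hook `{ builtin: true, name: "SIMD" }`, while `"./SIMD.js"` arrives with `builtin: false`.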

Just to gather a few constraints:

  • The aesthetics matter a lot, since importing standard libraries is something we expect to happen in an extremely large number of modules.
  • We have a choice between just establishing a naming convention vs. reserving syntax (whether it's different surface syntax or distinguished module identifiers in the string literal).
    • If we choose a naming convention, it has to be compatible with existing ecosystems; in particular we have to watch out for clobbering an npm or standard node package name.
    • If we choose the latter, it will require additional surface area to the reflective layer in the loader API.
    • In particular, anything we do has to allow standard modules to be polyfillable/prollyfillable, which means the registry and/or (probably and) the loader pipeline have to be able to expose distinguished standard modules.

Edit: I see @bterlson already mentioned the exposing-to-the-loader constraint, apologies for the dup.

@bterlson Thanks for starting this discussion. I think it's very important for TC39 to set some kind of precedent here since platforms (e.g. DOM) will also likely want to start putting things into their own "standard" modules.

To throw another strawman out there, could we not also use a URI scheme for this purpose?

import { float32x4 } from "std:SIMD";

I'm concerned about this form:

import SIMD;

since it would presumably have non-local effects (by adding properties to the global object). If the bindings are local instead, I would object to it on the same grounds as import * from.

(Updated OP with new information)

I'm concerned about this form: import SIMD;

The way I see it this form is simply a shortcut for import SIMD from SIMD. There are no properties added to the global, SIMD is bound in the module environment record as normal imports are.

To clarify the polyfill scenario: I must be able to write code that can mutate, overwrite, create, or freeze a built-in module, just like I can do right now with a built-in global. Mutation/overwriting is for when engines inevitably ship bugs, and shims want to fix them - freezing is for things like SES that want guarantees that nobody can maliciously mutate/overwrite builtin modules later - creating is to provide new modules in older environments.

I also agree that whatever precedent we set should pave a cowpath for engines to add non-language-builtin builtin modules in a non-colliding way, but one that ensures the same capabilities I mentioned in the previous paragraph.
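To make these capabilities concrete, here is a toy registry keyed by specifier. `ModuleRegistry` and its methods are hypothetical (no registry API of this shape was standardized). Namespaces are modeled as frozen plain objects because, as pointed out later in the thread, an imported module cannot itself be mutated; "mutating" is modeled as replacing the registry entry.

```javascript
// Toy module registry: specifier -> module namespace object.
class ModuleRegistry {
  #entries = new Map();
  #locked = new Set();

  // "Create": provide a module that an older engine does not ship.
  // "Overwrite"/"mutate": replace a buggy engine-provided module with a fixed one.
  set(specifier, namespace) {
    if (this.#locked.has(specifier)) {
      throw new TypeError(`${specifier} is frozen in the registry`);
    }
    // The namespace itself is immutable; only the registry entry can change.
    this.#entries.set(specifier, Object.freeze({ ...namespace }));
  }

  get(specifier) {
    return this.#entries.get(specifier);
  }

  // "Freeze": SES-style lockdown so later code cannot swap the entry.
  freeze(specifier) {
    this.#locked.add(specifier);
  }
}
```

A shim would call `set` to fix or add a module before application code runs, and a lockdown script would call `freeze` afterwards.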

Also, the IdentifierName variant

import {float32x4} from SIMD;

would be future-hostile to lexical modules, FWIW. And in general, I think it's "lexically surprising" to see an identifier in that position referring to something that's not in scope.

The way I see it this form is simply a shortcut for import SIMD from SIMD

For consistency, users need to write that as:

import * as SIMD from "<whatever>";

We should try not to special-case or give special meaning to forms which import built-ins.

Sorry, or

import SIMD from "<whatever>";

if it exports a default.

And in general, I think it's "lexically surprising" to see an identifier in that position referring to something that's not in scope.

I agree with this.

Of the positions so far I like a "std:" prefix the most. It reuses an existing part of the module resolution space (absolute URLs) in a way that can't conflict (std: is not a registered URL scheme today).
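For reference, the WHATWG URL parser (the global `URL` in browsers and Node.js) accepts "std:SIMD" as an absolute URL with a non-special scheme, so such specifiers sit in the existing absolute-URL space; avoiding conflicts relies on no "std" scheme being registered or given behavior by hosts.

```javascript
// Parse the proposed specifier with the standard URL parser.
const builtin = new URL("std:SIMD");
console.log(builtin.protocol); // "std:"
console.log(builtin.pathname); // "SIMD"

// Being absolute, it ignores any base URL it is resolved against.
const resolved = new URL("std:SIMD", "https://example.com/app/main.js");
console.log(resolved.href); // "std:SIMD"
```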

@ljharb

To clarify the polyfill scenario: I must be able to write code that can mutate, overwrite, or freeze, a built-in module, just like I can do right now with a built-in global. Mutation/overwriting is for when engines inevitably ship bugs, and shims want to fix them - freezing is for things like SES that want guarantees that nobody can maliciously mutate/overwrite builtin modules later.

You can't really mutate a module or a named export you are importing. In the case of modules, polyfilling is likely to be just a shimming process via export and export-from syntax. We have all the pieces in place to support this use case today.

@caridy Maybe @ljharb just means replace the registry entry, which definitely is a requirement for polyfilling. (This is why it was important that even though modules cannot themselves be mutated from the outside, we still made it possible to mutate the registry.)

Exactly that, yes ^ sorry that wasn't clear.

Another problem with the IdentifierName syntax for module specifiers is that it is hostile to existing tools that already parse import statements. Keeping the string syntax means that tools that don't care about the actual semantics of the import don't need to change.

@ljharb
Regarding "freezing", any actual object values exported by a module can certainly be frozen using the usual techniques.

From a core language perspective, I think very little would have to be said about the semantics of supporting built-in module shimming. Basically, an import of a module identified using a built-in module designator that is recognized by an implementation uses the built-in implementation unless the active module loader has explicitly registered an interest in handling that module. Unrecognized built-in modules and those that the loader has registered for over-rides are simply passed on to the loader.

Loader APIs of course have to provide for registering built-in over-rides. But a simple implementation that only supports a single built-in loader doesn't need to even worry about that case.
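The dispatch rule described above can be sketched as a single function. `importBuiltin`, the `builtins` map, the `overrides` set, and the `loader` callback are hypothetical names for illustration; a simple single-loader implementation would just pass an empty `overrides` set.

```javascript
// `builtins`: modules the engine ships (name -> namespace).
// `overrides`: built-in names the active loader has asked to handle itself.
// `loader`: the host/loader fallback for everything else.
function importBuiltin(name, { builtins, overrides, loader }) {
  if (builtins.has(name) && !overrides.has(name)) {
    // Recognized and not overridden: use the engine's own implementation.
    return builtins.get(name);
  }
  // Unrecognized, or explicitly overridden: pass it on to the loader.
  return loader(name);
}
```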

BTW, I also like: "std:SIMD" as long as we are confident that we can safely use "std:" without tripping over any other URI protocol.

I only threw out "*SIMD" (with an escapable *) as a strawman in anticipation that there might be concerns about conflicts with things like "std:".

@allenwb I meant freezing it in the registry so further changes to "what gets imported" are impossible, i.e. not just Object.freeze on the export, but Module.freeze on the registry entry, or similar.

I would only hope that whatever the implementation, that consumption/publication of CJS/Node and ES6 modules can interoperate... perhaps following browserify and webpack's logic in this regard.

Or at least some awareness...

@ljharb

I meant, freezing it in the registry

ok, then it seems like an orthogonal issue in the design of the module loader API and really doesn't impact the idea of specifying a way to designate standard built-in modules.

I would think that for SES purposes, the ability to lock down a builtin module from being replaced in the registry is a blocker - @erights?

Let me try to be clearer. In the absence of a standardized or implementation provided module loader that exposes a module registry there is no way to replace (or lock down) a built-in module. So, from the perspective of the core semantics of import this is a non-issue and shouldn't stand in the way of specifying built-in modules.

Certainly browsers and most other significant implementations will provide such capabilities, so the module loader specification needs to address it. But it shouldn't be a blocker that forces us to avoid defining built-in modules.

I also like the module specifier "std:SIMD". IANA doesn't have any "std" scheme registered which is a good sign but of course there could be unregistered usage out there.

What about using :SIMD instead? Then we don't have to worry about "std" being a valid scheme.

Just esthetically, the leading colon looks pleasing to me. And as @nbdd0121 observes, we don't need to worry about the empty string becoming a valid scheme name.

Do any known file systems assign meaning to a leading ":" in file/path/device names?

Of course, if we are worried about running into that, we could do "::" escaping like I suggested for "*".

Regardless, I like the explicitness of intent we would get with "std:".

Perhaps the scheme, whatever it was, could be optional for use in the case of resolving ambiguity?

Please no - optional things cause ambiguity and would present a refactoring hazard if you suddenly added another import that made it ambiguous. Whatever format is decided, it should be always required.

Do any known file systems assign meaning to a leading ":" in file/path/device names?

Of course, if we are worried about running into that, we could do "::" escaping like I suggested for "*".

The old Mac OS uses ":" as the path name separator. Names are absolute by default, a leading ":" makes them relative, and multiple leading ":" walk up the file system tree, e.g. ":foo" is in the cwd, "::foo" is in the parent, etc.

More here, under Classic Mac OS

The problem with ":test" is that parsed against "https://example.com/" you'd get "https://example.com/:test". This is not a huge problem since we outlawed identifiers that do not start with "/", "./", or "../", but it's still somewhat surprising. If you do decide to go down that route, I'd vote for just using "test", since it's identical from a processing perspective and looks much better.
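The resolution described above can be reproduced with the standard `URL` constructor (available in browsers and Node.js): because ":" cannot start a scheme, ":test" falls through to ordinary path-relative resolution, exactly like "test".

```javascript
const base = "https://example.com/";

// A leading colon is not a scheme, so ":test" resolves as a relative path.
console.log(new URL(":test", base).href); // "https://example.com/:test"

// A bare "test" resolves to the same kind of path-relative URL.
console.log(new URL("test", base).href); // "https://example.com/test"
```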

Given that much of ECMAScript's syntax is already C-like, what about just adopting the C/C++ preprocessor syntax to identify built-in modules?

Strawman: C++ style scheme:

import <SIMD>;
import {float32x4} from <SIMD>;
import SIMD from <SIMD>;

...For reference, here's the syntax used in C++:

#include <vector>  // a standard library header
#include <experimental/filesystem>  // an experimental standard library header
#include "mylibrary.h"  // a user-defined header file

@msegado I disagree. In C/C++, header names are a special case handled by the preprocessor as part of tokenization. I believe introducing this syntax to ECMAScript would make the parser even more complicated.

@msegado: This was brought up somewhere deep in the nest of comments above, but one requirement we have is the ability for users to easily polyfill new built-in modules in the future.

Because of this requirement, we're left with the need to use the same syntactic space as userland modules -- so coming up with some conventional string pattern that fits neatly in with our loader-string specs is probably the best path forward.

@jeffmo My apologies, I should have read the comments in more detail! Yes, that makes perfect sense; it's probably not worth introducing new syntax if it needs to resolve to a string in the loader anyway, and complicating the loader with separate treatment for builtins doesn't seem worthwhile.


Is this for ES only? Should Node.js (or browsers, when they decide to stop polluting the global object) put their own built-ins under std?

@AlicanC Platforms should definitely not put their built-ins under std (we don't want it to become the new global object namespace). Hopefully though, the naming convention that we choose for ES built-ins would be usable for platforms. Perhaps:

import * as FS from "node:fs";

If platforms are to figure out their own names, then more than "std" will be susceptible to collision with IANA listings in the future.

I think a whole concept of module namespaces should be introduced and the namespace splitter should be something that makes the ModuleSpecifier an invalid URI, not :. Then the spec can use the "std" namespace itself and probably reserve others for future use.

// Importing File
import MyComponent from './MyComponent.js';
import Q from 'https://mycdn.com/q.js';

// Importing Module from Global (Root?) Namespace 
import Q from 'q';

// Importing Module in a Sub-Namespace
import SIMD from 'std^SIMD';
// or
import SIMD from 'std::SIMD';

I say ^ or :: because they make the ModuleSpecifier an invalid URI (right?) and eliminate the risk of any collisions. So if a platform wants to have a namespace called "http", it can.

@AlicanC There's probably some benefit to using syntactically valid URLs (something that can be used with new URL(...) for instance).

I agree that there's a theoretical problem with IANA scheme collision, but I'm not sure that it will be a problem in practice. It's certainly not a problem for Node. For the browser, maybe someone involved with HTML standards would like to offer an opinion? @domenic ?


@zenparsing @domenic I would really like to see the HTML spec define its own set of built-in modules and make the new features only available under those. (Just like the "new features only for https" thing.)


Even if we didn't have any collision concerns, I would still think that there should be a clear distinction between importing a path and a name. Is there really a good reason to make every ModuleSpecifier URL-parsable?


Also, would you like to standardize a way for specifying versions so we can actually make breaking changes and have opt-in modern APIs with specs that are not stuck in the early 90s?

import SIMD from 'std::simd';
import SIMD2 from 'std::simd@2';
import DOM6 from 'html::dom@6';
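A small parser for the namespaced, versioned specifiers proposed above, assuming the hypothetical grammar `namespace::name` with an optional numeric `@version` suffix. `parseNamespacedSpecifier` and these exact rules are illustrations only, and the versioning idea itself is questioned later in the thread.

```javascript
// Hypothetical grammar: <namespace>::<name>[@<version>]
const NAMESPACED = /^([A-Za-z][\w-]*)::([A-Za-z][\w.-]*)(?:@(\d+))?$/;

function parseNamespacedSpecifier(specifier) {
  const match = NAMESPACED.exec(specifier);
  if (match === null) {
    // Not namespaced: leave it to normal path / bare-name resolution.
    return null;
  }
  const [, namespace, name, version] = match;
  return {
    namespace,
    name,
    // No "@N" suffix means "whatever the host considers current".
    version: version === undefined ? null : Number(version),
  };
}
```

With these rules, "html::dom@6" yields `{ namespace: "html", name: "dom", version: 6 }`, while "./MyComponent.js" and "q" are not namespaced and return `null`.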

I agree that there's a theoretical problem with IANA scheme collision, but I'm not sure that it will be a problem in practice. It's certainly not a problem for Node. For the browser, maybe someone involved with HTML standards would like to offer an opinion? @domenic ?

There's no problem here. The schemes with actual behavior are well-defined by Fetch, and std is not one of them.

@zenparsing @domenic I would really like to see the HTML spec define its own set of built-in modules and make the new features only available under those. (Just like the "new features only for https" thing.)

There's very little motivation to do this. Globals have served the web platform well so far, and starting to make people go through extra hoops for new features doesn't really give us anything besides an inconsistent platform.

Also, would you like to standardize a way for specifying versions so we can actually make breaking changes and have opt-in modern APIs with specs that are not stuck in the early 90s?

This has been an antipattern on the web. Versioning specifiers like <!DOCTYPE html> or "use strict" cause engines to have to maintain two parallel separate mode implementations, which is a burden much worse than maintaining a compatible API. (That's why in other cases, e.g. <svg version="x">, the version specifier is completely ignored by the browser.)

This has been an antipattern on the web. Versioning specifiers like <!DOCTYPE html> or "use strict" cause engines to have to maintain two parallel separate mode implementations, which is a burden much worse than maintaining a compatible API.

I agree with the general point you make here, but not as applies to "use strict". The non-antipattern that has emerged on the web can only cope with growing and compatible standards. This is why, in the simplicity dimension, standards can generally only get worse over time. ES3 was a mess; it didn't even have lexical scoping. Functions were not really encapsulated. I could go on. If we had to build the future of JavaScript on sloppy ES3 we would not have gotten very far. "use strict" is an amazing and rare thing: a successful subtractive effort by a standards body that may not break its customers' code.

I also agree with the literal point you make. The mode switch was a burden for engines. But the pain was worth it to rescue JavaScript from the ES3 mess.

While I understand your position in general, I think you'll find a variety of opinions on whether it was worth it.

I am certain that there are a variety of opinions about this!

With 1JS the value of strict mode was vastly diminished, perhaps even negated. One of the two was a mistake, YMMV which one; the net result is a high combinatorial complexity cost for fairly little benefit.

(In reply to Mark S. Miller, 20 March 2016 at 00:48: "I am certain of it!")

Note that 1JS stops at module and class boundaries, whose bodies are always necessarily strict. Both modules and classes are such attractive abstraction mechanisms that they may eventually dominate new code.

But yes, as you know, I agree (as I think you now do) that the 1JS approach of introducing new features into both sloppy and strict mode was a mistake. Sloppy mode should have been kept to its original purpose: an ES3 compatibility mode. We had no good reason to impose on ourselves the complexity burden of adapting the new features to somehow appear in sloppy code. At least we stopped this insanity at module and class boundaries.

(In reply to rossberg-chromium, Mon, Mar 21, 2016 at 12:02 PM.)
Cheers,
--MarkM

domenic states that there's no problem using the std scheme, so are we going to use std:name, or are we still trying to use an invalid URL such as std::name?

It looks like you're reinventing URNs. If that is the case, why not just use them?

@nbdd0121 I believe that foo::bar is a syntactically valid URL (where the path component is ":bar"). So I don't think the double colon helps anything.
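Checking with the WHATWG URL parser bears this out: "std::SIMD" parses with scheme "std:" and path ":SIMD", whereas the "^" separator suggested earlier does make the string unparseable as an absolute URL. `isAbsoluteURL` is just a helper written for this illustration.

```javascript
// True if the WHATWG URL parser accepts the string as an absolute URL.
function isAbsoluteURL(specifier) {
  try {
    new URL(specifier);
    return true;
  } catch {
    return false;
  }
}

console.log(isAbsoluteURL("std::SIMD"));    // true
console.log(new URL("std::SIMD").pathname); // ":SIMD"
console.log(isAbsoluteURL("std^SIMD"));     // false: "^" cannot appear in a scheme
```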

what about

import {float32x4} from "https://www.ecma-international.org/simd";

@graingert I don't think it is particularly practical for a standard library to be so verbose. This might remind many of the verbosity of older DTDs, which most people just learned to copy and paste. I'm sure most people wouldn't want to go back to something like that.

If foo::bar or foo:bar are valid URLs already, a more attractive choice could be to consider other sigils than :.

@kdex it's not so bad for w3 apis:

import fetch, {Response, Request} from "https://w3.org/fetch";

You can't just squat schemes like that.

I know that it may not work with classic Mac OS, but I'm still in favor of a ':' or '::' prefix (e.g. :simd). I do think that submodules for globals should probably just use the forward slash (:foo/bar), so that part can be polyfilled as @ecmascript/foo/bar. Not sure if anyone from npm is in here, but reserving something like @ecmascript as an organization would be a good preemptive step.

There seems to generally be an assumption in this thread that built-in modules would have to be identified using a string literal module descriptor. But that's not really the case, TC39 gets to define new syntax. To avoid any clashes with host/platform use of string module specifiers, I suggest the following change to the ModuleSpecifier syntax:

ModuleSpecifier :
    StringLiteral
    BuiltinSpecifier

BuiltinSpecifier :
    IdentifierName
    BuiltinSpecifier . IdentifierName

With the static semantic restriction that std as the first IdentifierName of a BuiltinSpecifier is reserved for use in naming built-in modules defined by TC39.

So we might have:

import {doc, freeze} from std.decorators;
import streams from web.streams;
import procs from node.processes;
import gcControl from v8.gc;
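In the proposal a BuiltinSpecifier is a token sequence handled by the grammar, not a string, but a tool that wants to check the same shape on source text could do something like the sketch below. `parseBuiltinSpecifier` and `isReservedForTC39` are hypothetical helpers; IdentifierName is approximated with the Unicode ID_Start/ID_Continue properties, and escape sequences such as `\u0041` are not handled.

```javascript
// Approximation of IdentifierName (no \uXXXX escape support in this sketch).
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

// Split a dotted BuiltinSpecifier such as "std.decorators" into its
// IdentifierName parts, or return null if it does not fit the grammar.
function parseBuiltinSpecifier(text) {
  const parts = text.split(".");
  return parts.every((part) => IDENTIFIER_NAME.test(part)) ? parts : null;
}

// Static semantic restriction from the proposal: a leading "std"
// is reserved for built-in modules defined by TC39.
function isReservedForTC39(parts) {
  return parts[0] === "std";
}
```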

It isn't clear if there is a requirement to use dynamic import for built-in modules. Because built-ins are built in and presumably already present as part of the implementation it isn't obvious that there would be any advantage to conditionally loading them. But if that functionality is needed, it can be accommodated without any syntactic ambiguity by:

ImportCall :
    import ( AssignmentExpression )
    import BuiltinSpecifier

Dynamic import definitely would be required if you are writing a module that can gracefully support multiple environments while using their builtins.

For example, a fetch implementation that uses XMLHttpRequest on browsers, and node.net on node.
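A rough sketch of that multi-environment pattern with today's asynchronous dynamic `import()`: try candidate specifiers in order and use the first one that loads. `importFirstAvailable` is a hypothetical helper, the importer is injectable so it can be exercised without real modules, and, as noted in the replies, this is asynchronous and so does not give the synchronous "static" behavior some participants want.

```javascript
// Try each candidate specifier in order; return the first module that loads.
// `importFn` defaults to dynamic import() but can be replaced (e.g. in tests).
async function importFirstAvailable(specifiers, importFn = (s) => import(s)) {
  const errors = [];
  for (const specifier of specifiers) {
    try {
      return await importFn(specifier);
    } catch (error) {
      errors.push(error);
    }
  }
  throw new AggregateError(errors, `none of ${specifiers.join(", ")} could be loaded`);
}

// Hypothetical usage: a platform module first, then a userland fallback.
// const transport = await importFirstAvailable(["node:net", "./xhr-transport.js"]);
```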

Conditional static imports would be necessary for built-in modules, to satisfy polyfilling and multi-env use cases.

Personally, I would prefer to use an external config map for such situations.

But, as noted in my sketch, ImportCall could be made to work with BuiltinSpecifier.

ImportCall is asynchronous; that doesn’t satisfy the use cases I mentioned above. We need dynamic static imports to make that tenable.

It feels like many comments in this thread didn't heed the advice from #395 (comment).

If we were to have the ability to map bare specifiers to URLs in browsers, then this space might be well suited to builtin modules, and NodeJS could do the same.

For example, import "@simd" (@ as strawman) could be a builtin name. For browsers or NodeJS versions that don't support the builtin, it could be mapped by a browser resolver / manifest and by a custom NodeJS resolver to wherever its actual shim implementation is; there's no need for the URLs to align at this level.
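A minimal sketch of that manifest-based mapping, loosely in the spirit of a browser resolver or custom Node.js resolver. `resolveSpecifier`, the manifest contents, and the "/vendor/simd-shim.js" path are all hypothetical.

```javascript
// Resolve a specifier given a manifest (builtin name -> shim location)
// and the set of builtins this host implements natively.
function resolveSpecifier(specifier, { manifest, nativeBuiltins }) {
  if (nativeBuiltins.has(specifier)) {
    // Supported natively: no mapping needed.
    return { kind: "builtin", specifier };
  }
  if (manifest.has(specifier)) {
    // Builtin not supported here: fall back to the mapped shim.
    return { kind: "url", url: manifest.get(specifier) };
  }
  // Everything else follows the normal resolution rules.
  return { kind: "default", specifier };
}

const resolved = resolveSpecifier("@simd", {
  manifest: new Map([["@simd", "/vendor/simd-shim.js"]]),
  nativeBuiltins: new Set(),
});
// resolved is { kind: "url", url: "/vendor/simd-shim.js" }
```

On a host that ships "@simd" natively, listing it in `nativeBuiltins` makes the same manifest harmless, which is what lets the shim URLs differ between environments.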

Strings do seem important though for the shimming process.

@ljharb

We need dynamic static imports to make that tenable.

I'm not sure what you mean by "dynamic static". I can imagine a non-failing static import:

import ?gotFoo foo from std.foo ; 
// If import succeeded gotFoo is true and foo has the default export value
// If import failed gotFoo is false and foo is TDZ uninitialized.

But why all the barriers to progress? We don't need to boil the ocean. TC39 has one specific need, a way to namespace manage new standard library features using modules. If that is all "built-in modules" supports that's fine. It would be nice if it also can be used for built-in host platform modules. Even if the web platform wouldn't use it, I'm sure node and other environments would.

We have plenty of tools and techniques for bundling or otherwise statically or dynamically configuring JS apps for deployment to various environments. We don't need to invent new configuration management mechanisms before taking the basic step forward that enables modularizing the ES standard library as it grows.

Having a half-baked mechanism for adding new standard features isn’t sufficient; if it doesn’t have all the polyfillability and secure-ability that globals do, then new features will have to continue to also be added as globals. In other words, TC39’s need can not be met without these features.

Not half-baked, just limited scope. Many of the problems that are being raised are platform issues and not TC39 issues. As I said on twitter, until TC39 actually defines some standard modules no platform will try to deal with those issues.

Just to throw it out there, specifically regarding the "from clause" and what I like to refer to as the default namespace, i.e. the standard/platform library equivalent of platform globals (which may be understood differently from the various perspectives taken in this thread), let me refine it a little:

  • default namespace is equivalent to platform globals (ie no name collision)
  • default namespace is probably actually platform globals in legacy
  • default namespace works like platform globals in legacy

If there is only one such namespace (exporting from other namespaces as each platform sees fit) then the "from clause" could simply be dropped in static imports. The questions left unanswered are:

  1. how breaking will it be for tooling to adapt to the new syntax?

    This point was very important when considering the "from clause", because strings and identifiers are miles apart in the amount of work a tool will need to do to analyze modules (especially critical for loaders).

  2. what could this look like?
    import { Object, fs, process };
    !fs // true
    !process // false
    if (process) process.exit(); // bye
  3. what scenarios would justify dynamic imports?

    no idea, but possible

  4. what would a dynamic import look like?
    import global;
    global.dynamicThing // => export default { async get dynamicThing() { … } }

Edit: A key goal here is to let polyfills transition into a module space if they choose, without affecting existing behaviour, while allowing new code to be structured away from globals. I don't make any claim regarding how to address the mutation of import <global namespace as IdentifierName>, which I gather would currently be synonymous with the current global scope (and might then evolve).

Update: Should the syntax need a little more verbosity it is possible to consider things like import default global or import default {…} but this is secondary (the idea of skipping resolution and not forcing platforms to diverge throughout is really the main point).

(Quoting the BuiltinSpecifier proposal above.)

I think this is the best naming option; it is clear enough for human understanding.
