nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨

Home Page: https://nodejs.org

Clarification Needed: require('http') and require('node:http') with existing require.cache entries

gitspeaks opened this issue

Affected URL(s)

https://nodejs.org/docs/latest-v20.x/api/modules.html#core-modules

Description of the problem

According to the documentation:

"Core modules can be identified using the node: prefix, in which case it bypasses the require cache. For instance, require('node:http') will always return the built in HTTP module, even if there is require.cache entry by that name."

And:

"Some core modules are always preferentially loaded if their identifier is passed to require(). For instance, require('http') will always return the built-in HTTP module, even if there is a file by that name."

Given this, if require('http') always returns the built-in HTTP module, under what circumstances would a require.cache entry by that name exist, making it necessary to use require('node:http')?

@nodejs/loaders

If you populate require.cache.http manually, e.g.:

node -e 'require("./test/fixtures/empty.js");require.cache.http = require.cache[require.resolve("./test/fixtures/empty.js")];console.log(require("http") === require("./test/fixtures/empty"));'
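The same effect can be reproduced without a fixture file. The following is a minimal sketch, assuming a CommonJS script on a Node.js version that supports the node: prefix in require(); the `fakeHttp` object and its `isFake` property are hypothetical stand-ins, not anything from the Node.js code base.

```javascript
// Sketch: a hand-made require.cache entry shadows the unprefixed built-in.
const realHttp = require('node:http');

// Hypothetical stand-in; any value could be placed in `exports`.
const fakeHttp = { isFake: true };

// For built-ins the cache key is the bare module name.
// `loaded: true` marks the entry as a fully loaded module.
require.cache.http = { exports: fakeHttp, loaded: true };

const unprefixed = require('http');     // served from require.cache
const prefixed = require('node:http');  // bypasses require.cache

console.log(unprefixed === fakeHttp);   // true
console.log(prefixed === realHttp);     // true

// Remove the entry so later require('http') calls get the built-in again.
delete require.cache.http;
const restored = require('http');
console.log(restored === realHttp);     // true
```

Deleting the entry at the end keeps the tampering local to this sketch; in a real program the entry would stay in place for every later require('http') call in the process.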

@aduh95 why is that allowed for preferentially loaded modules?

I don’t understand the question, what is allowed?

As your example demonstrates, manually modifying the cache entry for preferentially loaded core modules bypasses the intended preferential loading characteristics. This seems counterproductive to the purpose of preferential loading.

IIUC, if you have a dependency named http, the internal module will be preferred, and overriding the cache will override everything.
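The "internal module will be preferred" half of that statement is visible in how require() resolves the name. This is a rough sketch, assuming a CommonJS script; it only inspects resolution and does not touch the cache, and the variable names are illustrative.

```javascript
// Sketch: how the resolver treats the bare name 'http'.
const { builtinModules } = require('node:module');

// 'http' appears in the list of built-in module names.
const isBuiltinName = builtinModules.includes('http');

// For a built-in, require.resolve() returns the bare name, not a file path,
// so a node_modules/http package would not be picked up.
const resolved = require.resolve('http');

// No directories are searched for a built-in, so the lookup path list is null.
const lookupPaths = require.resolve.paths('http');

console.log({ isBuiltinName, resolved, lookupPaths });
```

Because the resolved identifier is the bare string `http`, that same string is also the key under which a require.cache entry would shadow the built-in.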

I don’t know the reason, and it doesn’t really matter: that behavior cannot be changed without breaking the ecosystem. If you don’t like that behavior, you can add the node: prefix.
The only thing we can do is improve the docs. If you have suggestions, please share them and/or open a PR.

@RedYetiDev

and overriding the cache will override everything.

IIUC, the goal of preferential loading is to 'protect' core modules from being overwritten. Allowing manual manipulation of their cache entries seems to defeat this purpose.

@aduh95

that behavior cannot be changed without breaking the ecosystem.

Is manually manipulating cache entries for core modules a common scenario?

IIUC, the goal of preferential loading is to 'protect' core modules from being overwritten. Allowing manual manipulation of their cache entries seems to defeat this purpose.

Like I said, the “goal” does not really matter: it is the way it is, and changing it is simply not worth it at this point. As someone who wasn’t involved at all in the development of CJS, I can’t tell you what the rationale behind this design choice was (or maybe it was an oversight; again, I have no idea). But as a maintainer I can tell you that, whatever it is, it won’t change the fact that we want maximum stability in that area of the code base, and any PR that tried to amend the behavior would likely get rejected.
CJS does not have the ideal API surface, and we’re trying not to repeat that with ESM (that’s one of the reasons we don’t expose the ESM module cache, for example); for CJS there’s not much we can do but improve the docs.

Then, at the very least, I suggest amending the documentation to clarify the behavior, with something like:

"Some core modules are always preferentially loaded if their identifier is passed to require(). For instance, require('http') will return the built-in HTTP module, even if there is a file by that name. However, note that core modules imported without the node: prefix are served from the cache if an entry for the module exists. Since cache entries of core modules can be manually manipulated, this allows replacing the implementation for the dependent module. If you want to ensure obtaining the original implementation, use the node: prefix."

I’m not sure “the original implementation” is the correct phrasing, as it’s still possible to overwrite built-in modules as well (e.g. you could have require("node:http").createServer = function myCustomImplementation(){} somewhere in your code). In any case, clarifying the docs SGTM, would you like to send a PR?
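To illustrate that point, here is a small sketch, assuming a CommonJS script: the node: prefix guarantees which module object is returned, but that object is shared and mutable. The replacement function below is hypothetical and is restored at the end.

```javascript
// Sketch: the node: prefix does not protect against monkey-patching.
const http = require('node:http');
const originalCreateServer = http.createServer;

// Hypothetical replacement, used here only to show the effect.
function myCustomImplementation() {
  return 'patched';
}

http.createServer = myCustomImplementation;

// Any other code that requires 'node:http' gets the same exports object,
// so it sees the patched function.
const seenByOtherCode = require('node:http').createServer;
console.log(seenByOtherCode === myCustomImplementation); // true

// Restore the built-in function.
http.createServer = originalCreateServer;
const afterRestore = require('node:http').createServer;
console.log(afterRestore === originalCreateServer); // true
```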

I’m not sure “the original implementation” is the correct phrasing.

read: "If you want to ensure obtaining the original implementation"

Alternatively: "If you want to ensure require returns the original implementation"

I'm fine with either. What do you prefer?

In any case, clarifying the docs SGTM, would you like to send a PR?

Sure. I can give it a try :)

However, since core modules are also "imported" I think "obtaining" is better.

If you use import, there are no differences between using the node: prefix or not, the difference of behavior only happens with require.
I think it would be clearer to phrase it the other way around, if you will. The docs already use the node: prefix everywhere, so I think it would make more sense to phrase it as: “if you are using require() to load a built-in module without the node: prefix, you might get a different module if the require cache was tampered with in user-land”.

If you use import, there are no differences between using the node: prefix or not, the difference of behavior only happens with require.

Aren't core modules considered CommonJS modules? Isn't the CommonJS cache still accessible from an ES module?

No, they are considered… built-in modules. You can certainly access the require cache from ESM, but that has no effect on the matter.
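One way to check this claim is to tamper with the require cache and then load the module through import(). The sketch below assumes a CommonJS script in which dynamic import() is available; the function name, the result fields, and the fake entry are all illustrative.

```javascript
// Sketch: import() of a built-in is not affected by require.cache.
const realHttp = require('node:http');

async function importIgnoresRequireCache() {
  // Tamper with the CommonJS cache exactly as in the require() case.
  require.cache.http = { exports: { isFake: true }, loaded: true };
  try {
    const unprefixedNs = await import('http');
    const prefixedNs = await import('node:http');
    return {
      // The default export of a built-in's namespace is its exports object.
      unprefixedIsBuiltin: unprefixedNs.default === realHttp,
      prefixedIsBuiltin: prefixedNs.default === realHttp,
    };
  } finally {
    // Clean up so later require('http') calls get the built-in again.
    delete require.cache.http;
  }
}

const importCheck = importIgnoresRequireCache();
importCheck.then((result) => console.log(result));
```

If the claim holds, both fields are true: the cache can still be written to, but the import path never reads from it, so the prefix makes no difference there.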

IMO this should be converted to a discussion, am I okay to do that?

I don't understand. Are you saying that the require cache can't be manipulated in an ES module in the same way shown in the code you posted?

IMO this should be converted to a discussion, am I okay to do that?

@RedYetiDev I have no objection.