nodejs / corepack

Zero-runtime-dependency package acting as bridge between Node projects and their package managers

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Provide reproducible build by default

mcollina opened this issue · comments

Corepack does not allow for reproducible builds by default. Consider a scenario where there is corepack enabled by default, i.e., the jump files are in place.

I run:

  1. npm init -y
  2. yarn

The result does not include a packageManager field in my package.json.

This makes the build non-reproducible for newbies as they do not know that they should run corepack use yarn@4.

This is mostly a problem in Dockerfiles, because most newbies completely ignore the corepack use step.

I'm in favor of this change 👍

However, I think it was originally suggested by node core to have package manager global autoupdate so that node could avoid handling CVEs for package managers.

Maybe @aduh95 or @arcanis or @mhdawson remembers the context.

To be fair, running corepack use yarn@4 would not create reproducible builds either, because it would pick whichever is the latest version on the 4.x line from Yarn registry.

To be fair, running corepack use yarn@4 would not create reproducible builds either, because it would pick whichever is the latest version on the 4.x line from Yarn registry.

@aduh95 Thats true if you run the command in the Dockerfile but I think what @mcollina is pointing out is that the "newbie" should run that command locally and thus the package.json would be updated and make a reproducible build once COPY package.json . line is reached in the Dockerfile.

However, I think it was originally suggested by node core to have package manager global autoupdate so that node could avoid handling CVEs for package managers.

How does this relates? The problem with node is to have in-tree dependencies (that causes cve chores) as well as constant updates of the latest version of said package manager.

What I’m saying is that corepack should write down the packageManager field on first usage of a package manager loaded via corepack. This allows for future reproducible builds. (And possibly more chores for dependabot to keep this up to date).

How does this relates?

I thought you were referring to this feature, #364, since it would prevent "reproducibly"

What I’m saying is that corepack should write down the packageManager field on first usage of a package manager loaded via corepack.

What if there is no package.json in the directory?
For example, my first usage might be in /tmp when I run pnpm --version?

However, I think it was originally suggested by node core to have package manager global autoupdate so that node could avoid handling CVEs for package managers.

The main concern is that Node.js should never, never, ever need an update for the end user to address a security vulnerability in one of the package managers installed by corepack. So having it install a locked version would not be acceptable, but having it record what it installed and re-using that would be ok assuming there was a way and it's well documented how the end user can force the version updatred to a newer version without having to update their Node.js version.

Any chance you’d have an example of such Dockerfile that would be broken? To be sure we’re on the same page, as I’m not sure I understand what is the issue.

I'm a little unsure whether this is Corepack's concern or individual package managers ... I don't know how I feel with potentially mutative operations on the package.json when the user didn't ask for them. For instance, with this feature, just running yarn --version would add a "packageManager": "yarn@x.y.z" when the user didn't ask for it. By contrast, package managers would know what is an "install command" (which are typically expected to mutate the project files) and could add the packageManager field themselves if missing.

Additionally, there's a risk of breakage for people using multiple package managers together (for instance someone running yarn install then npm run), as Corepack would add the packageManager check and would then block the subsequent npm run calls. That's the reason why we went with Corepack being all-tolerant on projects which don't include a packageManager field - it feels the safest course of action to guarantee minimal breakage on the initial release.

Overall I'm not opposed to this, but I feel like it shouldn't be a blocker for the initial release. It comes with risks, and I'd feel safer to have those risks to be spread over a separate major. Not everything needs to be lumped together - once we have the tool in place we're free to make other changes as needed, following traditional Node.js minor/major considerations.

@aduh95 the example in the pnpm guide does not guarantee reproducible builds: https://pnpm.io/next/docker#example-1-build-a-bundle-in-a-docker-container.

When Corepack will be enabled by default, the default usage of Node.js will create non-reproducible builds. I think this is critical to fix before enabling by default.

FROM node:20-slim AS base
ENV PNPM_HOME="/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable
COPY . /app
WORKDIR /app

FROM base AS prod-deps
RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile

FROM base AS build
RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile
RUN pnpm run build

FROM base
COPY --from=prod-deps /app/node_modules /app/node_modules
COPY --from=build /app/dist /app/dist
EXPOSE 8000
CMD [ "pnpm", "start" ]

Overall I'm not opposed to this, but I feel like it shouldn't be a blocker for the initial release.

Agreed. Corepack is part of Node.js already. Marking it stable/enabling it by default is another matter, we are not talking about an initial release.

It comes with risks, and I'd feel safer to have those risks to be spread over a separate major. Not everything needs to be lumped together - once we have the tool in place we're free to make other changes as needed, following traditional Node.js minor/major considerations.

My 2 cents is that the risk of not reproducible builds by default is higher. If Corepack needs more time to figure it out, let's wait until it's ready.

@mcollina the issue would manifest itself if pnpm were to release a new version between two execution of the Dockerfile, correct? Also, is my understanding correct that you are not suggesting that Corepack inside the Dockerfile to mutate the package.json, you’d like the packageManager field to be present before the container is started, right?

the issue would manifest itself if pnpm were to release a new version between two execution of the Dockerfile, correct?

Yes. Specifically, if the packageManager field is not set, it can even use different majors.

you’d like the packageManager field to be present before the container is started, right?

No. The packageManager field must be present before the image is built (but I guess this is what you meant).

By contrast, package managers would know what is an "install command" (which are typically expected to mutate the project files) and could add the packageManager field themselves if missing.

@arcanis a solution could be to have it as a requirement for a package manager to be handled by corepack. I'm not tied to a specific solution. I'm mostly concerned that a new major version of a package manager is not slipped into a docker image without me knowing, breaking my build.

I like the idea of Corepack tracking <pm> i or <pm> install command calls and then writing the packageManager field (if package.json exists) before invoking the package manager.

This would be very similar to how a lockfile is written the first time you install and then subsequent installs read the lockfile.

I do think we need to consider if there are cases where someone doesn't want this to happen. Probably the same people who don't like lockfiles will be the same people who don't want packageManger pinned in their package.json file.

I share this pain very much. Not for newbies but also for any automation that it is critical for it to use the right command.

Writing packageManager field on first use and making it essential really makes sense 💯 💯 💯


(off topic but only mentioning as I saw this tweet about stabilize requirements)

Probably Other than auto, is there any hope that corepack <install|run|x> become a thing to automatically run the configured command. We made nypm and ni to overcome but if corepack could come to aid all users that would be a miracle for whole ecosystem.

Corepack does not allow for reproducible builds by default

Isn't that what you requested in #93 @mcollina?

Corepack does not allow for reproducible builds by default

Isn't that what you requested in #93 @mcollina?

The two requirements are orthogonal.

I'm still new to corepack but I would expect the following to happen if packageManager is not present in the package.json

  1. It will use the latest major version of the package manager
  2. Pins the major version in the packageManager property within the package.json
  3. Pins the exact version in the packageManager property within the lockfile

On the next install it should then use the version as defined in the lockfile. And tools like Dependabot shall update the package manager version in the lockfile according to the allowed version range in the package.json.

3. Pins the exact version in the packageManager property within the lockfile

Corepack is writing data inside lockfiles? How does that even work with the various different lockfile formats out there?

Before we have Corepack adding packageManager fields we might want to wait and see how the new field design plays out, per #402 (comment) and openjs-foundation/package-metadata-interoperability-collab-space#15.

Yes, probably the other package managers won't like it if somebody else is modifying the lockfile content or is adding any kind of additional metadata. If you delete the package manager lockfile, you might also not expect that you just deleted the pinned package manager version too.

Just wanted to say that, as a naive user, I'd expect that any version definition within packageManager behaves like dependencies in the package.json. And that if I use a semver version range, the package manager version will be pinned for me (if not pinned already) and that subsequent installs will use that version too. Where to store this lock information I can't tell, but maybe a dedicated lockfile might be an idea.

There's no need for a lockfile here, the packageManager field intentionally only supports exact versions, not ranges (you can however run corepack use yarn@^2 if you wish, and it'll lock the field ... in packageManager itself).