wooorm / npm-esm-vs-cjs

Data on the share of ESM vs CJS on the public npm registry

Additional metrics

GeoffreyBooth opened this issue

First, thank you for doing this! I’ve been thinking about the problem you’re aiming to solve here, which is to measure “how the shift to ESM is progressing.” One way is certainly the way you’ve already implemented: measure the percentage of popular published packages that are ESM. Another would be to measure the percentage of publishes that are ESM. That is, of all the times that popular library authors ran npm publish in the last week/month/few months/year, what percentage of those publishes were ESM code?

I don’t think this is necessarily a better metric than the one you’re already measuring, but I think it would be illuminating alongside the straight registry survey. It would tell us what library authors think of the migration: we would be able to see how many of the popular authors have made the transition to ESM versus how many haven’t yet (or are perhaps doing both, writing CommonJS for old projects and ESM for new ones). If the migration is under way, the percentage of recent ESM publishes should be higher than the percentage of published ESM packages, and the gap between the two numbers would give us a sense of how much the transition is accelerating over time.

You could also measure it in a few different ways. For example, always starting with the list of high-impact packages:

  • Filter out any that haven’t had a new release published within the last month (or three months or six months, whatever interval you think is best) and measure the percentages of formats for the remaining packages. This would tell us how far along the transition is among popular and active packages.

  • Fetch the list of releases for each package within the last month or three/six/etc months. For each release, measure if that release was ESM, faux ESM or CommonJS. This would give us a measure of how far along the transition is for published code, as in how much code getting published nowadays is ESM.

  • Fetch the list of releases for each package within the last year or two or three. For each release, measure if that release was ESM, faux ESM or CommonJS. Look for changes in format from one release to another. How many packages have switched from CommonJS to ESM, for example, in the last year? Were there more such switches in the last 12 months than the previous 12 months? This would tell us how much the transition is accelerating due to refactoring. How many popular packages have always been ESM? This would tell us how much the transition is accelerating due to displacement, where newer ESM-only packages are crowding out older non-ESM packages, taking their spots in the rankings.
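The three approaches above could be sketched roughly as follows. This is a minimal Python sketch, not code from this repository: it assumes each release has already been classified into one of the hypothetical labels "esm", "faux", or "cjs", and the Release record and all function names are invented for illustration. A package’s current format is taken to be the format of its most recent release.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Release:
    """Hypothetical record: one published version of one package."""
    package: str
    version: str
    published: datetime
    fmt: str  # assumed to be pre-classified as "esm", "faux", or "cjs"


def shares(formats):
    """Return each format's fraction of the total (empty dict if no input)."""
    counts = Counter(formats)
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()} if total else {}


def in_window(release, now, window):
    """True if the release was published within `window` before `now`."""
    return timedelta(0) <= now - release.published <= window


def active_package_shares(releases, now, window=timedelta(days=30)):
    """Approach 1: format shares among packages with a release in the window.

    A package's format is the format of its most recent release.
    """
    latest = {}
    for r in releases:
        if r.package not in latest or r.published > latest[r.package].published:
            latest[r.package] = r
    return shares(r.fmt for r in latest.values() if in_window(r, now, window))


def publish_shares(releases, now, window=timedelta(days=30)):
    """Approach 2: format shares among all individual publishes in the window."""
    return shares(r.fmt for r in releases if in_window(r, now, window))


def switches(releases, start, end, src="cjs", dst="esm"):
    """Approach 3: count src -> dst format changes between consecutive
    releases (by publish date) of the same package, where the newer
    release was published in [start, end)."""
    history = {}
    for r in sorted(releases, key=lambda r: r.published):
        history.setdefault(r.package, []).append(r)
    return sum(
        1
        for rs in history.values()
        for prev, cur in zip(rs, rs[1:])
        if start <= cur.published < end and prev.fmt == src and cur.fmt == dst
    )


def always_esm(releases):
    """Approach 3: packages whose every known release is ESM."""
    seen = {}
    for r in releases:
        seen.setdefault(r.package, set()).add(r.fmt)
    return sorted(p for p, fmts in seen.items() if fmts == {"esm"})
```

Calling switches once for the last 12 months and once for the 12 months before that answers whether switches are speeding up. One design caveat: releases are ordered by publish date rather than by semver, so a CommonJS patch on an old 1.x line published after an ESM 2.0 would register as an ESM-to-CommonJS switch.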

This is a little like measuring voters in an electorate: some swing voters switch from voting for one party to voting for another, while other voters leave the electorate (by moving away or dying) and new ones enter it (by moving in or turning 18). Looking at publishes helps us discover trends like those, which we might otherwise miss when the topline result doesn’t change. For example, if a state’s governor got reelected but their vote share fell from 65% in the first election to 51% in the second, the next election is much more likely to go to the other party than if the margin had grown from 51% to 65%. Along those lines, one more thing to measure is rankings:

  • Within the list of high-impact packages, what percentage of “getting more popular” libraries (packages whose rank went up, like from the 100th most popular to the 80th most popular) were ESM? Compare this month against last month, or against three, six, or twelve months ago. This would tell us how quickly ESM packages are displacing other packages.
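The ranking comparison could be computed from two snapshots of the high-impact list. A minimal sketch, assuming each snapshot is a list of package names ordered from most to least popular, and that formats are already classified with hypothetical labels such as "esm", "faux", or "cjs". The function name, and the choice to count newly ranked packages as climbers by default, are assumptions for illustration.

```python
def climber_esm_share(before, after, formats, include_new=True):
    """Fraction of ESM packages among those whose rank improved.

    before, after: package names, most popular first (index 0 = rank 1).
    formats: package name -> format label; packages missing from it are skipped.
    include_new: whether packages absent from `before` count as climbers.
    Returns None when there are no classified climbers.
    """
    old_rank = {name: i for i, name in enumerate(before)}
    climbers = []
    for new_rank, name in enumerate(after):
        if name in old_rank:
            if new_rank < old_rank[name]:  # lower index = more popular
                climbers.append(name)
        elif include_new:
            climbers.append(name)
    known = [formats[n] for n in climbers if n in formats]
    if not known:
        return None
    return sum(fmt == "esm" for fmt in known) / len(known)
```

Counting new entrants separately (include_new=False versus True) roughly splits the two effects described above: packages climbing within the list versus new ESM packages displacing others from it.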
commented

Hi Geoffrey! I think it’s an interesting additional approach. I believe the npm API returns publish dates: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/ef72a297377e4e4ad469d1ffb3ca61e2bc509d63/types/pacote/index.d.ts#L153-L165. It also returns every package.json, including those of all historical versions.
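For illustration, here is a sketch of reading those dates without pacote, using only the Python standard library against the public registry at https://registry.npmjs.org/. It relies on the full package document containing a "time" object that maps each version to its publish timestamp, and a "versions" object holding each version’s package.json. The function names are hypothetical, and classifying each manifest as ESM or CommonJS is left out.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

REGISTRY = "https://registry.npmjs.org/"


def fetch_packument(name):
    """Download the full registry document (all versions) for a package."""
    # Scoped names like "@scope/pkg" need the slash percent-encoded.
    url = REGISTRY + urllib.parse.quote(name, safe="@")
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def releases_from_packument(doc):
    """Return (version, publish time, package.json) tuples, oldest first.

    Versions without an entry in "time" are skipped.
    """
    times = doc.get("time", {})
    releases = []
    for version, manifest in doc.get("versions", {}).items():
        stamp = times.get(version)
        if stamp is None:
            continue
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        published = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        releases.append((version, published, manifest))
    releases.sort(key=lambda item: item[1])
    return releases
```

For example, releases_from_packument(fetch_packument("some-package")) would yield the historical package.json files in publish order, ready for whatever ESM/CommonJS classification is already used for the registry survey.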

Your last idea is likely easier to implement, because we have historical rankings in data/.

Feel free to work on this — I myself am currently not interested.

There’s a super long tail of older packages being used in the ecosystem though, lots of folk never update; I don’t see that as an “it doesn’t work”