badges / shields

Concise, consistent, and legible badges in SVG and raster format

Home Page: https://shields.io

NPM Downloads divergence

tiagoporto opened this issue

Are you experiencing an issue with...

shields.io

🐞 Description

I see a divergence between the badge generated by shields.io (NPM Downloads) and https://npm-stat.com/.

The package I was testing is limit-lines.

shields.io returns 916 downloads.

Screenshot 2024-03-12 at 11 54 07 PM

https://npm-stat.com/charts.html?package=limit-lines&from=2018-01-01&to=2024-03-12 returns over 4600 downloads
Screenshot 2024-03-12 at 11 56 03 PM

🔗 Link to the badge

https://img.shields.io/npm/dt/limit-lines

💡 Possible Solution

Maybe use npm-stat.

Badge tested using npm run badge https://img.shields.io/npm/dt/limit-lines
Output is available here

This is something that has come up before: #8296

When we calculate the downloads, we call https://api.npmjs.org/downloads/range/1000-01-01:3000-01-01/limit-lines and then add up the number of downloads, so basically we're doing:

curl "https://api.npmjs.org/downloads/range/1000-01-01:3000-01-01/limit-lines" | jq '[.downloads[].downloads] | add'

We are aware that there is some cutoff date on this though. NPM doesn't actually give us all the downloads from the beginning of time.
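For anyone who wants to poke at the numbers without jq, here is a minimal Python sketch of the same sum. It assumes the range response has a top-level `downloads` array of `{"downloads": N, "day": "YYYY-MM-DD"}` objects, which is the shape the jq filter above relies on. The sample payload and its counts are made up for illustration, and `total_downloads` is a hypothetical helper name, not something from the Shields codebase.

```python
import json


def total_downloads(payload: dict) -> int:
    """Sum the per-day counts, like jq '[.downloads[].downloads] | add'."""
    return sum(day["downloads"] for day in payload.get("downloads", []))


# Illustrative payload only: same shape as the range response, invented numbers.
sample = json.loads("""
{
  "start": "2024-03-10",
  "end": "2024-03-12",
  "package": "limit-lines",
  "downloads": [
    {"downloads": 3, "day": "2024-03-10"},
    {"downloads": 0, "day": "2024-03-11"},
    {"downloads": 5, "day": "2024-03-12"}
  ]
}
""")

print(total_downloads(sample))  # 8
```

Days that fall outside whatever window NPM still serves simply do not contribute to the sum, which is why the total can come out lower than a lifetime count.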

I'm not sure how npm-stat is working this out. I'm happy to have a look and see if we're missing a trick here.

Reading through this my brain had the exact same vague recollection I expressed in #8296 (comment) 😆

FWIW, for now I'd be hesitant to make any changes that increase the load we're sending to the npmjs APIs, given the current volume and their announced (but similarly vague) plans to introduce rate limiting.

The easiest solution could be to migrate to the npm-stat API.

There are some tradeoffs here:

NPM downloads is one of our highest traffic badges. We make about 30k requests per hour to https://api.npmjs.org/downloads/ although I'm not sure what the split is between weekly/monthly/yearly/total. This is a problem that only affects total.
I don't have an issue throwing that kind of traffic at NPM. It is a drop in the ocean for them.

We did fairly recently add a badge that calls npm-stat, and we've learned several interesting things since doing that:

  • npm-stat is maintained by one person and runs on a single server
  • last month npm-stat had ~10 days of downtime (pvorb/npm-stat.com#135)

As such, I'd be reluctant to just move all that traffic to npm-stat, especially without warning! Obviously there is also a tradeoff between completeness and reliability here.

@pvorb - how are you doing this on npm-stat? Based on the snippets posted, it looks like you're calling the same NPM API endpoint we are. Are we missing something when it comes to querying this from NPM?

The reason is likely that npm once returned numbers for some dates in the range, but today it only returns 0. But you would have to compare the results day by day in order to find out.
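A day-by-day comparison like that could look roughly like the sketch below. It assumes you already have two `day -> count` mappings, one built from a live NPM response and one from some other record of the same days. `diff_by_day` and the sample numbers are hypothetical and only illustrate the idea.

```python
def diff_by_day(live: dict[str, int], stored: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {day: (live_count, stored_count)} for every day where the two differ.

    A day missing from either mapping is treated as 0.
    """
    days = sorted(set(live) | set(stored))
    return {
        day: (live.get(day, 0), stored.get(day, 0))
        for day in days
        if live.get(day, 0) != stored.get(day, 0)
    }


# Invented numbers: older days now report 0 upstream but were non-zero when recorded.
live = {"2018-05-01": 0, "2018-05-02": 0, "2024-03-12": 5}
stored = {"2018-05-01": 12, "2018-05-02": 7, "2024-03-12": 5}

for day, (now, then) in diff_by_day(live, stored).items():
    print(day, "live:", now, "stored:", then)
```

If the only differing days are old ones where the live count is 0, that would support the explanation above.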

npm-stat.com on the other hand stores download counts in a local database (essentially a persistent cache) and only requests numbers from npm when a day is missing in the db.
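To make the "persistent cache" idea concrete, here is a rough sketch of that pattern, not npm-stat's actual code. It assumes a SQLite table keyed by package and day, and a caller-supplied `fetch_day` function standing in for the upstream request. `DownloadCache`, `fetch_day`, and the table layout are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from typing import Callable


class DownloadCache:
    """Store per-day counts locally and only ask upstream for days not stored yet."""

    def __init__(self, fetch_day: Callable[[str, str], int], path: str = ":memory:"):
        self.fetch_day = fetch_day  # hypothetical upstream lookup: (package, day) -> count
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS downloads ("
            "package TEXT, day TEXT, count INTEGER, PRIMARY KEY (package, day))"
        )

    def total(self, package: str, start: date, end: date) -> int:
        day = start
        while day <= end:
            key = day.isoformat()
            row = self.db.execute(
                "SELECT 1 FROM downloads WHERE package = ? AND day = ?", (package, key)
            ).fetchone()
            if row is None:
                # Only a missing day triggers an upstream request.
                count = self.fetch_day(package, key)
                self.db.execute("INSERT INTO downloads VALUES (?, ?, ?)", (package, key, count))
            day += timedelta(days=1)
        self.db.commit()
        (total,) = self.db.execute(
            "SELECT COALESCE(SUM(count), 0) FROM downloads "
            "WHERE package = ? AND day BETWEEN ? AND ?",
            (package, start.isoformat(), end.isoformat()),
        ).fetchone()
        return total


calls = []

def fake_fetch(package: str, day: str) -> int:
    calls.append(day)
    return 2  # invented count per day

cache = DownloadCache(fake_fetch)
print(cache.total("limit-lines", date(2024, 3, 10), date(2024, 3, 12)))  # 6
print(cache.total("limit-lines", date(2024, 3, 10), date(2024, 3, 12)))  # 6, no new fetches
print(len(calls))  # 3
```

The design consequence is the one described in this thread: once a day is stored, the total keeps counting it even if upstream later returns 0 for that day.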

"npm-stat is maintained by one person and runs on a single server"

I'd say it's even worse than that. I occasionally glance over my emails and when I notice someone complaining, I try to look into the problem within a few days or so. But I consider family and work more important than this little side project of mine.

So unfortunately I can't make this easier for you. Personally, I wouldn't use something like npm-stat.com for a "professional" service. The service is fast, because everything is on a single VM, so latencies between the database and the backend service are as minimal as they can get. People are not used to such a setup anymore. But before the outage a few weeks ago, I didn't look into the service at all for months.

Thanks. Shields is also not a "professional" service - it is also maintained by volunteers, but we do aim to both:

  • provide a good service for our own users and also
  • not throw large amounts of unwanted traffic at small projects who don't really want it

Based on this discussion, I think sticking with our current approach and accepting this limitation is the right solution here.

The other thing we could consider doing is to say: "We can only reliably show weekly/monthly/yearly stats from NPM based on the data available from the registry, so let's remove the 'total downloads' badge". But I think enough people get enough value from it in its current form that we'd probably annoy more people than we'd help by doing that.

Hey @chris48s,
This question is likely to come up again and again (see #8296), so I think the topic is important enough that the issue shouldn't be closed without any changes.

If there is no fix, I see three possible options:

  • Remove the total downloads badge (as you mentioned).
  • Rename the total badge to something like "3 years total" (that is what the NPM API returns).
  • Keep the total downloads badge but explicitly state that it is not the real total. (It doesn't make sense to me to keep a broken feature.)

WDYT?

Here's another idea:

We change the route to /d18m (downloads in the last 18 months - this is what we actually get, see https://github.com/npm/registry/blob/master/docs/download-counts.md#limits), so https://shields.io/badges/npm-downloads shows the documented intervals as /dw, /dm, /dy and /d18m. It is a bit of a funny interval and we don't use it anywhere else, but it is accurate.

We make /dt a redirect to /d18m. We sometimes use redirects instead of a full deprecation when there is a comparable endpoint.

That would make the docs accurate for new users while transparently migrating any existing users of /dt.
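As a rough illustration of the redirect idea (not how Shields implements redirects), the path mapping could be sketched like this. It assumes badge paths look like `/npm/<interval>/<package>`, as in the badge URL at the top of this issue. `resolve_badge_path` and `LEGACY_INTERVALS` are hypothetical names.

```python
# Hypothetical mapping from a legacy interval to its accurate replacement.
LEGACY_INTERVALS = {"dt": "d18m"}


def resolve_badge_path(path: str) -> tuple[str, bool]:
    """Return (path, redirected). Legacy /npm/dt/... paths map to /npm/d18m/...."""
    parts = path.split("/")
    # Expected layout: ["", "npm", interval, package, ...]
    if len(parts) >= 4 and parts[1] == "npm" and parts[2] in LEGACY_INTERVALS:
        parts[2] = LEGACY_INTERVALS[parts[2]]
        return "/".join(parts), True
    return path, False


print(resolve_badge_path("/npm/dt/limit-lines"))  # ('/npm/d18m/limit-lines', True)
print(resolve_badge_path("/npm/dw/limit-lines"))  # ('/npm/dw/limit-lines', False)
```

Existing READMEs that embed /dt would keep working, while new users would only see /d18m in the docs.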

Pretty good, this solution is aligned with the implementation.