algolia / npm-search

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:

> 5 days delay in indexing

MartinKolarik opened this issue

We got a report today about a package missing from the search results (jsdelivr/jsdelivr#18387), and I see the first release was five days ago. There was also a similar issue recently (#928). Is there anything we could do to reduce the delay?

Yep, indeed.
It seems npm bumped their sequence again and we are 60K updates behind.
There is not much we can do, since we need to process updates sequentially.
The same issue all over again :'(
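To illustrate why a backlog like this can only be drained in order, here is a minimal TypeScript sketch of a checkpointed changes consumer. It is not the actual npm-search code: the `Change` shape (an increasing `seq` plus a package `id`) and the `handle` callback are hypothetical. The checkpoint only moves past a change once that change has been handled, so nothing is ever skipped.

```typescript
// Hypothetical shape of one entry from a registry changes feed
// (an assumption for illustration, not the real npm-search types).
interface Change {
  seq: number;
  id: string;
}

// Processes changes strictly in seq order and returns the new checkpoint.
// The checkpoint advances only after a change is handled successfully;
// on failure the loop stops, so that change is retried on the next run.
function processInOrder(
  changes: Change[],
  checkpoint: number,
  handle: (change: Change) => boolean,
): number {
  let current = checkpoint;
  const ordered = [...changes].sort((a, b) => a.seq - b.seq);
  for (const change of ordered) {
    if (change.seq <= current) continue; // already processed earlier
    if (!handle(change)) break; // stop here, never skip ahead
    current = change.seq;
  }
  return current;
}
```

The trade-off is visible in the `break`: one slow or failing package holds up every update queued behind it, which is why a jump in the upstream sequence turns into days of delay.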

We lost our metrics monitoring a few weeks ago (and had no time to look into it); I have added more annoying alerting so that I don't miss this again.
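Since the real problem was that monitoring itself went quiet, an alert is more robust if it fires not only when the lag is high but also when the lag metric stops arriving (a "dead man's switch"). A minimal TypeScript sketch; `LagReport`, the thresholds and the messages are all hypothetical, not part of npm-search:

```typescript
// Hypothetical lag metric pushed periodically by the indexer.
interface LagReport {
  registrySeq: number; // latest sequence reported by the registry
  processedSeq: number; // last sequence the indexer finished
  reportedAtMs: number; // when this report was produced
}

// Returns the list of alerts to raise; an empty list means healthy.
// A missing or stale report is itself an alert, so silent monitoring
// cannot hide a growing backlog.
function lagAlerts(
  report: LagReport | undefined,
  nowMs: number,
  maxBehind: number,
  maxSilenceMs: number,
): string[] {
  if (!report) return ["no lag metrics received"];
  const alerts: string[] = [];
  if (nowMs - report.reportedAtMs > maxSilenceMs) {
    alerts.push("lag metrics are stale");
  }
  const behind = report.registrySeq - report.processedSeq;
  if (behind > maxBehind) {
    alerts.push(`indexer is ${behind} updates behind`);
  }
  return alerts;
}
```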


For jsDelivr, we should really sync up and find, not an alternative, but a decent fallback. It could be a secondary real-time index with no additional info, combined with an internal API to refresh the cache.

Do you know what's slowing down the indexing most? I'm wondering how the fallback would work, because if you visit a page like https://www.jsdelivr.com/package/npm/quantdom it's easy, but to get there, you'd usually search at https://www.jsdelivr.com/?query=quantdom, and if that returns some results, there's no way for us to know we need to fall back.

At the same time, we use jsDelivrHits for sorting the search results and also plan to add an option for filtering based on the moduleTypes/styleTypes fields, so always using a lighter index is not an option. Would it make sense to insert packages into the main index immediately, with some kind of "in progress" flag, and update them later? That might also work well with your previous "on-demand" endpoint idea.
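The "insert now, enrich later" idea could look roughly like the two-phase record below. This is a minimal TypeScript sketch: only `jsDelivrHits`, `moduleTypes` and `styleTypes` are field names mentioned in this thread, while `indexingStatus`, the function names and the record shape are hypothetical.

```typescript
// Hypothetical index record; only jsDelivrHits, moduleTypes and styleTypes
// come from the discussion, everything else is illustrative.
interface PackageRecord {
  objectID: string;
  name: string;
  version: string;
  indexingStatus: "pending" | "complete";
  jsDelivrHits?: number;
  moduleTypes?: string[];
  styleTypes?: string[];
}

// Phase 1: a minimal record that can be written as soon as a release is seen,
// so the package is searchable right away.
function pendingRecord(name: string, version: string): PackageRecord {
  return { objectID: name, name, version, indexingStatus: "pending" };
}

// Phase 2: once the slow pipeline has the full data, it replaces the
// pending record (a new object; the pending one is left untouched).
function completeRecord(
  pending: PackageRecord,
  details: Pick<PackageRecord, "jsDelivrHits" | "moduleTypes" | "styleTypes">,
): PackageRecord {
  return { ...pending, ...details, indexingStatus: "complete" };
}
```

While a record is pending it has no `jsDelivrHits`, so a sort on that field would need a default (for example treating it as 0), and a moduleTypes/styleTypes filter would simply not match it until the second phase runs.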

I notice many versions are missing, some much older than 5 days, for example:

https://yarnpkg.com/package/svelte - last version shown is 3.44.0, October 17, 2021
https://www.npmjs.com/package/svelte - 3.45.0 was published January 6, 2022

Yes, there was an outage this weekend. We got a lot of 429 (Too Many Requests) responses from Cloudflare, which caused many packages to be skipped.
I had to roll back to an older version of the index, and it's currently catching up with the npm update queue.
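Rather than dropping a package when Cloudflare answers with HTTP 429, a fetcher could wait and retry, and only give up (without advancing the checkpoint) after several attempts. A minimal TypeScript sketch, assuming a hypothetical error shape `{ status, retryAfterSeconds }` produced by whatever HTTP client is in use; the base delay, cap and attempt count are illustrative, not values from npm-search, and only the seconds form of `Retry-After` is handled:

```typescript
// Hypothetical error shape thrown by the HTTP client (an assumption).
interface HttpError {
  status: number;
  retryAfterSeconds?: number; // parsed from a Retry-After header, if any
}

function isRateLimited(err: unknown): err is HttpError {
  return typeof err === "object" && err !== null && (err as HttpError).status === 429;
}

// Delay before the next attempt: honour Retry-After when present,
// otherwise back off exponentially from baseMs, capped at maxMs.
function retryDelayMs(attempt: number, retryAfterSeconds?: number, baseMs = 1000, maxMs = 60_000): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return Math.min(retryAfterSeconds * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries fn on 429 instead of skipping the package. Any other error, or
// running out of attempts, is rethrown so the caller can stop and keep
// its checkpoint rather than silently moving on.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimited(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(retryDelayMs(attempt, err.retryAfterSeconds));
    }
  }
}
```

Wrapping each registry request as `withRetry(() => fetchPackage(name))` (where `fetchPackage` is hypothetical) means a burst of 429s slows the catch-up down instead of leaving holes in the index.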

The index is back up to date now, thanks for standing by!