algolia / npm-search

🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:

Data discrepancy between algolia & replicate.npmjs.com/registry

wardpeet opened this issue

We're using this package on https://www.gatsbyjs.org/plugins/ to show information from npm. The data in the Algolia index doesn't always match the registry: when we crawl the registry endpoint manually we get the correct data, but through Algolia we don't.

On this page https://www.gatsbyjs.org/packages/gatsby-source-sanity/?=sanity we display the readme string. If we crawl the registry at https://replicate.npmjs.com/registry/gatsby-source-sanity we get a valid readme string. When we use Algolia, the readme field is an empty string.
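To make the discrepancy easy to reproduce, here is a minimal Python sketch of the comparison described above. It assumes both documents expose a top-level `readme` field, as the report implies. The helper names `fetch_registry_doc` and `readme_missing_in_index` are hypothetical. The Algolia record is passed in as a plain dict, because the thread does not give the index name or credentials needed to fetch it.

```python
import json
from urllib.request import urlopen

# Registry endpoint quoted in the report; {name} is the package name.
# Assumption: unscoped names only (scoped names would need URL-encoding).
REGISTRY_URL = "https://replicate.npmjs.com/registry/{name}"


def fetch_registry_doc(name):
    """Fetch the registry document for a package (hypothetical helper)."""
    with urlopen(REGISTRY_URL.format(name=name), timeout=10) as response:
        return json.load(response)


def readme_missing_in_index(registry_doc, algolia_record):
    """Return True when the registry has a readme but the index record does not."""
    registry_readme = (registry_doc.get("readme") or "").strip()
    index_readme = (algolia_record.get("readme") or "").strip()
    return bool(registry_readme) and not index_readme
```

With the registry document fetched and the Algolia record obtained however your site already queries the index, `readme_missing_in_index(registry_doc, algolia_record)` flags exactly the case reported here: a non-empty readme upstream and an empty string in the index.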

This is the Gatsby bug for more info: gatsbyjs/gatsby#11129

I think this is because the update was made in the last 24 hours. The index gets completely rebuilt weekly. During the rebuild it goes through the packages alphabetically, so it falls behind on updates made during that period. It's currently at 97% replication (about 1.5h to go); all updates that were missed during the rebuild will be caught up after that.
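The behaviour described above can be illustrated with a toy Python model. This is only a sketch of the explanation in this comment, not the actual npm-search implementation. The function names, the package names other than `gatsby-source-sanity`, and the `(step, name, readme)` update format are all made up for illustration.

```python
def rebuild_alphabetically(registry, updates):
    """Copy every package into a fresh index in alphabetical order.

    registry: dict of package name -> readme (mutated as updates land).
    updates:  list of (step, name, readme); each update lands in the
              registry just before the rebuild copies the package at
              position `step` of the alphabetical order.
    Returns (index, missed), where `missed` lists packages that were
    updated after the rebuild had already copied them.
    """
    index = {}
    missed = []
    for step, current in enumerate(sorted(registry)):
        for when, name, readme in updates:
            if when == step:
                registry[name] = readme
                if name in index:  # already copied, so the index is stale
                    missed.append(name)
        index[current] = registry[current]
    return index, missed


def catch_up(registry, index, missed):
    """Replay the updates that were missed during the rebuild."""
    for name in missed:
        index[name] = registry[name]


# Toy data: the readme is published while the rebuild is already past "g".
registry = {"aaa-pkg": "a", "gatsby-source-sanity": "", "zzz-pkg": "z"}
updates = [(2, "gatsby-source-sanity", "# Readme")]

index, missed = rebuild_alphabetically(registry, updates)
stale_readme = index["gatsby-source-sanity"]  # still the old empty string
catch_up(registry, index, missed)
fresh_readme = index["gatsby-source-sanity"]  # matches the registry again
```

In this model, an update to a package the rebuild has not reached yet is picked up normally. Only packages that were already copied stay stale until the catch-up step runs.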

Thanks for the super fast answer. Is there a status screen or something we are able to check this progress?

It's currently not public, but it should be made public.

Sweet, seems like we get correct data now. We'll need to recreate the site.