blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org

Home Page: https://rxivist.org

Preprint missing from search results

agitter opened this issue

After the award-winning GLBIO talk from @rabdill, I checked my https://rxivist.org author profile and found one preprint is missing. It doesn't appear in the author profile, nor does it appear when searching for the title.

The direct DOI link https://rxivist.org/papers/10.1101/337956 works but is missing authors and download stats.

Should be fixed now, sorry about that.

It looks like the issue pops up when a preprint gets associated with an invalid URL—in this case (and 41 others), the bioRxiv listings for some reason included a link to a new version of the paper that didn't actually exist, so the spider was trying to grab the authors for "v2" of the paper when there was only ever a "v1."
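For anyone hitting something similar, here is a minimal sketch of one way to guard against a listed version that does not exist: walk down from the listed version number until a page actually responds. This is not the spider's real code. The function name `latest_existing_url` and the `exists` callback are hypothetical, and the URL pattern is only inferred from the bioRxiv links quoted in this thread.

```python
from typing import Callable, Optional

# URL pattern inferred from the bioRxiv links in this thread (an assumption).
BIORXIV_CONTENT = "https://www.biorxiv.org/content/{doi}v{version}"


def latest_existing_url(
    doi: str,
    listed_version: int,
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the URL of the highest version <= listed_version that exists.

    `exists` is a hypothetical callback that reports whether a URL resolves
    to a real paper page (for example, by checking the HTTP status code).
    Returns None when no version from listed_version down to 1 exists.
    """
    for version in range(listed_version, 0, -1):
        url = BIORXIV_CONTENT.format(doi=doi, version=version)
        if exists(url):
            return url
    return None


# Usage: the listing claims "v2", but only "v1" is actually live.
live_pages = {"https://www.biorxiv.org/content/10.1101/337956v1"}
print(latest_existing_url("10.1101/337956", 2, live_pages.__contains__))
```

Passing the existence check in as a callback keeps the fallback logic separate from the HTTP layer, so it can be exercised without touching the network.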

This has to be related to the transition to the full-text stuff; it looks like some of the papers legitimately don't have authors listed anymore:
https://www.biorxiv.org/content/10.1101/418806v1

New code in #245 uses DOI to resolve more accurate URLs
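As a rough illustration of the general idea (not the code in #245, which is not shown here), a DOI can be turned into a URL by asking the doi.org resolver and following its redirects. The helper name `resolve_doi` and its `opener` parameter are hypothetical, and a real crawler would likely need request headers, timeouts, and error handling that this sketch leaves out.

```python
import urllib.request
from typing import Callable


def resolve_doi(doi: str, opener: Callable = urllib.request.urlopen) -> str:
    """Return the URL that https://doi.org/<doi> finally redirects to.

    `opener` defaults to urllib.request.urlopen, which follows HTTP
    redirects on its own; it is injectable so the function can be
    tested without network access.
    """
    with opener(f"https://doi.org/{doi}") as response:
        # geturl() reports the URL of the resource actually retrieved,
        # i.e. the address after any redirects were followed.
        return response.geturl()
```

Calling `resolve_doi("10.1101/337956")` would make a live request, so whether it succeeds depends on how the resolver and the target site treat automated clients.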

That's interesting. This paper did have a "v2" but now https://www.biorxiv.org/content/10.1101/337956v2 shows "This paper is still processing; please check again shortly."

This issue is resolved, and bioRxiv also fixed the v2 version of the preprint.