crosscite / content-negotiation

DOI content negotiation

Home page: https://data.crosscite.org

Content Negotiation for certain DOIs returning "The resource you are looking for doesn't exist."

dhimmel opened this issue

We noticed in our CI logs that DOI Content Negotiation is failing for certain DOIs with the error "The resource you are looking for doesn't exist."

Here are two DOIs currently experiencing the issue:

https://data.crosscite.org/10.7287/peerj.preprints.3100
https://data.crosscite.org/10.1111/geb.13346

We've had similar issues in the past at datacite/datacite#187. Any idea what is causing this?

CC @mfenner. We're looking for anyone at Crosscite who can look into the growing number of failing DOIs, and would welcome any comments on whether https://data.crosscite.org is actively maintained.

Hi @dhimmel, we have been experiencing some issues with the stability of the Content Negotiation service recently due to abnormally high load. This is under active investigation and we hope to have remediation in place as soon as possible. There is an open StatusPage incident here: https://status.datacite.org/incidents/k7rjjq59kpyb

We have identified the underlying issue and the StatusPage has been updated, including information on the 404 errors. Requests for content-negotiation of DataCite DOIs should now be functional again.

Hi @digitaldogsbody, thanks for the info! Cool to learn about the status page and incident page, where we can stay up to date. As I was writing this comment, I saw the new update with "Identified" status:

A large number of requests has been identified that are specifically attempting to resolve Crossref DOIs via the DataCite content negotiation service; we are having ongoing conversations about this use case.

To remedy this, we've implemented some timeout logic and additional rate limiting. This, however, may have a knock-on effect: those attempting Content Negotiation for Crossref DOIs via DataCite Content Negotiation may receive a 404 (Crossref Content Negotiation, or Content Negotiation via doi.org, is unaffected).

Requests for DataCite DOIs via Content Negotiation should now start resolving.

Just wanted to note that for Manubot, we try DataCite content negotiation before doi.org content negotiation for all DOIs. This is for a few reasons:

  1. Given a DOI, we don't know which Registration Agency created it.
  2. The DataCite content negotiation endpoint returns valid CSL JSON. This is not true for doi.org content negotiation, which I think comes from Crossref and returns invalid CSL JSON (CrossRef/rest-api-doc#222).
  3. The DataCite content negotiation does a better job with CSL fields; see the example at manubot/manubot#158 (comment).
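The "DataCite first, then doi.org" order described above can be sketched with curl roughly as follows. This is a hypothetical helper, not Manubot's actual code (Manubot is written in Python); `fetch_csl` and the `DATACITE_CN`/`DOI_ORG` override variables are names made up for illustration. The idea is that curl's `-f` turns a 404 from the first endpoint into a non-zero exit status, so the second endpoint is tried.

```shell
# Hypothetical sketch, not Manubot's implementation: fetch CSL JSON for a DOI,
# trying DataCite content negotiation first and falling back to doi.org.
# DATACITE_CN and DOI_ORG are illustrative overrides (e.g. for a local copy).
DATACITE_CN="${DATACITE_CN:-https://data.crosscite.org}"
DOI_ORG="${DOI_ORG:-https://doi.org}"
CSL_ACCEPT="Accept: application/vnd.citationstyles.csl+json"

fetch_csl() {
  # -f: exit non-zero on HTTP errors such as the 404s discussed here
  # -sS: hide the progress bar but still report errors on stderr
  # -L: follow doi.org's redirect to the registration agency's CN service
  curl -fsSL -H "$CSL_ACCEPT" "$DATACITE_CN/$1" \
    || curl -fsSL -H "$CSL_ACCEPT" "$DOI_ORG/$1"
}
```

For example, `fetch_csl 10.1111/geb.13346 > geb.13346.json` would fall through to doi.org whenever the DataCite endpoint answers with a 404.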

Will DataCite Content Negotiation of Crossref DOIs still work, just with stricter rate limits? Are the rate limits something you can disclose so we can apply them on our end?

One final, somewhat unrelated question: what is the difference between DataCite Content Negotiation, doi.org Content Negotiation, and Crosscite?

Noting that the two requests in the initial comment still don't work, but both DOIs are registered with Crossref, as per the "Which RA" service, e.g. https://doi.org/doiRA/10.7287/peerj.preprints.3100:

[
  {
    "DOI": "10.7287/peerj.preprints.3100",
    "RA": "Crossref"
  }
]
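Checking the registration agency up front, as done above, can be scripted. The sketch below is hypothetical: `ra_from_json` and `which_ra` are illustrative names, and it uses `sed` instead of a JSON parser such as jq on the assumption that each response describes a single DOI, as in the example above.

```shell
# Hypothetical helper: print the "RA" value from a doiRA JSON response read
# on stdin. sed avoids an extra dependency but assumes one DOI per response.
ra_from_json() {
  sed -n 's/.*"RA"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the "Which RA" service which registration agency registered a DOI.
which_ra() {
  curl -fsSL "https://doi.org/doiRA/$1" | ra_from_json
}
```

Given the response shown above, `which_ra 10.7287/peerj.preprints.3100` should print `Crossref`.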

The issue of resolution of Crossref DOIs with the DataCite content-negotiation service is currently being discussed. At the moment, a 404 will be returned for the majority of them.

The rate limiting mentioned in the StatusPage update is for requests to the DataCite CN service in general, regardless of the RA, and is currently 100 requests per 5 minutes (although this may be subject to change as we continue to monitor the load on the service).
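One way a client could stay under the limit quoted above is to space requests evenly: 300 seconds / 100 requests = 3 seconds between requests. The sketch below assumes that figure (which, as noted, may change, so both numbers are overridable) and uses hypothetical names `cn_interval` and `cn_batch`; it paces a single client only.

```shell
# Client-side pacing for the DataCite CN limit quoted above (100 requests
# per 5 minutes). The limit may change, so both numbers are tunables.
CN_MAX_REQUESTS="${CN_MAX_REQUESTS:-100}"
CN_WINDOW_SECONDS="${CN_WINDOW_SECONDS:-300}"

# Seconds between requests, rounded up so evenly spaced requests never
# exceed CN_MAX_REQUESTS per window (300 / 100 = 3).
cn_interval() {
  echo $(( (CN_WINDOW_SECONDS + CN_MAX_REQUESTS - 1) / CN_MAX_REQUESTS ))
}

# Hypothetical batch driver: read one DOI per line on stdin and run the
# given command with the DOI appended, sleeping between calls.
cn_batch() {
  interval=$(cn_interval)
  first=1
  while IFS= read -r doi; do
    [ -n "$doi" ] || continue        # skip blank lines
    [ "$first" -eq 1 ] || sleep "$interval"
    first=0
    "$@" "$doi" </dev/null           # keep the command from eating our stdin
  done
}
```

`cn_batch echo < dois.txt` is a dry run; in practice the command would be a one-argument function that wraps curl and builds the URL from the DOI.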

DataCite CN and Crosscite CN are (currently) the same thing. doi.org CN should redirect you to the CN service of the RA, meaning that the supported content types will depend on which RA the DOI is registered with. A guide to the supported content types can be found here: https://citation.crosscite.org/docs.html#sec-4

doi.org CN should redirect you to the CN service of the RA

Ah this was a critical insight I was missing. Here's how DataCite and Crossref DOIs get redirected:

$ wget --header="Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.6084/m9.figshare.5346577.v1 2>&1 | grep Location:
Location: https://data.crosscite.org/10.6084%2Fm9.figshare.5346577.v1 [following]

$ wget --header="Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.1111/geb.13346 2>&1 | grep Location:
Location: https://api.crossref.org/v1/works/10.1111%2Fgeb.13346/transform [following]
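The same check can be done with curl, which can print the redirect target without following it via its `-w '%{redirect_url}'` write-out variable. The classifier is a hypothetical helper that only knows the two targets observed in the output above; anything else is reported as `unknown` rather than guessed.

```shell
# Print where doi.org content negotiation would send a request, without
# following the redirect (the same check as the wget commands above).
cn_redirect() {
  curl -s -o /dev/null \
    -H "Accept: application/vnd.citationstyles.csl+json" \
    -w '%{redirect_url}\n' "https://doi.org/$1"
}

# Hypothetical classifier covering only the two targets observed above.
cn_service_for() {
  case "$1" in
    https://data.crosscite.org/*) echo "DataCite" ;;
    https://api.crossref.org/*) echo "Crossref" ;;
    *) echo "unknown" ;;
  esac
}
```

At the time of the output above, `cn_service_for "$(cn_redirect 10.1111/geb.13346)"` would have printed `Crossref`.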

The issue of resolution of Crossref DOIs with the DataCite content-negotiation service is currently being discussed.

Good to know. Our experience is that the DataCite CN does a much better job returning CSL JSON for Crossref RA DOIs than Crossref CN. Do you think there is DataCite/Crosscite infrastructure that Crossref should incorporate into their CN service?

Is the infrastructure easy to run locally? We don't necessarily need to call the DataCite CN API if we can run this ourselves.

A local copy can be spun up very simply using Docker: docker run -p 8085:80 crosscite/content-negotiation

This will pull the latest version of the service from Docker Hub and expose it locally on port 8085, where it can be used exactly like the data.crosscite.org service.

Please note that by default, the service points at the DataCite Staging API. To make it work with live data, supply an API_URL environment variable to the container. The easiest way to do this is with a command-line flag: docker run -p 8085:80 -e API_URL="https://api.datacite.org" crosscite/content-negotiation
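A minimal sketch of running and querying such a local copy, assuming Docker is installed and port 8085 (as in the first command) is free. The container name `local-cn` and the helper names are hypothetical; the query shape (GET `/<DOI>` with an `Accept` header) mirrors the data.crosscite.org URLs earlier in this thread.

```shell
# Hypothetical helpers for a local copy of the service. Assumptions: Docker
# is installed, port 8085 is free, "local-cn" is a made-up container name.
LOCAL_CN="${LOCAL_CN:-http://localhost:8085}"

start_local_cn() {
  # API_URL points the container at live DataCite data instead of staging
  docker run -d --rm --name local-cn -p 8085:80 \
    -e API_URL="https://api.datacite.org" crosscite/content-negotiation
}

# Build the local equivalent of https://data.crosscite.org/<DOI>
local_cn_url() {
  printf '%s/%s\n' "$LOCAL_CN" "$1"
}

local_cn_fetch() {
  curl -fsS -H "Accept: application/vnd.citationstyles.csl+json" \
    "$(local_cn_url "$1")"
}
```

After `start_local_cn`, `local_cn_fetch 10.6084/m9.figshare.5346577.v1` should return CSL JSON once the container is up; `docker stop local-cn` stops it, and `--rm` then removes it.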

Thanks @digitaldogsbody for info on setting up a local CN service.

Noting the following update from the incident:

Requests for DataCite DOIs should be unaffected going forward, requests for Crossref DOIs against DataCite Content negotiation directly may occasionally return 404s when it is unable to process.
If possible, it is preferred to use Content Negotiation via doi.org; you will then be redirected to the appropriate registration agency's content negotiation service.

I noticed the two URLs in the initial comment still return "The resource you are looking for doesn't exist." Can you elaborate on "may occasionally return 404s when it is unable to process"? Will Crossref DOIs selectively fail when the load on your systems is high? Or are the DOIs in the initial comment always going to 404?

I dug up the initial issue which led us to start using DataCite CN for Crossref DOIs: #92 (comment). I will continue to urge Crossref to improve their CSL JSON, but in the meantime the DataCite CN is the only public CN endpoint we're aware of that can produce valid CSL JSON. So +1 to continued support of Crossref DOIs if system resources allow.

Will Crossref DOIs selectively fail when the load on your systems is high? Or are the DOIs in the initial comment always going to 404?

I'm afraid at this point, I can't say. There is an issue with rate limiting at the Crossref end, so when the service is unable to retrieve a response from the Crossref API, it will (currently) return a 404. When this happens is, unfortunately, outside of our control.

To update this: due to the problems we faced with providing consistent formatting while also allowing content negotiation for non-DataCite DOIs in this service, that feature has been disabled for the time being.

This feature was never properly documented as supported; it was more of an added bonus.
The correct way is to go via doi.org and be redirected to the content negotiation service of the appropriate registration agency.

I do understand that the CSL JSON can differ depending on the libraries used. On our side, we use standard Ruby plugins for the CSL metadata conversion and our own metadata mapping via the bolognese library.

This may, of course, have knock-on effects if you were relying on us to also do CN for Crossref DOIs, but I think it is better handled by Crossref in the long term. That said, we could potentially look at finding a way to offer this feature (most likely in collaboration with Crossref), but it would need further thought. If you believe this is an essential feature DataCite should look at offering, I'd suggest raising it with our product team via https://datacite.org/roadmap.html