miku / metha

Command line OAI-PMH harvester and client with built-in cache.

Home Page: https://lab.ub.uni-leipzig.de/metha/

Failed With Unprocessable Entity

tobiasschweizer opened this issue · comments

Hi @miku,

I am experiencing a problem with an endpoint that used to work. I am not sure whether the problem lies with the endpoint itself. Maybe you can give me some guidance.

$ metha-sync -v
0.3.0
$ metha-sync -base-dir . -format marcxml -set user-lory_phlu https://zenodo.org/oai2d
INFO[0000] https://zenodo.org/oai2d?verb=Identify       
INFO[0000] harvest: &{BaseURL:https://zenodo.org/oai2d Format:marcxml Set:user-lory_phlu From: Until: Client:0xc0000847f0 MaxRequests:1048576 DisableSelectiveHarvesting:false CleanBeforeDecode:true IgnoreHTTPErrors:false MaxEmptyResponses:10 SuppressFormatParameter:false HourlyInterval:false DailyInterval:false ExtraHeaders:map[] KeepTemporaryFiles:false Delay:0 Identify:0xc00013aec0 Started:0001-01-01 00:00:00 +0000 UTC Mutex:{state:0 sema:0}} 
INFO[0000] https://zenodo.org/oai2d?from=2014-02-03T00:00:00Z&metadataPrefix=marcxml&set=user-lory_phlu&until=2014-02-28T23:59:59Z&verb=ListRecords 
FATA[0000] failed with Unprocessable Entity on https://zenodo.org/oai2d?from=2014-02-03T00:00:00Z&metadataPrefix=marcxml&set=user-lory_phlu&until=2014-02-28T23:59:59Z&verb=ListRecords: <nil> 

Among the returned info from the endpoint, I see "DailyInterval:false". Does this explain the issue?

UPDATE: I see "DailyInterval:false" also with other endpoints that work.

Thanks!

Tobias

Using -no-intervals works.
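For context, the failing request in the log above is limited to a single window: it starts on 2014-02-03 and ends on the last second of that month. The following Go sketch shows how such a first window could be derived from an earliest datestamp. It assumes monthly windows, and `firstMonthlyWindow` is a hypothetical helper written for illustration, not metha's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// firstMonthlyWindow is a hypothetical helper (not part of metha).
// Given an RFC 3339 earliest datestamp, it returns the start of that day
// and the last second of the same month, both in UTC.
func firstMonthlyWindow(earliest string) (from, until string, err error) {
	t, err := time.Parse(time.RFC3339, earliest)
	if err != nil {
		return "", "", err
	}
	y, m, d := t.UTC().Date()
	start := time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
	// First instant of the next month, minus one second.
	// time.Date normalizes month 13 to January of the following year.
	end := time.Date(y, m+1, 1, 0, 0, 0, 0, time.UTC).Add(-time.Second)
	return start.Format(time.RFC3339), end.Format(time.RFC3339), nil
}

func main() {
	// Earliest datestamp reported for the endpoint in this thread.
	from, until, err := firstMonthlyWindow("2014-02-03T14:41:33Z")
	if err != nil {
		panic(err)
	}
	fmt.Printf("from=%s&until=%s\n", from, until)
	// prints "from=2014-02-03T00:00:00Z&until=2014-02-28T23:59:59Z"
}
```

With -no-intervals, no such window is involved, which would explain why the empty first window no longer aborts the harvest.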

Interesting. It seems that your example query does not match any record, and the server chooses to return an HTTP/1.1 422 UNPROCESSABLE ENTITY for that. I added a workaround for the 422 error (although I haven't seen this failure much at other endpoints yet).
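As a rough illustration of what such a workaround might look like (a sketch only, not the actual change in metha), a client could treat a 422 response as an empty window instead of a fatal error. The function name `classifyStatus` is hypothetical, and reading 422 as "no records in this window" is an assumption based on the behaviour observed with this endpoint.

```go
package main

import (
	"fmt"
	"net/http"
)

// classifyStatus is a hypothetical helper (not metha's actual code).
// It reports whether a response should be treated as an empty result,
// and returns an error for status codes that should stop the harvest.
func classifyStatus(code int) (empty bool, err error) {
	switch code {
	case http.StatusOK:
		// Normal response: the body should be parsed as OAI-PMH XML.
		return false, nil
	case http.StatusUnprocessableEntity:
		// Assumption: the server uses 422 to signal that the requested
		// window contains no records, so the harvester can move on.
		return true, nil
	default:
		return false, fmt.Errorf("failed with %s", http.StatusText(code))
	}
}

func main() {
	for _, code := range []int{http.StatusOK, http.StatusUnprocessableEntity, http.StatusInternalServerError} {
		empty, err := classifyStatus(code)
		fmt.Println(code, empty, err)
	}
}
```

Keeping the decision in one small function makes it easy to add further status codes later, should other endpoints turn out to behave similarly.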

Yes, it is strange. It used to work some months back, which is why I wrote an email to Zenodo to find out whether they changed any settings. It is weird that they indicate an Earliest Datestamp of 2014-02-03T14:41:33Z and then nothing is returned from that interval.

Thanks for the fix. I'll try it out and get back to you.

I can confirm that with 0.3.2 I am able to harvest without the -no-intervals flag. Thanks for fixing that so quickly!
In case I get an answer from Zenodo explaining the behaviour, I'll let you know.

I have just heard from Zenodo. They logged the issue and will report back with an update once they have fixed it.

Thanks for the update. Somehow, HTTP 422 does not seem so wrong, after all:

The HyperText Transfer Protocol (HTTP) 422 Unprocessable Content response status code indicates that the server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions. -- https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422

The OAI spec is not against using additional HTTP status codes: http://www.openarchives.org/OAI/openarchivesprotocol.html#StatusCodes

OAI-PMH repositories MAY employ HTTP Status-Codes in addition to "200 OK".

If I find other endpoints using HTTP 422, I'll make a note.

Still, I don't understand why a request covering the earliest timestamp would not return any data.

I have seen the missing earliest timestamp issue at other endpoints as well, and that should be fixable.