sckott / habanero

client for Crossref search API

Home page: https://habanero.readthedocs.io

Implementation of the SELECT parameter

sngordon opened this issue

I've found that when I request results in an alternative format (RIS, BibTeX), the date returned tends to be the published-online date (perhaps the earlier of published-online and published-print). The authors I'm working with tend to cite the published-print date, so I'm looking for strategies to provide the print date instead.

Examples:
http://api.crossref.org/works/10.1007/s10980-007-9188-1
RIS date = published-online = 2007/12/22 (published-print = 2008/2)

http://api.crossref.org/works/10.1139/cjfr-2014-0148
RIS date = published-print = 2015/2 (no published-online data)
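A minimal sketch of the date preference described above, assuming the JSON record returned by `/works/{DOI}` (its `message` object), where each date field looks like `{"date-parts": [[year, month, day]]}`. The helper name `preferred_date` and the print, then online, then issued order are illustrative choices, not habanero or Crossref behavior.

```python
from typing import Optional, Tuple

# Preference order is an assumption for this sketch: print date first,
# then online date, then Crossref's "issued" date as a last resort.
DATE_FIELDS = ("published-print", "published-online", "issued")


def preferred_date(work: dict) -> Optional[Tuple[int, ...]]:
    """Return the first available date-parts tuple from a Crossref work record."""
    for field in DATE_FIELDS:
        # Dates look like {"date-parts": [[year, month, day]]}; month and day
        # may be missing, and an unknown date can appear as [[None]].
        parts = work.get(field, {}).get("date-parts")
        if parts and parts[0] and parts[0][0] is not None:
            return tuple(parts[0])
    return None


# Trimmed records mirroring the two examples above.
online_first = {
    "published-online": {"date-parts": [[2007, 12, 22]]},
    "published-print": {"date-parts": [[2008, 2]]},
}
print_only = {"published-print": {"date-parts": [[2015, 2]]}}

print(preferred_date(online_first))  # (2008, 2)
print(preferred_date(print_only))    # (2015, 2)
```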

One option would be to make an additional request to the API specifically for the published-print date using the SELECT parameter, and use it to replace the date in the RIS-formatted record. I don't think habanero supports the SELECT parameter yet? I could perhaps modify the counts.py module to pick out the published-print info, but I see it uses a different URL that requires a login, so I'm wondering whether it would be better to implement the SELECT parameter in the cn.py code instead?
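The replace-the-date step could look roughly like this. It assumes the RIS record stores the year in a `PY  - ` line and the full date in a `DA  - ` line (standard RIS tags; check what Crossref actually returns for your records). `replace_ris_date` is a hypothetical helper, and it only rewrites lines that already exist.

```python
from typing import Sequence


def replace_ris_date(ris: str, date_parts: Sequence[int]) -> str:
    """Rewrite the PY (year) and DA (date) lines of a RIS record."""
    year, *rest = date_parts
    # RIS dates are written YYYY/MM/DD; zero-pad month and day if present.
    full_date = "/".join([str(year)] + [f"{p:02d}" for p in rest])
    out = []
    for line in ris.splitlines():
        if line.startswith("PY  - "):
            line = f"PY  - {year}"
        elif line.startswith("DA  - "):
            line = f"DA  - {full_date}"
        out.append(line)
    # splitlines() drops the final newline, so restore it if it was there.
    return "\n".join(out) + ("\n" if ris.endswith("\n") else "")


# Trimmed, illustrative RIS record carrying the online date from the example.
record = (
    "TY  - JOUR\n"
    "DO  - 10.1007/s10980-007-9188-1\n"
    "PY  - 2007\n"
    "DA  - 2007/12/22\n"
    "ER  - \n"
)
print(replace_ris_date(record, (2008, 2)))
```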

Another option would be to modify the Crossref API's content negotiation/parsing code, but I don't see that code available anywhere; I've only found this general reference: https://citation.crosscite.org/docs.html. (I realize this is more an issue for the Crossref API forum.)

Thanks for the issue @sngordon

Hadn't seen the select parameter; will add it to habanero (note to self: see also CrossRef/rest-api-doc#289).

As for your question about print vs. online dates, it doesn't appear that you can use select to pick those published-date fields on the /works/{DOI} route. Not sure what the best approach is. Also, not sure why you'd make a second request using select after the first request, which should already have all the fields in it?

AFAIK there's no login required for the content negotiation methods in habanero. The default URL is https://doi.org.

@sngordon you can reinstall from GitHub and try again; the select parameter is now implemented.

Closing this now, as the select parameter is implemented.

@sckott thanks so much for the quick implementation of this! I found that I can now combine select with filter to get specific fields for a specific DOI:
cr.works(filter = {'doi':"10.1007/s10980-007-9188-1"}, select = "DOI,published-print")
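For reference, that habanero call corresponds roughly to a GET on `https://api.crossref.org/works` with `filter=doi:...` and `select=DOI,published-print`. The sketch below builds that URL with the standard library and pulls the print date out of the list-style response (`message.items`). The sample response is trimmed and illustrative, with values taken from the example DOI above, and `select_url` and `print_dates` are hypothetical names.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

WORKS_API = "https://api.crossref.org/works"


def select_url(doi: str, fields: str) -> str:
    """Build a /works query that filters to one DOI and returns only `fields`."""
    query = urlencode({"filter": f"doi:{doi}", "select": fields})
    return f"{WORKS_API}?{query}"


def print_dates(response: dict) -> Dict[str, Optional[Tuple[int, ...]]]:
    """Map each DOI in a /works list response to its published-print date-parts."""
    dates = {}
    for item in response.get("message", {}).get("items", []):
        parts = item.get("published-print", {}).get("date-parts")
        dates[item["DOI"]] = tuple(parts[0]) if parts and parts[0] else None
    return dates


print(select_url("10.1007/s10980-007-9188-1", "DOI,published-print"))

# Trimmed, illustrative response body (not captured API output).
sample = {
    "status": "ok",
    "message": {
        "items": [
            {"DOI": "10.1007/s10980-007-9188-1",
             "published-print": {"date-parts": [[2008, 2]]}},
        ]
    },
}
print(print_dates(sample))  # {'10.1007/s10980-007-9188-1': (2008, 2)}
```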

The reason I'm making this additional query for published-print is that our current workflow only requests a RIS-formatted record, which provides just a subset of the full Crossref record. I might have to rethink this, though. As an ecologist, you might be tangentially interested in our bibliographic application: https://nwfp.taccimo.info and https://taccimo.info

The login I was referring to is in the counts.py module, which appears to use a different default URL: http://www.crossref.org/openurl/

Great, glad that worked for you with select and filter

I see. Yes, counts includes an email address for my collaborator. The idea is that the module will no longer be needed someday, as the data are supposed to make it into the main Crossref API, but who knows when that will be.

Very cool tool you've made. Is it using the Crossref API in the backend?

The taccimo site is a MySQL database with a PHP front end. The references are hand-selected or read in from key documents in an unstructured format. Then we use Crossref to try to retrieve a DOI and a structured record (using RIS format now, but we may switch to citeproc-json so I can be sure to get the published-print date). Next I'm working on using the structured record to return a formatted citation in any CSL style. Unfortunately, the style I created for this project doesn't seem to be working, but I doubt this has anything to do with habanero:

from habanero import cn
cn.content_negotiation(ids="10.1007/s10980-007-9188-1", format="text", style="usda-forest-service-pacific-northwest-research-station")
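If the workflow switches to citeproc-json, a call like `cn.content_negotiation(ids=doi, format="citeproc-json")` should return the record as a JSON string. Below is a minimal sketch of reading the print date out of that string, assuming Crossref's citeproc-json includes a `published-print` field next to the standard CSL `issued` field (worth verifying against a real response). The helper name is hypothetical, and it falls back to `issued` when no print date exists.

```python
import json
from typing import Optional


def print_date_from_citeproc(json_text: str) -> Optional[str]:
    """Return the print date as 'YYYY/M[/D]', falling back to 'issued'."""
    record = json.loads(json_text)
    for field in ("published-print", "issued"):
        parts = record.get(field, {}).get("date-parts")
        # Skip missing, empty, or [[None]] dates.
        if parts and parts[0] and parts[0][0] is not None:
            return "/".join(str(p) for p in parts[0])
    return None


# Illustrative citeproc-json fragment, not captured from the live service.
sample = json.dumps({
    "DOI": "10.1007/s10980-007-9188-1",
    "issued": {"date-parts": [[2007, 12, 22]]},
    "published-print": {"date-parts": [[2008, 2]]},
})
print(print_date_from_citeproc(sample))  # 2008/2
```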

Here's the docs page for content negotiation, in case you hadn't seen it: https://citation.crosscite.org/docs.html
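When it's unclear whether a failure comes from habanero or the server, it can help to make the content-negotiation request directly. A minimal standard-library sketch, assuming the `https://doi.org/{DOI}` endpoint and a `text/x-bibliography; style=...` Accept header as described in the crosscite docs. `bibliography_request` and `fetch_citation` are hypothetical helpers; urllib follows the doi.org redirect on its own and carries the Accept header along.

```python
import urllib.error
import urllib.request


def bibliography_request(doi: str, style: str) -> urllib.request.Request:
    """Build a DOI content-negotiation request for a formatted citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(req: urllib.request.Request, timeout: float = 30) -> str:
    """Send the request; on an HTTP error, return the status and body instead."""
    try:
        # urlopen follows redirects and copies the Accept header to them.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}: {err.read().decode('utf-8', 'replace')}"


req = bibliography_request(
    "10.1007/s10980-007-9188-1",
    "usda-forest-service-pacific-northwest-research-station",
)
print(req.full_url, req.get_header("Accept"))
# print(fetch_citation(req))  # needs network access
```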

Yeah, I don't know why that's not working. Will see if I can find out.

So apparently the CSL styles just aren't up to date in the place they're pulled from during content negotiation; see the error from:

curl -LH "Accept: text/x-bibliography; style=usda-forest-service-pacific-northwest-research-station" https://doi.org/10.1007/s10980-007-9188-1 | jq .

But the folks at https://citation.crosscite.org/ did just update their CSL styles, so you can get a formatted citation there, or via curl like so:

curl -v 'https://citation.crosscite.org/format?doi=10.1007%2Fs10980-007-9188-1&style=usda-forest-service-pacific-northwest-research-station&lang=en-US'

That, of course, is not in habanero.

Thanks again! I wasn't sure where in the chain the problem was. I didn't know about https://citation.crosscite.org/ either, but I'm sure I can rig some code to use that.
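Since the crosscite `format` endpoint isn't wrapped by habanero, one way to rig it up is a small standard-library helper. This sketch assumes the endpoint accepts the `doi`, `style` and `lang` query parameters shown in the curl command above and returns the formatted citation as text; the function names are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

FORMAT_ENDPOINT = "https://citation.crosscite.org/format"


def crosscite_format_url(doi: str, style: str, lang: str = "en-US") -> str:
    """Build the same query as the curl example; urlencode escapes '/' as %2F."""
    return f"{FORMAT_ENDPOINT}?{urlencode({'doi': doi, 'style': style, 'lang': lang})}"


def fetch_formatted(doi: str, style: str, lang: str = "en-US", timeout: float = 30) -> str:
    """Fetch the formatted citation text (requires network access)."""
    with urllib.request.urlopen(crosscite_format_url(doi, style, lang), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(crosscite_format_url(
    "10.1007/s10980-007-9188-1",
    "usda-forest-service-pacific-northwest-research-station",
))
```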