sckott / habanero

client for Crossref search API

Home Page: https://habanero.readthedocs.io

Get DOI from query? Convert to dataframe?

robtlx opened this issue

Hello!

How would I go about extracting the DOI from a query result?
I tried a variant from here but I get a KeyError on 'DOI' in [ z['DOI'] for z in x['message']['items'] ] and I don't really know how to proceed.

I tried converting the query results to a dataframe but that gives me most of the results under one single parameter instead of splitting them more tidily.

I'm still a beginner in Python so please keep in mind some terms might be confusing.

My endgame is to get a column of DOIs which I can then compare to another column I've already generated - seeing what relevant journals I haven't collected already.

Thank you!

I managed to solve this by rewriting the loop like this:
for i in crossref_results['message']['items']:
    doi = i['DOI']

But now I'm running into a different issue. If I stay with the maximum of 1000 results, everything is fine, but obviously I want more than 1000. If I set cursor="*", it runs for quite a while but then I get a "TypeError: list indices must be integers or slices, not str" for the first line (for i in crossref_results).

I tried printing the iterated element ("i" or "doi" in my case) but nothing is printed; I just get this error.

Is anything possible?

Thanks for your question.

What Python version are you using? And what habanero version? You can get the habanero version like this:

import habanero
habanero.__version__

So if you run the below example, you get a key error?

from habanero import Crossref
cr = Crossref()
x = cr.works(filter = {'has_full_text': True})
[z['DOI'] for z in x['message']['items']]

If the above works for you, please share the full example so I can see why you are getting the error.

Yes, using cursor pagination will take a while if you are not filtering the query in any way since there are a lot of records to page through.

The docs you linked to have an example of how to work through the results from using cursor; see the example under the heading "# Deep paging, using the cursor parameter".
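The TypeError above is consistent with a cursor request returning a list of response pages rather than a single dict, so indexing it with ['message'] fails. Here is a minimal sketch of one way to handle both shapes. The helper name extract_dois and the sample data are made up for illustration and are not part of habanero; the sample dicts only imitate the shape of a Crossref response so the snippet runs without a network call.

```python
def extract_dois(result):
    """Collect DOIs from a works() result.

    Accepts either one response dict or a list of response dicts
    (one per page), which is the shape deep paging appears to produce.
    """
    # Normalize to a list of pages so one loop covers both cases.
    pages = result if isinstance(result, list) else [result]
    dois = []
    for page in pages:
        for item in page["message"]["items"]:
            # .get avoids a KeyError if a record has no 'DOI' key.
            doi = item.get("DOI")
            if doi:
                dois.append(doi)
    return dois


# Hypothetical stand-in data shaped like Crossref responses.
single_page = {"message": {"items": [{"DOI": "10.1000/a"}, {"DOI": "10.1000/b"}]}}
paged = [
    single_page,
    {"message": {"items": [{"DOI": "10.1000/c"}, {"title": ["record without a DOI"]}]}},
]

print(extract_dois(single_page))
print(extract_dois(paged))
```

With habanero this would be called along the lines of extract_dois(cr.works(query="widget", cursor="*", cursor_max=2000)), where the query term is only a placeholder. Adding a filter or a lower cursor_max keeps the paging time manageable.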

Thank you for the reply and sorry to bother!

I'm on Habanero 1.2.2 and tried it on two different machines running Python 3.6 and 3.8.

I managed to work around the first issue. I ran the code snippet you asked about and it's now not giving any errors; PyCharm CE just flags the second statement as having no effect, and it's not returning anything either. Also, I am not interested in browsing through all full text publications, but more in searching for DOIs I already have. I somehow worked around that by figuring out a different approach to the checking and am now looking at a more general level, specifically ISSNs. I succeeded by looping through my ISSNs and querying cr.journals(ids='').

Thank you again for the help!

Great, nice work figuring it out.
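For anyone landing here with the original goal of comparing retrieved DOIs against a column already collected, a small sketch in plain Python follows. The function name missing_dois and the sample values are hypothetical. It lowercases DOIs before comparing, on the assumption that DOIs should be matched case-insensitively.

```python
def missing_dois(found, already_collected):
    """Return DOIs from `found` that are not in `already_collected`.

    Comparison ignores case and surrounding whitespace; the original
    spelling and order of `found` are preserved, duplicates dropped.
    """
    have = {d.strip().lower() for d in already_collected}
    seen = set()
    missing = []
    for doi in found:
        key = doi.strip().lower()
        if key not in have and key not in seen:
            seen.add(key)
            missing.append(doi)
    return missing


# Made-up example values, not real DOIs.
found = ["10.1000/ABC", "10.1000/def", "10.1000/ghi", "10.1000/def"]
collected = ["10.1000/abc", "10.1000/xyz"]

print(missing_dois(found, collected))
```

If the collected DOIs live in a pandas dataframe, passing the column as a list (for example df["DOI"].tolist(), with "DOI" standing in for whatever the column is called) should work the same way.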