ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query

Query based on PMIDS

alexmaina opened this issue · comments

I would like to bring to your attention that getpapers is unable to mine content using a PMID.
For example, when I run a query using a PMID:

alex@alex-HP-ProDesk-600-G2-SFF:~$ getpapers -q PMID:27355041 -n -o maina
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
error: Malformed or empty response from EuropePMC. Try running again. Perhaps your query is wrong.

When I run a query using a PMCID, I get the following results:

alex@alex-HP-ProDesk-600-G2-SFF:~$ getpapers -q PMCID:PMC5026053 -n -o maina
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 1 open access results

This document lists PMID as a possible search field. Can getpapers search using PMIDs?

Thanks for this report

It seems at first glance that this is a problem with EuropePMC (which getpapers depends upon)

You'll see that if you search for PMID:27355041 at europepmc.org it comes up blank. Let me investigate further though.

It seems that at some point they stopped indexing by PMID and instead used the field EXT_ID (but this can also correspond to, for example, Agricola records)

This seems to work for me:
getpapers -q EXT_ID:27355041 -o test

Let me know if you have other problems

Thanks @tarrow, it works very well, but only where the PMID is for a paper that is open access. Does this mean getpapers cannot mine titles and abstracts that are not open access in PubMed/MEDLINE?

Sure; you just need to use the -a flag to also get non-open access content. Often there isn't a fulltext available for these though.

Thanks again. One last question, and by no means the least: I have a dataset of 10 PMIDs.

+------------------+
| accession_number |
+------------------+
| 27747646         |
| 27649863         |
| 27621978         |
| 27478298         |
| 27441216         |
| 27397933         |
| 27386033         |
| 27382606         |
| 27379288         |
| 27355041         |
+------------------+

How can I query all 10 PMIDs in a single script?

We don't currently have a way to do this; but we probably should.

The easiest thing would be to simply write a bash script to call getpapers several times. Are you on a unix machine?

I made a new issue for this: #148
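
As a rough sketch of the "call getpapers several times" approach suggested above, the loop below runs one query per PMID from the table. It uses only the flags that appear in this thread (-q, -a, -o) and the EXT_ID field. The function name fetch_pmid, the DRY_RUN switch, and the pmid_<id> output directory names are illustrative assumptions, not part of getpapers.

```shell
#!/usr/bin/env bash
# Sketch only: one getpapers call per PMID, assuming getpapers is on PATH.
# The PMIDs are the ten accession numbers listed in the table above.
pmids="27747646 27649863 27621978 27478298 27441216 27397933 27386033 27382606 27379288 27355041"

# fetch_pmid is a hypothetical helper, not a getpapers feature.
# With DRY_RUN=1 (the default here) it only prints the command it would run;
# with DRY_RUN=0 it calls getpapers, writing to a hypothetical pmid_<id> directory.
fetch_pmid() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "getpapers -q EXT_ID:$1 -a -o pmid_$1"
  else
    # EXT_ID replaces PMID as the search field; -a also returns non-open-access records.
    getpapers -q "EXT_ID:$1" -a -o "pmid_$1"
  fi
}

for pmid in $pmids; do
  # Report a failed call on stderr and continue with the remaining PMIDs.
  fetch_pmid "$pmid" || echo "getpapers failed for $pmid" >&2
done
```

Run as is, the script only prints the ten commands. Setting DRY_RUN=0 in the environment runs them for real. Each PMID gets its own output directory so that results from separate calls do not overwrite one another.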

Yes, I am on a Unix machine. Is this the same for PMCIDs?