cokelaer / bioservices

Access to Biological Web Services from Python.

Home Page: http://bioservices.readthedocs.io


Limit of 25 proteins for UniProt mapping.

ArnaudBelcour opened this issue.

Hello,

First, thank you for this package; it is very useful.

I have an issue when using UniProt with bioservices 1.10.4 on Python 3.8.10. I am trying to retrieve protein annotations from UniProt using the mapping function:

from bioservices import UniProt

uniprot_bioservices = UniProt(verbose=False)

# 26 UniProt accessions (no backslash continuations are needed inside list brackets)
protein_queries = ['Q89B22', 'P57224', 'Q89B11', 'P57601', 'P25749', 'P59526', 'P59491',
                   'P57659', 'P57263', 'P57337', 'P57411', 'Q89A85', 'P57576', 'Q89AJ8',
                   'P57524', 'Q89AK2', 'Q89B42', 'P57362', 'P59460', 'P57559', 'P57226',
                   'P57463', 'P57529', 'P57213', 'P57525', 'P57230']

data = uniprot_bioservices.mapping(fr='UniProtKB_AC-ID', to='UniProtKB', query=protein_queries)

print('Number of input proteins: {0}'.format(len(protein_queries)))
# Each record in data['results'] carries the input accession under its 'from' key
print('Number of mapped proteins: {0}'.format(len(data['results'])))
# 'failedIds' is only present when some IDs could not be mapped; it lists plain accession strings
print('Number of failed mapped proteins: {0}'.format(len(data.get('failedIds', []))))

I pass 26 proteins as the mapping query, but the results contain only 25, even though the last protein has annotations on UniProt. I have tested with other numbers of proteins, and there seems to be a limit of 25 results for this mapping function.

However, I cannot find any documentation about this limit. Is it an error on my side, or an option that I have missed?
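To see exactly which accession was dropped, one option is to compare the input list with the `from` fields of the returned records. The sketch below assumes the response is a dict shaped like the one used above: a `results` list of records with a `from` key, plus an optional `failedIds` list of accession strings. `find_unreported_ids` is a hypothetical helper, not part of bioservices, and the data below is a toy stand-in rather than real UniProt output.

```python
def find_unreported_ids(queries, data):
    """Return input IDs that appear neither in the mapped results nor in failedIds.

    Assumes `data` looks like {'results': [{'from': ..., 'to': ...}, ...],
    'failedIds': [...]}, where the 'failedIds' key may be absent.
    """
    mapped = {record['from'] for record in data.get('results', [])}
    failed = set(data.get('failedIds', []))
    # Keep the input order so the output is easy to compare with the query list.
    return [q for q in queries if q not in mapped and q not in failed]


# Toy response standing in for a truncated mapping result (no network call).
queries = ['ID1', 'ID2', 'ID3']
fake_data = {'results': [{'from': 'ID1', 'to': {}}, {'from': 'ID2', 'to': {}}]}
print(find_unreported_ids(queries, fake_data))  # ['ID3']
```

An ID that shows up here was neither mapped nor reported as failed, which is the symptom one would expect from a truncated result.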

I have also had this issue using u.search(). I was trying to query 200 proteins, but the results always contained only 25 proteins.

prot_search = u.search(query="+OR+".join(proteins), columns='id,annotation_score,lineage', frmt='tsv')

I am using Python 3.8.11 and bioservices 1.10.4.

I used u.search() rather than u.get_df() because u.get_df() does not let me choose specific columns.

Thanks.
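With `frmt='tsv'`, the search output is, assuming it comes back as a plain string, a header line followed by one line per entry, so counting data rows is a quick way to check whether the output was capped at 25. A minimal sketch using only the standard library; `tsv_to_rows` is a hypothetical helper, and the sample text and its column names are made up for illustration, not real UniProt output.

```python
import csv
import io


def tsv_to_rows(tsv_text):
    """Parse TSV text (one header line, then data lines) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    return list(reader)


# Made-up sample with the same shape as a two-entry TSV response.
sample = "Entry\tAnnotation\nID1\t5\nID2\t3\n"
rows = tsv_to_rows(sample)
print(len(rows))  # 2
```

If pandas is available, `pandas.read_csv(io.StringIO(tsv_text), sep='\t')` gives the same rows as a DataFrame.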

@joaosegurilho sorry for the late answer. In your case, the limit parameter should help. For instance, for 200 proteins, call the method with limit=200; if unsure, set a limit that is large enough, e.g. 1000:

prot_search = u.search(query="+OR+".join(proteins), columns='id,annotation_score,lineage', frmt='tsv', limit=200)

By default, all UniProt calls are limited to 25 results, according to their API. I'll try to update the code to handle this automatically in the future.
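Rather than guessing a large value such as 1000, the limit can be derived from the number of queried accessions. A minimal sketch; `build_search_kwargs` is a hypothetical helper that only assembles the keyword arguments already used in the call above (`query`, `columns`, `frmt`, `limit`), which are then passed to `u.search`.

```python
def build_search_kwargs(proteins, columns='id,annotation_score,lineage'):
    """Assemble keyword arguments for UniProt.search with one slot per accession."""
    return {
        'query': '+OR+'.join(proteins),  # same OR-joined query as in the call above
        'columns': columns,
        'frmt': 'tsv',
        # Without an explicit limit only 25 results come back, so size it to the input.
        'limit': len(proteins),
    }


kwargs = build_search_kwargs(['ID1', 'ID2'])
print(kwargs['query'], kwargs['limit'])  # ID1+OR+ID2 2

# With bioservices installed (network access required):
# from bioservices import UniProt
# u = UniProt(verbose=False)
# prot_search = u.search(**kwargs)
```

If a query term can match more than one entry, `len(proteins)` may still be too small, which is where a generous value like 1000 is the safer choice.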

@ArnaudBelcour it looks like there is also a limit of 25 on the mapping functionality, but here I did not manage to implement the limit parameter. I am not sure whether it is a bug in the UniProt API or not. So, for now, the only solution is to split the input list into chunks of 25. As with my previous comment, I will try to make this process automatic in a future release.
Best
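The chunking workaround can be sketched as follows, assuming each call returns a dict with a `results` list and an optional `failedIds` list, as in the original report. `chunked` and `map_in_chunks` are hypothetical helpers; `map_one_chunk` stands for any callable that maps a list of at most 25 accessions, for example a lambda wrapping `uniprot_bioservices.mapping`.

```python
def chunked(items, size=25):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_in_chunks(map_one_chunk, ids, size=25):
    """Call `map_one_chunk` on each chunk and merge 'results' and 'failedIds'."""
    merged = {'results': [], 'failedIds': []}
    for chunk in chunked(ids, size):
        data = map_one_chunk(chunk)
        merged['results'].extend(data.get('results', []))
        merged['failedIds'].extend(data.get('failedIds', []))
    return merged


# Offline check with a fake mapper that, like the behaviour reported above, caps output at 25.
def fake_mapper(chunk):
    return {'results': [{'from': acc, 'to': acc} for acc in chunk[:25]]}


ids = ['ID{0}'.format(n) for n in range(26)]
print(len(fake_mapper(ids)['results']))                 # 25
print(len(map_in_chunks(fake_mapper, ids)['results']))  # 26

# With bioservices (network access required):
# data = map_in_chunks(
#     lambda chunk: uniprot_bioservices.mapping(fr='UniProtKB_AC-ID', to='UniProtKB', query=chunk),
#     protein_queries)
```

Passing the mapper in as a callable keeps the chunking logic testable without network access.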

This should be fixed in v1.11.0, now available on PyPI.