biothings / mygene.info

MyGene.info: A BioThings API for gene annotations

Home Page: http://mygene.info

search_phase_execution_exception

Maarten-vd-Sande opened this issue

Thanks for the awesome tool! 💪

I have a transcription factor database in which each TF is listed under either a gene name or some other identifier. I am currently stuck with a list of identifiers whose type I am not entirely sure of, and I would like to convert them to gene names if possible. Here are some examples:

AL662824.5
CR759733.4
KIAA0415
BX284686.1

If, for instance, I google BX284686.1, I get to this page: https://www.ncbi.nlm.nih.gov/nuccore/BX284686

That made me think that BX284686.1 belongs to the field accession.genomic. However, when I query http://mygene.info/v3/query?q=accession.genomic:AL662824.5, I get the error search_phase_execution_exception.

  • Am I using the wrong field?
  • Do you maybe know what these identifiers are/represent?

Thanks!

@Maarten-vd-Sande You're using the right query syntax. However, the query fails because the accession.genomic field is not indexed, so you cannot query by it. You can check which fields are indexed in the MyGene.info documentation. I'm not sure why this field is set to be not indexed, though.

Ok, unfortunate.

Do you have any experience with identifiers like these, and do you know what they represent? Is accession.genomic actually the correct field?

@Maarten-vd-Sande it doesn't look like the identifiers in your small sample are of a consistent type. My guess is that accession.genomic would be the correct field for most of them, but KIAA0415 is something else (maybe alias?)
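To make the "inconsistent type" point concrete, here is a rough sketch that sorts the sample identifiers by shape. The pattern is only a heuristic based on the classic GenBank nucleotide accession layout (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix); the function name and the "accession-like"/"other" labels are made up for illustration and are not part of MyGene.info.

```python
import re

# Heuristic assumption: GenBank-style nucleotide accessions look like
# 1 letter + 5 digits, or 2 letters + 6 or 8 digits, optionally followed by ".version".
ACCESSION_RE = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$"
)


def guess_identifier_type(identifier):
    """Return a rough, hypothetical label for an identifier based on its shape."""
    if ACCESSION_RE.match(identifier):
        return "accession-like"
    return "other"


examples = ["AL662824.5", "CR759733.4", "KIAA0415", "BX284686.1"]

# Group the sample identifiers by their guessed type.
grouped = {}
for ident in examples:
    grouped.setdefault(guess_identifier_type(ident), []).append(ident)

print(grouped)
```

On this sample, KIAA0415 is the only identifier that does not fit the accession shape, which is consistent with it being an alias or similar name rather than a sequence accession.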

@Maarten-vd-Sande you may find that querying another field, other_names, is useful in your case:

http://mygene.info/v3/query?q=other_names:KIAA0415&fields=entrezgene,name,symbol,taxid,other_names

(This can be a last resort if you cannot find a match via other, more precise identifier fields.)
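If you want to run this for a whole list, a minimal Python sketch of the same query could look like the following. It only uses the standard library; the helper names (build_query_url, fetch_hits) are hypothetical, and fetch_hits assumes the JSON response carries its matches under a "hits" key, so treat it as a starting point rather than a reference client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mygene.info/v3/query"
DEFAULT_FIELDS = ("entrezgene", "name", "symbol", "taxid", "other_names")


def build_query_url(term, field="other_names", fields=DEFAULT_FIELDS):
    """Build a fielded query URL of the form q=<field>:<term>&fields=<a,b,c>."""
    params = {"q": f"{field}:{term}", "fields": ",".join(fields)}
    # Keep ':' and ',' unescaped so the URL matches the hand-written form.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


def fetch_hits(term, **kwargs):
    """Fetch matches for one term (needs network access).

    Assumption: the response is JSON with a "hits" list; an empty list is
    returned if that key is absent.
    """
    with urlopen(build_query_url(term, **kwargs), timeout=10) as response:
        return json.load(response).get("hits", [])


url = build_query_url("KIAA0415")
print(url)
```

Calling fetch_hits("KIAA0415") would then issue the request; building the URL separately makes it easy to check the query by hand in a browser first.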

I also want to add that the reason we do not offer an indexed accession.genomic field is that the value of this field is typically the accession of a large DNA sequence (e.g. human chromosome 1). It's useful to know which genomic sequence a gene belongs to via this field, but the reverse direction, querying for all genes in chr1, is of much less interest. The returned set of hits would be huge and often not very useful.

Thanks @kevinxin90, @gtsueng and @newgene for your replies. This has been very helpful. 🥇