biothings / mygene.info

MyGene.info: A BioThings API for gene annotations

Home Page: http://mygene.info

NCBI genes map to ensembl genes with invalid identifiers

dhimmel opened this issue

I've noticed three genes where the value for ensembl.gene does not begin with ENSG:

https://mygene.info/v3/gene/263?fields=ensembl
ensembl.gene appears to actually be ENSG00000237801
{"_id": "263", "_version": 1, "ensembl": {"gene": "263", "transcript": "263-1", "translation": [], "type_of_gene": "rRNA"}}

https://mygene.info/v3/gene/55872?fields=ensembl
ensembl.gene appears to actually be ENSG00000168078
{"_id": "55872", "_version": 3, "ensembl": {"gene": "55872", "transcript": "55872-1", "translation": [], "type_of_gene": "tRNA"}}

https://mygene.info/v3/gene/126231?fields=ensembl
ensembl.gene appears to actually be ENSG00000189144
{"_id": "126231", "_version": 2, "ensembl": {"gene": "126231", "transcript": "126231-1", "translation": [], "type_of_gene": "tRNA"}}

In these cases, it seems the value for ensembl.gene has been set to entrezgene (the ncbigene id). Any ideas what the problem is?
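For anyone who wants to reproduce the check, here is a minimal sketch that flags ensembl.gene values not starting with ENSG. It runs on the three responses pasted above, so it needs no network access. The function name suspicious_ensembl_genes is made up for illustration, and handling a list of mappings is an assumption about how the field may be shaped for other genes.

```python
import json

def suspicious_ensembl_genes(doc, prefix="ENSG"):
    """Return the ensembl.gene values in `doc` that do not start with `prefix`."""
    ensembl = doc.get("ensembl", [])
    # Assumption: the field may hold a single mapping or a list of mappings.
    entries = ensembl if isinstance(ensembl, list) else [ensembl]
    return [e["gene"] for e in entries
            if "gene" in e and not e["gene"].startswith(prefix)]

# The three responses quoted in this issue, verbatim.
raw_docs = [
    '{"_id": "263", "_version": 1, "ensembl": {"gene": "263", "transcript": "263-1", "translation": [], "type_of_gene": "rRNA"}}',
    '{"_id": "55872", "_version": 3, "ensembl": {"gene": "55872", "transcript": "55872-1", "translation": [], "type_of_gene": "tRNA"}}',
    '{"_id": "126231", "_version": 2, "ensembl": {"gene": "126231", "transcript": "126231-1", "translation": [], "type_of_gene": "tRNA"}}',
]

docs = [json.loads(raw) for raw in raw_docs]
flagged = {d["_id"]: suspicious_ensembl_genes(d) for d in docs}
print(flagged)
```

In all three documents the flagged value equals the document's own _id, which matches the pattern described above.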

This issue was introduced when we integrated Ensembl Metazoa species data through BioMart.

File path: ensembl_metazoa/49/gene_ensembl__gene__main.txt
text based search: awk '$2 == "263" { print $0 }' gene_ensembl__gene__main.txt
returns: 27923 263 rns 3153 3520 Mt 1 rRNA
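The same lookup can be sketched in Python for cases where awk is not available. The helper name rows_with_second_column is hypothetical. The sample line is the row returned above, used here in place of the real file.

```python
def rows_with_second_column(lines, value):
    """Yield whitespace-split rows whose second field equals `value`.

    Mirrors awk's default field splitting with the test `$2 == value`.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == value:
            yield fields

# The row quoted above; in practice, pass an open file handle for
# gene_ensembl__gene__main.txt instead of this list.
sample = ["27923 263 rns 3153 3520 Mt 1 rRNA"]
matches = list(rows_with_second_column(sample, "263"))
print(matches)
```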

Since no entrezgene ID can be mapped to this record, we use its Ensembl identifier (263) as the _id. That value accidentally collides with the gene document with _id 263 that comes from Entrez for human.
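To make the failure mode concrete, here is a rough sketch of how an ID fallback like this can merge two unrelated records. It is not the actual BioThings merge code. The record layout, the field names source and ensembl_gene, and the functions choose_id and merge_by_id are all hypothetical.

```python
def choose_id(record):
    """Hypothetical fallback: use the mapped Entrez ID, else the Ensembl ID."""
    return record.get("entrezgene") or record["ensembl_gene"]

def merge_by_id(records):
    """Group records by _id and report fallback IDs that hit an existing _id."""
    merged, collisions = {}, []
    for rec in records:
        used_fallback = not rec.get("entrezgene")
        _id = choose_id(rec)
        # A fallback ID that already exists means two unrelated genes merge.
        if used_fallback and _id in merged:
            collisions.append(_id)
        doc = merged.setdefault(_id, {"_id": _id, "sources": []})
        doc["sources"].append(rec["source"])
    return merged, collisions

# Illustrative records only: a human Entrez gene, and a Metazoa record
# whose Ensembl identifier happens to be the bare number 263.
records = [
    {"source": "entrez", "entrezgene": "263"},
    {"source": "ensembl_metazoa", "entrezgene": None, "ensembl_gene": "263"},
]

merged, collisions = merge_by_id(records)
print(merged)
print(collisions)
```

This check only catches a collision when the Entrez record is loaded first. One possible guard, again only as an assumption about what might suit the pipeline, would be to reject or namespace fallback IDs that consist solely of digits.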