biothings / mygene.info

MyGene.info: A BioThings API for gene annotations

Home Page: http://mygene.info


Mouse genes cannot be queried by ensembl ID

cconow opened this issue

querymany(["Eef2", "ENSMUSG00000034994", "ENSG00000167658"], species="mouse,human", scopes="_id,symbol,ensemblgene", fields="ensembl,symbol,genomic_pos", returnall=True)

Returns

{
  "out": [
    {
      "query": "Eef2",
      "_id": "1938",
      "_score": 17.811184,
      "ensembl": {
        "gene": "ENSG00000167658",
        "protein": [
          "ENSP00000307940",
          "ENSP00000471265"
        ],
        "transcript": [
          "ENST00000309311",
          "ENST00000594885",
          "ENST00000596417",
          "ENST00000598182",
          "ENST00000598436",
          "ENST00000600720",
          "ENST00000600794"
        ],
        "translation": [
          {
            "protein": "ENSP00000471265",
            "rna": "ENST00000600794"
          },
          {
            "protein": "ENSP00000307940",
            "rna": "ENST00000309311"
          }
        ],
        "type_of_gene": "protein_coding"
      },
      "genomic_pos": {
        "chr": "19",
        "end": 3985463,
        "ensemblgene": "ENSG00000167658",
        "start": 3976056,
        "strand": -1
      },
      "symbol": "EEF2"
    },
    {
      "query": "Eef2",
      "_id": "13629",
      "_score": 14.938413,
      "genomic_pos": {
        "chr": "10",
        "end": 81018332,
        "ensemblgene": "ENSMUSG00000034994",
        "start": 81012465,
        "strand": 1
      },
      "symbol": "Eef2"
    },
    {
      "query": "ENSMUSG00000034994",
      "notfound": true
    },
    {
      "query": "ENSG00000167658",
      "_id": "1938",
      "_score": 26.22843,
      "ensembl": {
        "gene": "ENSG00000167658",
        "protein": [
          "ENSP00000307940",
          "ENSP00000471265"
        ],
        "transcript": [
          "ENST00000309311",
          "ENST00000594885",
          "ENST00000596417",
          "ENST00000598182",
          "ENST00000598436",
          "ENST00000600720",
          "ENST00000600794"
        ],
        "translation": [
          {
            "protein": "ENSP00000471265",
            "rna": "ENST00000600794"
          },
          {
            "protein": "ENSP00000307940",
            "rna": "ENST00000309311"
          }
        ],
        "type_of_gene": "protein_coding"
      },
      "genomic_pos": {
        "chr": "19",
        "end": 3985463,
        "ensemblgene": "ENSG00000167658",
        "start": 3976056,
        "strand": -1
      },
      "symbol": "EEF2"
    }
  ],
  "dup": [
    [
      "Eef2",
      2
    ]
  ],
  "missing": [
    "ENSMUSG00000034994"
  ]
}

This shows that querying for Eef2 returns the ensembl field for the human gene but not for the mouse gene. Additionally, searching by Ensembl ID works for human but not for mouse. This is consistent across all genes I have tried. Interestingly, genomic_pos still shows the Ensembl ID for mouse.
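The pattern in the response above can also be checked mechanically. Here is a minimal sketch, assuming the result has the shape shown above (a dict with an "out" list of hits). The helper name summarize_hits is hypothetical and not part of the mygene client, and the sample data is a trimmed copy of the response above.

```python
def summarize_hits(result):
    """Group the hits of a querymany(..., returnall=True) style result.

    Assumes `result` is a dict with an "out" list, as in the response above.
    """
    not_found = []        # queries that matched nothing
    with_ensembl = []     # (query, _id) pairs whose hit carries "ensembl"
    without_ensembl = []  # (query, _id) pairs whose hit lacks "ensembl"
    for hit in result.get("out", []):
        if hit.get("notfound"):
            not_found.append(hit["query"])
        elif "ensembl" in hit:
            with_ensembl.append((hit["query"], hit["_id"]))
        else:
            without_ensembl.append((hit["query"], hit["_id"]))
    return {
        "not_found": not_found,
        "with_ensembl": with_ensembl,
        "without_ensembl": without_ensembl,
    }


# Trimmed copy of the response above; only the fields the helper reads.
sample = {
    "out": [
        {"query": "Eef2", "_id": "1938",
         "ensembl": {"gene": "ENSG00000167658"}, "symbol": "EEF2"},
        {"query": "Eef2", "_id": "13629",
         "genomic_pos": {"ensemblgene": "ENSMUSG00000034994"},
         "symbol": "Eef2"},
        {"query": "ENSMUSG00000034994", "notfound": True},
        {"query": "ENSG00000167658", "_id": "1938",
         "ensembl": {"gene": "ENSG00000167658"}, "symbol": "EEF2"},
    ],
}

summary = summarize_hits(sample)
print(summary)
```

On this sample, the mouse record (13629) ends up in the group without an ensembl field, and the mouse Ensembl ID ends up in the not-found group, which matches the two symptoms described.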

@cconow thanks for letting us know. I can confirm this issue from this mouse gene record:

https://mygene.info/v3/gene/13629?fields=ensembl

This returns an empty result, but it should return a matching ensembl field (e.g., as in this gene).

We had a similar issue with other genes when we recently updated the data to the latest Ensembl release (v110). We deployed a fix last week, but it looks like some genes are still missing it. We will take a closer look and hope to fix them very soon.

I have deployed the fix for the Ensembl v110 release. Let me know if you have any problems. The links that were failing are working now:

https://mygene.info/v3/gene/13629?fields=ensembl

https://mygene.info/v3/gene/ENSMUSG00000034994

I also want to note that we are adding data tests to cover more species, such as mouse and rat, in addition to the human genes covered by our current test suite. This issue seems to have affected only mouse genes, not human genes, so our current test procedure did not catch it before the data release.
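As a rough illustration of what such a multi-species check could look like, here is a minimal sketch. It assumes the gene endpoint used earlier in this thread (/v3/gene/<id>?fields=ensembl) returns a JSON object, and that the ensembl field may be either a single object or a list of objects. The function names, the fetch parameter, and the EXPECTED table are hypothetical and are not taken from the actual MyGene.info test suite.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the links in this thread.
BASE_URL = "https://mygene.info/v3/gene/{gene_id}?fields=ensembl"

# Entrez gene ID -> expected Ensembl gene ID, both taken from this thread.
EXPECTED = {
    "1938": "ENSG00000167658",      # human EEF2
    "13629": "ENSMUSG00000034994",  # mouse Eef2
}


def fetch_record(gene_id):
    """Fetch one gene record over HTTP (needs network access)."""
    with urlopen(BASE_URL.format(gene_id=gene_id)) as resp:
        return json.load(resp)


def ensembl_gene_ids(record):
    """Collect Ensembl gene IDs from a record.

    Assumption: "ensembl" is a dict or a list of dicts, each with a "gene" key.
    """
    ensembl = record.get("ensembl") or []
    if isinstance(ensembl, dict):
        ensembl = [ensembl]
    return {entry["gene"] for entry in ensembl if "gene" in entry}


def find_failures(expected, fetch=fetch_record):
    """Return the gene IDs whose record lacks the expected Ensembl gene ID."""
    failures = []
    for gene_id, ensembl_id in expected.items():
        if ensembl_id not in ensembl_gene_ids(fetch(gene_id)):
            failures.append(gene_id)
    return failures
```

Calling find_failures(EXPECTED) with the default fetcher needs network access. Passing a stub as fetch, for example the __getitem__ method of a dict of canned records, keeps the check offline and makes it easy to add more species to the table.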