biothings / mygene.info

MyGene.info: A BioThings API for gene annotations

Home Page: http://mygene.info

Create a new NCBI data source to get complete gene summary from ASN dump

newgene opened this issue

The gene summary data currently returned in the summary field of the MyGene.info API is extracted from RefSeq records (see the current refseq data source).

It appears that RefSeq does not contain all of the gene summary text available from NCBI. For example, as reported in #129, gene POLA2 has a summary text that is not available in its RefSeq record, so it is missing from the current MyGene.info API.

As suggested by the NCBI support team (Case #: CAS-941135-X3W9H8, for the record), the complete gene summary text is available from NCBI's ASN.1 binary dump files. We can create a new ncbi_gene data source based on these ASN.1 dump files to extract the gene summary text.
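Once a binary dump has been converted to NCBI's text ASN.1 form (that conversion step is not shown and is an assumption here), the core of such a data source is pairing each record's geneid with its summary string. Below is a minimal sketch of that pairing using a small regex scanner. The function names iter_summaries and summaries_by_geneid are hypothetical, the sample records are made-up toys rather than real NCBI data, and the field layout (geneid inside track-info, a summary string per record) is assumed from the Entrezgene ASN.1 definition, not taken from this issue.

```python
import re
from typing import Dict, Iterator, Tuple

# Hypothetical scanner for *text* ASN.1 Entrezgene records (not the binary dump).
# Three alternatives, tried left to right as the text is scanned:
#   1. "geneid <int>"          -> the record id found in track-info
#   2. 'summary "<string>"'    -> the summary we want to keep
#   3. any other quoted string -> consumed whole so words inside it never match
# ASN.1 text escapes a literal double quote as two double quotes ("").
_TOKEN = re.compile(
    r'\bgeneid\s+(\d+)'
    r'|\bsummary\s+"([^"]*(?:""[^"]*)*)"'
    r'|"[^"]*(?:""[^"]*)*"'
)


def iter_summaries(asn_text: str) -> Iterator[Tuple[int, str]]:
    """Yield (geneid, summary) pairs, pairing each summary with the last geneid seen."""
    current_id = None
    for match in _TOKEN.finditer(asn_text):
        geneid, summary = match.group(1), match.group(2)
        if geneid is not None:
            current_id = int(geneid)
        elif summary is not None and current_id is not None:
            text = summary.replace('""', '"')
            # Assumption: line breaks inside long strings are only wrapping,
            # so whitespace runs are collapsed to single spaces.
            yield current_id, " ".join(text.split())
            current_id = None  # at most one summary per record
        # Any other quoted string falls through and is ignored.


def summaries_by_geneid(asn_text: str) -> Dict[int, str]:
    """Collect summaries into a dict keyed by geneid; records without one are skipped."""
    return dict(iter_summaries(asn_text))


# Toy input: made-up ids and placeholder text, not real NCBI data.
SAMPLE = '''
Entrezgene ::= {
  track-info {
    geneid 1001,
    status live
  },
  gene {
    locus "GENE_A",
    desc "a ""summary"" mention inside another string"
  },
  summary "Placeholder text with ""quoted"" words
    that wraps onto a second line."
}
Entrezgene ::= {
  track-info {
    geneid 1002
  },
  gene {
    locus "GENE_B"
  }
}
Entrezgene ::= {
  track-info {
    geneid 1003
  },
  summary "Second placeholder."
}
'''

if __name__ == "__main__":
    for geneid, text in summaries_by_geneid(SAMPLE).items():
        print(geneid, text)
```

Run on the toy input, record 1002 (no summary) is skipped and the other two come back with their escaped quotes restored. A real implementation would more likely rely on a proper ASN.1 decoder or NCBI's XML conversion output; the regex here only illustrates the geneid-to-summary pairing.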