genomejs / gql

Genome query language

Can this be used to look for genes?

marcucio opened this issue · comments

commented

Can this be used to look for a particular gene? For example I have this gene I am looking for:

http://www.snpedia.com/index.php/KCNQ2

How would I go about writing a query to see if this gene exists?

@marcucio You would look up each SNP associated with the gene. There are 136 of them (listed in the sidebar on the SNPedia page you linked).

And when you're done, publish it as a module!

You might not need gql for this since there is no conditional logic. The genome.js format is a JS object where each key is a SNP rsid and each value is the value (genotype) at that SNP, so if you're trying to extract the sequence you can do:

// Concatenate the values of the gene's SNPs from the dna object
const getKCNQ2 = (dna) =>
  dna.rs118192185 + dna.rs118192186 // + ... and so on for the remaining rsids

I think it would be great to have a meta-module that maps each gene to its associated SNPs. Let me know if you're interested in helping out with that.

It could be as simple as:

{
  "KCNQ2": [ array of rsids ]
}

Then you can iterate over them to extract sequences.
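
A minimal sketch of that iteration, assuming the genome.js dna object described above (rsid keys, one value per SNP). The `genes` map, the `extractGenotypes` helper, and the sample values are made up for illustration; only the two rsids come from this thread.

```javascript
// Hypothetical gene -> rsids map; a real one would list every SNP for the gene.
const genes = {
  KCNQ2: ['rs118192185', 'rs118192186'],
};

// Look up each rsid of a gene in the dna object. Rsids that are missing
// from dna are reported as null instead of being silently skipped.
const extractGenotypes = (geneMap, gene, dna) =>
  (geneMap[gene] || []).map((rsid) => ({
    rsid,
    genotype: Object.prototype.hasOwnProperty.call(dna, rsid) ? dna[rsid] : null,
  }));

// Made-up sample data in the genome.js shape.
const sampleDna = { rs118192185: 'CC' };

console.log(extractGenotypes(genes, 'KCNQ2', sampleDna));
```

Returning null for missing rsids keeps the output aligned with the rsid list, which matters if the values are later joined into one string.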

commented

Sorry, I'm super new to this gene and DNA area, so I don't fully understand.

I agree it would be nice to just define it in an array if there are no conditions. Ideally there should be a repo somewhere defining all the genosets and genes in the format that can be used by gql. I found 20 or so genosets defined in gql but I'm guessing there isn't a repo out there defining more.

Do the SNPs associated with the gene have to be in a particular order to tell whether the gene exists? Or, if I can find all the SNPs for the gene in the dna object, do we then know the gene exists?

@marcucio Yep, if you have any genosets you are passionate about you should publish them as modules! It's really easy to do.

I think for this gene, if all of the SNPs exist, the gene exists. I'm not sure whether there are certain values to look for; the 23andMe page might be more helpful.

If it's an existence check you can do this:

var gql = require('gql');

module.exports = gql.and([
  gql.exists('rswhatever'),
  gql.exists('rswhatever'),
  // ... and so on, one gql.exists per rsid
])

Or, given an array of rsids:

module.exports = gql.and(rsids.map((id) => gql.exists(id)))

Then, with that module required as hasKCNQ2, you can call hasKCNQ2(dna) // true or false
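
For comparison, the same existence check can be sketched in plain JavaScript, without gql. This assumes that "exists" means the rsid is a key in the dna object with a non-empty value, which may differ from what gql.exists actually checks. `hasAllSnps` and the sample data are hypothetical.

```javascript
// Build a predicate that is true only when every rsid is present in dna.
// Assumption: "present" means the value is neither undefined nor null.
const hasAllSnps = (rsids) => (dna) =>
  rsids.every((id) => dna[id] !== undefined && dna[id] !== null);

// Only two of the gene's rsids are listed here, to keep the sketch short.
const hasKCNQ2 = hasAllSnps(['rs118192185', 'rs118192186']);

console.log(hasKCNQ2({ rs118192185: 'CC', rs118192186: 'GG' })); // true
console.log(hasKCNQ2({ rs118192185: 'CC' })); // false
```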

commented

OK, I understand, thanks! I will play around with it some more. I was thinking of creating one repo with all the SNP and gene definitions. Ideally I would like to check for 200+ genes, so I think 200 modules in one repo would be better than 200 separate repos.

I was thinking of writing a script to convert the JSON data from snpedia [http://www.snpedia.com/index.php?title=Special:Ask&offset=0&limit=500&q=%5B%5BCategory%3AIs+a+snp%5D%5D+%5B%5BIn+gene%3A%3AKCNQ2%5D%5D&p=mainlabel%3D%2Fformat%3Dtable&po=%3FMax+Magnitude%0A%3FChromosome+position%0A%3FSummary%0A] into a format we can use, maybe save to a JSON array like you previously suggested.
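
A rough sketch of the conversion step. The shape of the SNPedia response is not shown in this thread, so this assumes the page titles have already been pulled out of it into an array. It also assumes titles may be capitalized (e.g. "Rs118192185") while the dna object uses lowercase rsids. `toRsids`, `buildGeneMap`, and the sample titles are hypothetical.

```javascript
// Assumed input: page titles already extracted from the SNPedia response.
const titles = ['Rs118192185', 'Rs118192186', 'rs118192185', 'Not a SNP'];

// Normalize titles to lowercase rsids, drop anything that is not rs<digits>,
// and remove duplicates while keeping first-seen order.
const toRsids = (pageTitles) => [
  ...new Set(
    pageTitles
      .map((title) => title.trim().toLowerCase())
      .filter((title) => /^rs\d+$/.test(title))
  ),
];

// Build the { gene: [rsids] } structure suggested earlier in the thread.
const buildGeneMap = (gene, pageTitles) => ({ [gene]: toRsids(pageTitles) });

console.log(JSON.stringify(buildGeneMap('KCNQ2', titles), null, 2));
```

The output can be written to a .json file as-is and then fed to rsids.map((id) => gql.exists(id)).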

I was also thinking of changing the format of the SNP modules to better fit my needs. It currently looks something like this:

var gql = require('gql');

module.exports = gql.and([
    gql.or([gql.exact('rs6311', 'C'), gql.exact('rs6311', 'CT')]),
    gql.or([gql.exact('rs1328674', 'A'), gql.exact('rs1328674', 'AG')]),
    gql.or([gql.exact('rs6313', 'C'), gql.exact('rs6313', 'CT')]),
    gql.or([gql.exact('rs6314', 'G'), gql.exact('rs6314', 'AG')])
])

But I think it might be helpful to have more info, maybe define it like this (not tested btw!) so that we can programmatically get the description and what a match might mean (just brainstorming):

var gql = require('gql');

(function(exports){
    exports.exists = gql.and([
        gql.or([gql.exact('rs6311', 'C'), gql.exact('rs6311', 'CT')]),
        gql.or([gql.exact('rs1328674', 'A'), gql.exact('rs1328674', 'AG')]),
        gql.or([gql.exact('rs6313', 'C'), gql.exact('rs6313', 'CT')]),
        gql.or([gql.exact('rs6314', 'G'), gql.exact('rs6314', 'AG')])
    ]);

    // Turn the boolean result of exports.exists into a human-readable message
    exports.interpreter = function (exists) {
        if (exists) {
            return 'Description of what it means that this exists';
        }
        return 'Description of what it means that this doesn\'t exist';
    };

    exports.description = 'Description of SNP or gene';

})(exports);
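
To show how a caller might use modules in this proposed shape, here is a small sketch. It is not part of gql or genome.js: `exampleModule` uses a plain predicate as a stand-in for the gql query so it runs without dependencies, and `report` is a hypothetical helper.

```javascript
// Stand-in module in the proposed shape. A plain predicate replaces the gql
// query; the description and messages are placeholder text.
const exampleModule = {
  description: 'Description of SNP or gene',
  exists: (dna) => dna.rs6311 === 'C' || dna.rs6311 === 'CT',
  interpreter: (exists) => (exists ? 'Match' : 'No match'),
};

// Run every module against a dna object and collect a readable report.
const report = (modules, dna) =>
  Object.entries(modules).map(([name, mod]) => {
    const matched = Boolean(mod.exists(dna));
    return {
      name,
      description: mod.description,
      matched,
      meaning: mod.interpreter(matched),
    };
  });

console.log(report({ example: exampleModule }, { rs6311: 'CT' }));
```

Keeping the query, the interpretation, and the description on one object means a single loop can cover all 200+ modules in a repo.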

BTW, feel free to close this if you want; you answered my initial question.

@marcucio Yeah, there isn't really a "spec" for what the modules should look like per se. As long as a module works off the dna data, it can be considered compatible with any other genome.js module; it doesn't even have to use gql. LMK what you end up publishing.