genomejs / gql

Genome query language

Beginner-level issue implementing gql

pvjones opened this issue · comments

Hi, first off I want to say thanks for all the amazing work that has gone into these projects (specifically gql and dna2json). I'm just starting to learn JavaScript and NodeJS. I'd like to see if I can implement a simple app using these tools. If so, then down the road (as I learn more) I hope to put together a genoset-checking and SNP-mapping tool for a project I'm working on.

That was a long introduction. TL;DR: please forgive the neophyte-level questions.

I'm doing something wrong while following along with the examples: nodemon logs 'TypeError: gql.query is not a function' at the point where I invoke it in a 'genoset' file.

checkSNPs.js

`const fs = require('fs');
const path = require('path');
const es = require('event-stream');
const JSONStream = require('JSONStream');

exports.forGenotype = (genosetFile, jsonFile) => {
  // Load the genoset definition module by file name
  let genosetDefinition = require('./queries/' + genosetFile);
  let query = genosetDefinition();
  let jsonReadStream = fs.createReadStream(jsonFile);
  let genoStream = query.stream();

  // Parse the SNP JSON and pipe each entry into the query stream
  jsonReadStream
    .pipe(JSONStream.parse('*'))
    .pipe(genoStream);

  let count = 0;

  // The handler must be passed as the second argument to on()
  genoStream.on('data', function(snp) {
    console.log('Analyzed ', ++count, ' SNPs');
  });

  genoStream.on('end', function() {
    console.log("There are", query.matches().length, "matches for genoset 228");
    console.log("There is a", query.percentage(), "percent chance that genoset matches");
  });
};`

And here's my 'genoset definition' file, isSickleCellAffected.js

`const gql = require('gql');

module.exports = function() {
  let query = gql.query();
  query.needs(0);
  query.or(query.exact('rs334', 'TT'), query.exact('i3003137', 'AA'));
  return query;
};`

Hi, I think I may have figured out what was wrong with my isSickleCellAffected.js file. Instead of calling gql.query(), I need to build the query there from gql methods and then export it, like this:

const gql = require('gql');

module.exports = function() {
  let query = gql.or([
    gql.exact('rs334', 'TT'),
    gql.exact('i3003137', 'AA')
  ]);
  return query;
};

However, after doing this I get a different TypeError, saying query.stream is not a function.

@pvjones Not sure what version you're using, but all of the streaming stuff got removed a long time ago - you should check out the new API docs. It's more performant to load the SNP-JSON ahead of time as an object than to stream it through, so both the JSON format and the query API were changed.

So you would basically do this:

main.js:

const dna = require('./some-genome.json')
const query = require('./isSickleCellAffected')

const matches = query(dna) // true or false

isSickleCellAffected.js:

const gql = require('gql')

module.exports = gql.or([
  gql.exact('rs334', 'TT'),
  gql.exact('i3003137', 'AA')
]);
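To make the predicate style concrete, here is a minimal, self-contained sketch of how combinators like these could be built by hand. The `exact` and `or` helpers below are hypothetical stand-ins written for illustration, not gql's actual source, and the sample objects assume the flat `{ rsid: genotype }` shape that the examples in this thread imply.

```javascript
// Hypothetical stand-ins for illustration only; not gql's implementation.
// Each helper returns a predicate: a function that takes the DNA object
// and returns true or false.
const exact = (id, genotype) => (dna) => dna[id] === genotype;

// `or` matches when at least one of the given predicates matches.
const or = (predicates) => (dna) => predicates.some((p) => p(dna));

const isSickleCellAffected = or([
  exact('rs334', 'TT'),
  exact('i3003137', 'AA')
]);

// Made-up sample data in the assumed { rsid: genotype } shape.
const sampleDna = { rs334: 'TT', i3003137: 'GG' };
const otherDna = { rs334: 'AT', i3003137: 'AG' };

console.log(isSickleCellAffected(sampleDna)); // true
console.log(isSickleCellAffected(otherDna)); // false
```

Because the exported value is just a function of the DNA object, it can be called directly, as `query(dna)` is in main.js above.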

GQL is a light querying language that basically just does object checks. If you have any advanced logic (percentages, etc.) you can check the object yourself.

You could rewrite the same thing as:

dna.rs334 === 'TT' || dna.i3003137 === 'AA'
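For percentage-style logic, the "check the object yourself" route could look something like the sketch below. The `matchPercentage` helper, the criteria list format, and the sample data are all assumptions made up for illustration; none of them come from gql.

```javascript
// Hypothetical helper: returns the percentage (0-100) of criteria that
// the DNA object satisfies, using plain object checks and no gql.
function matchPercentage(dna, criteria) {
  if (criteria.length === 0) return 0; // avoid dividing by zero
  const hits = criteria.filter(([id, genotype]) => dna[id] === genotype);
  return (hits.length / criteria.length) * 100;
}

// Each criterion is an [rsid, genotype] pair (assumed format).
const sickleCellCriteria = [
  ['rs334', 'TT'],
  ['i3003137', 'AA']
];

// Made-up sample data in the assumed { rsid: genotype } shape.
const exampleDna = { rs334: 'TT', i3003137: 'GG' };

console.log(matchPercentage(exampleDna, sickleCellCriteria)); // 50
```

A threshold check such as `matchPercentage(exampleDna, sickleCellCriteria) >= 50` then gives a true/false answer in the same spirit as the gql query.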

Hey, I can't thank you enough for the help! This works like a charm now.
Next I'm going to work on a percentage-based query. If I can get this figured out, I'd like to make a bunch of these genoset files. I may (with the help of a colleague) attempt to scrape SNPedia and automate the creation of many more. If this ends up working out, would I be able to submit these for the community somewhere?

Thanks again!

@pvjones Yep! You can publish them on NPM as genoset-* and tag them (the most important tags are the genoset ID and genoset).

Here's the full boilerplate for a genoset module: https://github.com/genomejs/genoset-boilerplate - I'll make sure it's up to date with the latest additions now.

I'd love to get all of SNPedia published as modules that can be iterated on and used in cool projects.

Actually no need for the boilerplate repo (deleted it), you can use this as a reference if needed: https://github.com/genomejs/genoset-norovirus