genomejs / gql

Genome query language

Testing for deletions

pvjones opened this issue

What would the appropriate gql method be for testing for a deletion? Forgive me if I just missed this, but I didn't see it in the documentation. From browsing dna2json, it appears that a deletion returns a null value for the rsid?

Thanks!

Does not(exists('rsid')) work for this case?

Possibly. I wasn't sure from the docs if not(exists('rsid')) tested for the absence of that rsid, or for the presence of a null value for that rsid property.

I think these mean different things in terms of results. Am I correct in thinking that dna2json should specifically return a null value for an extant rsid property in the case of a deletion?
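To make the distinction in this question concrete, here is a minimal sketch in plain JavaScript. It assumes the parsed data is an object keyed by rsid whose entries carry a `genotype` field (the shape implied by the `data[k].genotype` access in gql's index.js). The rsids, the sample values, and the `describe` helper are all made up for illustration and are not part of gql or dna2json.

```javascript
// Hypothetical parsed output. Only the { rsid: { genotype } } shape is taken
// from the thread; the rsids and values are invented for this example.
const data = {
  rs0000001: { genotype: 'AG' }, // ordinary call
  rs0000002: { genotype: null }, // rsid present, genotype recorded as null
  // rs0000003 does not appear in the file at all
};

// Hypothetical helper (not a gql function): report how an rsid shows up.
function describe(data, rsid) {
  // "Absent" means the key itself is missing from the object.
  if (!Object.prototype.hasOwnProperty.call(data, rsid)) return 'absent';
  // "Null genotype" means the key exists but carries no alleles.
  return data[rsid].genotype === null ? 'null genotype' : 'called';
}

console.log(describe(data, 'rs0000001')); // called
console.log(describe(data, 'rs0000002')); // null genotype
console.log(describe(data, 'rs0000003')); // absent
```

A check that only asks whether the key exists cannot tell the second case from the first, which is why the two readings of not(exists('rsid')) would give different results.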

@pvjones AFAIK a deletion should be in there as null, in which case gql.exact('rsid', null) should work - you might want to confirm

I think the logic for that function might need some work to make that work, since it early returns false if the genotype is null. Def possible though.

https://github.com/genomejs/gql/blob/master/index.js#L86 should probably read something like:

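exact: function(k, v){
  // Accept null so a deletion stored as a null genotype can be matched.
  if (typeof v !== 'string' && v !== null) {
    throw new Error('exact can only check for strings or null');
  }
  return function(data){
    var snp = data[k];
    // No entry for this rsid at all.
    if (!snp) return false;
    // Covers identical strings and the null === null (deletion) case.
    if (snp.genotype === v) return true;
    // Only one side is null, so it cannot match; keep null out of roughMatch.
    if (snp.genotype === null || v === null) return false;
    return roughMatch(snp.genotype, v);
  };
}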

I think I'm starting to get a better picture of this now (gradually). I also should have read the other open issue, since it's discussing something directly related.

As it stands currently, it appears that FamilyTree reports deletions exclusively using (D;D) syntax. So the genotype reported for a double deletion would be DD.

23andMe seems to report deletions with (D;D) syntax as well. However, I saw one example online that used (-;-). I'm not sure how much credibility to give that. I'm going to do some more hunting.

If all three of the main supported vendors report deletions as D, then the gql code is probably fine as-is, and either exact('rsid', 'DD') or has('rsid', 'D') should work just fine. In that case maybe something to this effect could be added to the docs (I'd be happy to do so if that would be helpful).

If in fact 23andMe sometimes reports deletions as (-;-), then that might need to be dealt with somehow. I'm not sure if it would be better to do so in dna2json or gql.

Thanks again!

@pvjones We should normalize all deletions to null in dna2json

@pvjones This is already accounted for in the 23AndMe parser https://github.com/genomejs/dna2json/blob/master/lib/parsers/23andMe.js#L13 might need to be added to the others

Awesome! Another question though. This may not address heterozygous deletions, which if I understand correctly are uncommon, but possible. This would take the form of (A;-) or (A;D). In those cases you may not want to return the entire rsid property as null, since the presence of one allele is still informative?

Just published the GQL changes at 1.1.1, I'll do the dna2json ones next

Awesome, thanks again for all your great work!

@pvjones Wait, when you say single deletion - maybe I'm wrong on the impl.

Is it possible to have a deletion for one allele but not for the other? Or is it a constant that both will always be a deletion?

If you can have one deletion but not 2 then we should revert all the null stuff and represent it as -

Yes, heterozygous deletions are possible, and are indications for certain genotypes. For example, in genes related to Familial Hypercholesterolemia you can have either single or double deletions:

rs137853966
(G;G) = normal
(G;-) = known mutation associated with FH
(-;-) = known mutation associated with FH
http://snpedia.com/index.php/Rs137853966

Again, I'm not sure how common it is, but it does happen. I'm pretty sure deletions can also happen with insertions, which I believe would be reported in a txt file as DI or ID.

Sorry for the circuitous route here. I'm still figuring a lot of this out. But considering that, perhaps it would be better if all deletion character codes from the vendors were normalized to either D or - for each allele?

That's one of the rsids that tuned me in to this issue. I was trying to write a custom gql 'genoset' for Familial Hypercholesterolemia, and quickly realized that I had no idea how to do so for the deletions/insertions.

@pvjones Yeah, going to normalize them to -
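For anyone following along, per-allele normalization could look roughly like the sketch below. It assumes vendor genotypes arrive as short strings such as DD, AD or DI, and that D only ever stands for a deletion. `normalizeGenotype` is a hypothetical name and this is not the code that ended up in dna2json.

```javascript
// Hypothetical helper, NOT dna2json's actual parser code.
// Assumption: genotypes are short allele strings and 'D' always means deletion.
function normalizeGenotype(raw) {
  // Leave null/undefined (or anything that is not a string) untouched.
  if (typeof raw !== 'string') return raw;
  // Rewrite each deleted allele on its own, so a heterozygous call like 'AD'
  // keeps its informative 'A' allele.
  return raw.replace(/D/g, '-');
}

console.log(normalizeGenotype('DD')); // --
console.log(normalizeGenotype('AD')); // A-
console.log(normalizeGenotype('DI')); // -I
console.log(normalizeGenotype('AG')); // AG
```

Working allele by allele, rather than collapsing the whole genotype to null, is what keeps single deletions such as (G;-) queryable.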

Published as genome.js 1.0.2, gql 1.1.2

gql.exact('rsid', '-A') / gql.has('rsid', '-') etc. - deletions work like a normal thing now
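As a rough illustration of how the Familial Hypercholesterolemia case from earlier in the thread could be expressed once deletions are plain - alleles, here is a self-contained sketch. The `has` and `exact` functions below are stand-ins written for this example; they only imitate the call shapes quoted above and are not gql's implementation. Treating 'G-' and '-G' as the same genotype is an assumption.

```javascript
// Stand-in for a "has this allele" check (hypothetical, not gql's code).
const has = (rsid, allele) => (data) => {
  const snp = data[rsid];
  return Boolean(
    snp && typeof snp.genotype === 'string' && snp.genotype.includes(allele)
  );
};

// Stand-in for an exact genotype check that ignores allele order (assumption).
const sortAlleles = (g) => g.split('').sort().join('');
const exact = (rsid, genotype) => (data) => {
  const snp = data[rsid];
  return Boolean(
    snp &&
      typeof snp.genotype === 'string' &&
      sortAlleles(snp.genotype) === sortAlleles(genotype)
  );
};

// Either one or two deleted alleles at rs137853966 counts as the FH variant.
const fhDeletion = has('rs137853966', '-');

// Made-up sample inputs in the { rsid: { genotype } } shape.
const normal = { rs137853966: { genotype: 'GG' } };
const carrier = { rs137853966: { genotype: 'G-' } };
const homozygous = { rs137853966: { genotype: '--' } };

console.log(fhDeletion(normal));                      // false
console.log(fhDeletion(carrier));                     // true
console.log(fhDeletion(homozygous));                  // true
console.log(exact('rs137853966', '-G')(carrier));     // true
```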