ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

brca exchange "Pathogenicity_expert" filter

bwalsh opened this issue

commented

The possible values:

"Benign / Little Clinical Significance"
"Likely benign"
"Not Yet Reviewed"
"Pathogenic"
"Uncertain"

We currently filter out "Not Yet Reviewed".
We discussed removing that filter. Can you confirm?

@bwalsh That is my opinion. Happy to have @ahwagner @malachig and others chime in as well.

commented

@jgoecks

Made the change. brca now has 17,584 items (was 5,715).
One note: unreviewed items do not have a phenotype.

+++ b/harvester/brca.py
@@ -22,10 +22,10 @@ def harvest(genes=None):
         else:
             page_num = page_num + 1
             for record in payload['data']:
-                if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
-                    gene = record['Gene_Symbol']
-                    gene_data = {'gene': gene, 'brca': record}
-                    yield gene_data
+                # if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
+                gene = record['Gene_Symbol']
+                gene_data = {'gene': gene, 'brca': record}
+                yield gene_data

@bwalsh I get slightly different results when I run this.
source:brca: 17,546
source:brca AND exists:association.phenotype.description: 5,791

g2p-test, by contrast, shows a slightly higher overall count from BRCA and a slightly lower count with a phenotype association. Do you think this is just a matter of when the harvest was run?

Code looks good; I'm just confused about the variation in the numbers.
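
One way to narrow down a mismatch like this might be to tally each harvest by its "Pathogenicity_expert" value and compare the tallies side by side. The following is only a sketch: it assumes the records are dicts shaped like the entries of payload['data'] in harvester/brca.py, and tally_pathogenicity and the sample records are hypothetical, not part of the repository.

```python
from collections import Counter


def tally_pathogenicity(records):
    """Count records per 'Pathogenicity_expert' value (hypothetical helper)."""
    return Counter(record['Pathogenicity_expert'] for record in records)


# Made-up sample records, for illustration only.
sample = [
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Pathogenic'},
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Likely benign'},
]

counts = tally_pathogenicity(sample)
for value, count in sorted(counts.items()):
    print(value, count)
```

Running the same tally against both harvests would show whether the difference sits in the unreviewed records or in the expert-reviewed ones.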

commented

I'll check g2p-test tomorrow (there were snafus uploading to it).

Per the group discussion today, we're reversing course on this and should exclude "Not Yet Reviewed" variants.
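
For reference, the exclusion could be restored along these lines. This is a minimal sketch, not the deployed code: it assumes the record shape from the diff above, and NOT_YET_REVIEWED, is_expert_reviewed, and reviewed_gene_data are hypothetical names introduced here for illustration.

```python
NOT_YET_REVIEWED = 'Not Yet Reviewed'


def is_expert_reviewed(record):
    """Return True when the expert call is anything other than 'Not Yet Reviewed'."""
    return record['Pathogenicity_expert'] != NOT_YET_REVIEWED


def reviewed_gene_data(records):
    """Yield gene_data dicts, in the shape used by the diff, for reviewed records only."""
    for record in records:
        if is_expert_reviewed(record):
            yield {'gene': record['Gene_Symbol'], 'brca': record}


# Made-up sample records, for illustration only.
sample = [
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Pathogenic'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Uncertain'},
]

kept = list(reviewed_gene_data(sample))
print([item['gene'] for item in kept])
```

Keeping the check in a small named function, rather than commenting the condition in and out, would make it easier to flip the policy again if the group changes its mind.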

commented

@ahwagner @mayfielg @jgoecks for your review... addressed and deployed at https://g2p-test.ddns.net