GV-P supervision?
ajratner opened this issue
Alex Ratner commented
Supervise via OMIM, or a different set? Emailed Aaron & Karthik.
Alex Ratner commented
Aaron's email:
IDs:
- HPO term IDs for phenotypes
- HGVS nomenclature (the "GJA1:c.50A>C,p.Y17S" convention) for variants
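As a rough illustration of the variant convention above, here is a minimal Python sketch that splits such a string into gene, cDNA change, and protein change. The names `ParsedVariant` and `parse_variant` are hypothetical, and the regex is an assumption that only covers the `GENE:c.<change>,p.<change>` shape of the quoted example, not the full HGVS syntax.

```python
import re
from typing import NamedTuple, Optional


class ParsedVariant(NamedTuple):
    """Hypothetical container for the pieces of one variant string."""
    gene: str
    cdna: str
    protein: Optional[str]


# Assumption: identifiers look like "GENE:c.<change>" with an optional
# ",p.<change>" suffix, as in the example "GJA1:c.50A>C,p.Y17S".
_VARIANT_RE = re.compile(
    r"^(?P<gene>[A-Za-z0-9-]+):(?P<cdna>c\.[^,\s]+)(?:,(?P<protein>p\.\S+))?$"
)


def parse_variant(text: str) -> ParsedVariant:
    """Split a variant string into gene, cDNA change, and protein change."""
    match = _VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant string: {text!r}")
    return ParsedVariant(
        gene=match.group("gene"),
        cdna=match.group("cdna"),
        protein=match.group("protein"),  # None when no protein part is given
    )


parsed = parse_variant("GJA1:c.50A>C,p.Y17S")
print(parsed.gene, parsed.cdna, parsed.protein)
```

Anything outside that simple shape raises `ValueError` rather than being guessed at.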
There are two problems with us providing data like that to you, both of which are easily overcome.
- We do not have a database of OMIM variants. Karthik is working on a parser to structure OMIM variant data now, but it is not done. Luckily, ClinVar has mapped much of OMIM. The ClinVar mapping has lower quality and coverage than what Karthik's parser will produce, but it should suffice for training.
- OMIM mappings (and almost all variant mappings) go from variant to disease (OMIM ID), not to phenotypic trait (HPO ID). This can be overcome using the "Clinical Synopsis" in OMIM, which maps a disease (OMIM ID) to its constituent phenotypes (HPO IDs). Here's an example; go to "Display Options > Display Clinical IDs" to see the HPO terms. The structured clinical synopses are available through the OMIM API.
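The two-step mapping described above (variant to disease, then disease to phenotypes) can be sketched as a simple join. This is only an illustration under stated assumptions: `expand_to_phenotypes` is a hypothetical helper, the inputs are assumed to be already loaded into plain Python structures, and the OMIM and HPO IDs below are placeholders, not real identifiers.

```python
from typing import Dict, Iterable, List, Set, Tuple

GVD = Tuple[str, str, str]  # (gene, variant, OMIM disease ID)
GVP = Tuple[str, str, str]  # (gene, variant, HPO phenotype ID)


def expand_to_phenotypes(
    gvd_rows: Iterable[GVD],
    synopsis: Dict[str, List[str]],
) -> Set[GVP]:
    """Turn variant-to-disease rows into variant-to-phenotype rows.

    `synopsis` maps an OMIM disease ID to the HPO IDs listed in its
    clinical synopsis.
    """
    gvp: Set[GVP] = set()
    for gene, variant, omim_id in gvd_rows:
        # Diseases without a structured synopsis contribute no rows.
        for hpo_id in synopsis.get(omim_id, []):
            gvp.add((gene, variant, hpo_id))
    return gvp


# Placeholder data: the IDs are made up for illustration only.
gvd_rows = [("GJA1", "c.50A>C", "OMIM:PLACEHOLDER")]
synopsis = {"OMIM:PLACEHOLDER": ["HP:PLACEHOLDER_1", "HP:PLACEHOLDER_2"]}

pairs = expand_to_phenotypes(gvd_rows, synopsis)
for row in sorted(pairs):
    print(row)
```

Using a set drops duplicate rows when the same variant reaches a phenotype through more than one disease entry.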
Alex Ratner commented
Note that what we really need to extract is (G, V, P) or (G, V, D) relations, i.e. gene, variant, and phenotype or disease...
See code & comments in dd-genomics
Alex Ratner commented
Done, code in dd-genomics