ajratner / truc

The Table Relation Understanding Component (TRUC): relation extraction from tables

GV-P supervision?

ajratner opened this issue

Via OMIM? Or a different set? Emailed Aaron & Karthik

Aaron's email:

IDs:

  • HPO term IDs for phenotypes
  • HGVS nomenclature (the "GJA1:c.50A>C,p.Y17S" convention) for variants
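
(Editorial illustration, not part of the email.) A minimal sketch of recognizing the two ID formats, assuming HPO IDs look like `HP:` followed by seven digits and that variants follow the simple substitution form of the example above. The regular expression and the names `parse_variant` and `is_hpo_id` are hypothetical; full HGVS nomenclature covers many more cases (deletions, insertions, three-letter amino acid codes) that this does not handle.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: GENE:c.<pos><ref>><alt>[,p.<ref aa><pos><alt aa>]
# Only simple substitutions such as "GJA1:c.50A>C,p.Y17S" are covered.
HGVS_SIMPLE = re.compile(
    r"^(?P<gene>[A-Za-z0-9-]+):"
    r"c\.(?P<cpos>\d+)(?P<cref>[ACGT])>(?P<calt>[ACGT])"
    r"(?:,p\.(?P<pref>[A-Z])(?P<ppos>\d+)(?P<palt>[A-Z*]))?$"
)

# Assumed HPO ID shape: "HP:" plus exactly seven digits.
HPO_ID = re.compile(r"^HP:\d{7}$")


class Variant(NamedTuple):
    gene: str
    cdna: str
    protein: Optional[str]


def parse_variant(text: str) -> Optional[Variant]:
    """Split a simple HGVS-style string into gene, cDNA and protein parts."""
    m = HGVS_SIMPLE.match(text.strip())
    if m is None:
        return None  # not in the simple form this sketch understands
    cdna = f"c.{m['cpos']}{m['cref']}>{m['calt']}"
    protein = f"p.{m['pref']}{m['ppos']}{m['palt']}" if m["pref"] else None
    return Variant(m["gene"], cdna, protein)


def is_hpo_id(text: str) -> bool:
    """Check only the shape of the ID, not whether the term exists in HPO."""
    return HPO_ID.match(text) is not None
```

For the example in the email, `parse_variant("GJA1:c.50A>C,p.Y17S")` gives gene `GJA1`, cDNA change `c.50A>C`, and protein change `p.Y17S`.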

There are two problems with us providing data like that to you, both of which are easily overcome.

  1. We do not have a database of OMIM variants. Karthik is working on a parser to structure OMIM variant data now, but it is not done. Luckily, ClinVar has mapped much of OMIM. It's lower quality and coverage than what Karthik will do, but it should suffice for training.
  2. OMIM mappings (and almost all variant mappings) are from variant to disease (OMIM ID), not to phenotypic trait (HPO ID). That is overcome using the "Clinical Synopsis" in OMIM, which maps a disease (OMIM ID) to its constituent phenotypes (HPO ID). Here's an example; go to "Display Options > Display Clinical IDs" to see the HPO terms. The structured clinical synopses are available through the OMIM API.

Note that what we really need to extract is (G, V, P) or (G, V, D) relations...
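
The two-step mapping from Aaron's email (variant to disease, then disease to phenotypes via the Clinical Synopsis) could be sketched as a simple join. This is only an illustration under assumptions: the tuple types `GVD` and `GVP`, the function `expand_to_phenotypes`, and every ID in the toy data are made-up placeholders, not real ClinVar, OMIM, or HPO records, and the actual code in dd-genomics may differ.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class GVD(NamedTuple):
    """(gene, variant, disease) row, e.g. as derived from ClinVar."""
    gene: str
    variant: str
    disease: str  # OMIM ID


class GVP(NamedTuple):
    """(gene, variant, phenotype) row usable as supervision."""
    gene: str
    variant: str
    phenotype: str  # HPO ID


def expand_to_phenotypes(
    gvd_rows: Iterable[GVD],
    synopsis: Dict[str, Set[str]],
) -> List[GVP]:
    """Join variant->disease rows with a disease->phenotypes mapping."""
    out = set()
    for row in gvd_rows:
        # Diseases without a clinical synopsis contribute no GVP rows.
        for hpo_id in synopsis.get(row.disease, ()):
            out.add(GVP(row.gene, row.variant, hpo_id))
    return sorted(out)  # deterministic order, duplicates removed


# Toy placeholder data (hypothetical IDs, for illustration only).
demo_gvd = [
    GVD("GENE1", "c.1A>C", "OMIM:999999"),
    GVD("GENE2", "c.2G>T", "OMIM:999998"),  # no synopsis entry below
]
demo_synopsis = {"OMIM:999999": {"HP:9999991", "HP:9999992"}}
demo_gvp = expand_to_phenotypes(demo_gvd, demo_synopsis)
```

Keeping both tuple types around means either (G, V, D) or (G, V, P) can be used as the supervision signal, depending on which relation the extractor targets.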

See code & comments in dd-genomics

Done, code in dd-genomics