tmVar Variant Normalization Instructions

One of the key innovations described in the publication "tmVar: A text mining approach for extracting sequence variants in biomedical literature" is a method for normalizing extracted variant mentions to unique identifiers (dbSNP RSIDs). However, it is unclear how to use this feature, and running the tmVar model out of the box does not produce this behaviour. To normalize extracted variants, GNormPlus must first be run on the input data, and its output must then be fed into tmVar.

Getting Started

Download GNormPlus from the NCBI website and decompress the folder.
Download tmVar from the NCBI website and extract it into the same directory as GNormPlus.

Directory Structure

project
│
├───gnormplus_input
├───gnormplus_output
├───tmvar_output
│
├───tmVar
│   │   corpus
│   │   CRF
│   │   ...
│
└───GNormPlus
    │   Corpus
    │   CRF
    │   ...
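
The three working folders in the layout above are not part of either download, so they have to be created by hand. A minimal sketch, assuming a POSIX shell; the function name make_layout is hypothetical and only the folder names come from the tree above:

```shell
#!/bin/sh
# Sketch: create the working folders shown in the directory tree.
# make_layout is a hypothetical helper name, not part of tmVar or GNormPlus.
make_layout() {
    root=$1
    # -p creates missing parent folders and does not fail if a folder already exists
    mkdir -p "$root/gnormplus_input" \
             "$root/gnormplus_output" \
             "$root/tmvar_output"
}

# Usage: make_layout project
```

The tmVar and GNormPlus folders are not created here because they come from extracting the downloaded archives.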

Example Run

java -Xmx10G -Xms10G -jar GNormPlus.jar gnormplus_input gnormplus_output setup.txt
java -Xmx10G -Xms10G -jar tmVar.jar gnormplus_output tmvar_output
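
Because normalization depends on GNormPlus running before tmVar, it can help to wrap both steps in one script that stops if the first step produced nothing. The following is only a sketch: it assumes both jars and setup.txt are reachable from the current directory (adjust the paths to your own layout), and the run_pipeline function and the JAVA override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: run GNormPlus first, then tmVar on the GNormPlus output.
# Assumption: GNormPlus.jar, tmVar.jar and setup.txt can be found from the
# current directory; change the paths if your tools live elsewhere.
set -eu

JAVA="${JAVA:-java}"   # hypothetical override, useful for dry runs

run_pipeline() {
    in_dir=$1
    mid_dir=$2
    out_dir=$3

    # Step 1: GNormPlus processes the raw input
    "$JAVA" -Xmx10G -Xms10G -jar GNormPlus.jar "$in_dir" "$mid_dir" setup.txt

    # Stop early if GNormPlus wrote nothing, since tmVar would have no input
    if [ -z "$(ls -A "$mid_dir" 2>/dev/null)" ]; then
        echo "no GNormPlus output found in $mid_dir" >&2
        return 1
    fi

    # Step 2: tmVar reads the GNormPlus output and writes the final results
    "$JAVA" -Xmx10G -Xms10G -jar tmVar.jar "$mid_dir" "$out_dir"
}

# Usage: run_pipeline gnormplus_input gnormplus_output tmvar_output
```

The memory flags are copied from the example commands; lower them if your machine has less than 10 GB available.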

Acknowledgments

Thanks to Chih-Hsuan Wei for clarifying this process.
