
K-RET: Knowledgeable Biomedical Relation Extraction System

K-RET is a flexible biomedical relation extraction (RE) system that works with any pre-trained BERT-based model (e.g., SciBERT or BioBERT) and injects knowledge in the form of knowledge graphs, from a single source or from multiple sources simultaneously. This knowledge can be applied to various contextualizing tokens or only to the tokens of the candidate relation, for both single-token and multi-token entities.

Our academic paper, which describes K-RET in detail, can be found here.

The uer folder contains an updated version of the toolkit developed by Zhao et al. (2019), available here.

Downloading Pre-Trained Models

To make predictions on new data, you need both a baseline model and one of our pre-trained models. To train a new model on your own data, you only need a baseline model, which can be either of the models referenced in our academic paper.

Baseline Models

After downloading a baseline model (for instance, SciBERT), convert it with the uer toolkit. The following example does this for SciBERT; adapt the paths if you use a different baseline model or directory layout.

cd K-RET/uer/
python3 convert_bert_from_huggingface_to_uer.py --input_model_path ../models/pre_trained_model_scibert/scibert_scivocab_uncased/pytorch_model.bin --output_model_path ../models/pre_trained_model_scibert/output_model.bin
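If you convert more than one baseline, the same command can be assembled programmatically. The sketch below is only an illustration: the helper name `build_convert_command` and its two parameters are hypothetical, and it assumes every baseline follows the same folder layout as the SciBERT example above. The only flags it uses are the two shown in that command.

```python
from pathlib import PurePosixPath
from typing import List


def build_convert_command(baseline_dir: str, weights_subdir: str) -> List[str]:
    """Return the argv list for convert_bert_from_huggingface_to_uer.py.

    baseline_dir   -- folder holding one baseline, relative to K-RET/uer/
    weights_subdir -- subfolder that contains the downloaded pytorch_model.bin
    (Both parameter names are hypothetical, chosen for this sketch.)
    """
    base = PurePosixPath(baseline_dir)
    return [
        "python3",
        "convert_bert_from_huggingface_to_uer.py",
        "--input_model_path", str(base / weights_subdir / "pytorch_model.bin"),
        "--output_model_path", str(base / "output_model.bin"),
    ]


if __name__ == "__main__":
    cmd = build_convert_command(
        "../models/pre_trained_model_scibert", "scibert_scivocab_uncased"
    )
    # Print the command so it can be checked before running it.
    print(" ".join(cmd))
    # To run the conversion from inside K-RET/uer/, uncomment:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Called with the SciBERT paths, this prints the same command as the example above.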

Our Models

Available versions of the best-performing pre-trained models are as follows:

The training details are described in our academic paper.

Getting Started

Our project includes a code adaptation of the K-BERT model, available here. Use the K-RET image available on Docker Hub to set up the rest of the experimental environment.

Usage Example

CUDA_VISIBLE_DEVICES='1,2,3' python3 -u run_classification.py \
    --pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
    --config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
    --vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
    --train_path ./datasets/ddi_corpus/train.tsv \
    --dev_path ./datasets/ddi_corpus/dev.tsv \
    --test_path ./datasets/ddi_corpus/test.tsv \
    --class_weights True \
    --weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
    --epochs_num 30 \
    --batch_size 32 \
    --kg_name "['ChEBI']" \
    --output_model_path ./outputs/scibert_ddi.bin | tee ./outputs/scibert_ddi.log &

For more options, check run.sh. For additional configuration settings (e.g., max_number_entities and contextual_knowledge), check brain/config.py.
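The --weights value in the training command above holds one weight per class. This README does not say how those numbers were obtained. One common choice for imbalanced data is inverse-frequency ("balanced") weighting, and the class fractions implied by the example weights sum to roughly 1 under that formula, which is consistent with it but not a confirmation. The sketch below shows that heuristic under stated assumptions: labels are integers from 0 to num_classes - 1, and the function names are hypothetical rather than part of K-RET.

```python
from collections import Counter
from typing import List, Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Inverse-frequency weights: n_samples / (num_classes * count_of_class).

    Assumption: this is a generic heuristic, not necessarily the method
    used to produce the weights shown in the K-RET training command.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for cls in range(num_classes):
        if counts[cls] == 0:
            raise ValueError(f"class {cls} has no training examples")
        weights.append(round(total / (num_classes * counts[cls]), 3))
    return weights


def as_cli_argument(weights: List[float]) -> str:
    """Format the list as a quoted-list string, as --weights is written above."""
    return str(weights)


if __name__ == "__main__":
    # Toy label column: 6 examples of class 0, 3 of class 1, 1 of class 2.
    toy_labels = [0] * 6 + [1] * 3 + [2] * 1
    weights = balanced_class_weights(toy_labels, num_classes=3)
    print(as_cli_argument(weights))  # [0.556, 1.111, 3.333]
```

Rare classes receive larger weights, so misclassifying them costs more during training. To apply this to your own data, read the label column of your training file into `toy_labels`; the column layout of train.tsv is not described here, so check it first.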

Predict New Data Example

CUDA_VISIBLE_DEVICES='0' python3 -u run_classification.py \
    --pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
    --config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
    --vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
    --train_path ./datasets/ddi_corpus/train.tsv \
    --dev_path ./datasets/ddi_corpus/dev.tsv \
    --class_weights True \
    --weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
    --test_path ./datasets/ddi_corpus/test.tsv \
    --epochs_num 30 --batch_size 32 --kg_name "[]" \
    --testing True \
    --to_test_model ./outputs/scibert_ddi.bin \
    | tee ./outputs/ddi_results.log &
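The prediction command reuses most of the training flags and adds --testing True and --to_test_model, while dropping --output_model_path. One way to keep the two invocations in sync is to build both from a shared set of arguments. The following sketch does that with only the flags and values shown in the two commands above; the helper names (`build_command`, `training_command`, `prediction_command`) are hypothetical and not part of K-RET.

```python
from typing import Dict, List

MODEL_DIR = "./models/pre_trained_model_scibert"

# Flags that appear in both commands above, with the values used there.
SHARED_ARGS: Dict[str, str] = {
    "--pretrained_model_path": f"{MODEL_DIR}/output_model.bin",
    "--config_path": f"{MODEL_DIR}/scibert_scivocab_uncased/config.json",
    "--vocab_path": f"{MODEL_DIR}/scibert_scivocab_uncased/vocab.txt",
    "--train_path": "./datasets/ddi_corpus/train.tsv",
    "--dev_path": "./datasets/ddi_corpus/dev.tsv",
    "--test_path": "./datasets/ddi_corpus/test.tsv",
    "--class_weights": "True",
    "--weights": str([0.234, 3.377, 4.234, 6.535, 24.613]),
    "--epochs_num": "30",
    "--batch_size": "32",
}


def build_command(extra: Dict[str, str]) -> List[str]:
    """Combine the shared flags with mode-specific ones into an argv list."""
    cmd = ["python3", "-u", "run_classification.py"]
    for flag, value in {**SHARED_ARGS, **extra}.items():
        cmd += [flag, value]
    return cmd


def training_command(kg_names: List[str], output_model_path: str) -> List[str]:
    # str(["ChEBI"]) gives "['ChEBI']", the form used for --kg_name above.
    return build_command({
        "--kg_name": str(kg_names),
        "--output_model_path": output_model_path,
    })


def prediction_command(kg_names: List[str], trained_model_path: str) -> List[str]:
    return build_command({
        "--kg_name": str(kg_names),
        "--testing": "True",
        "--to_test_model": trained_model_path,
    })


if __name__ == "__main__":
    print(" ".join(training_command(["ChEBI"], "./outputs/scibert_ddi.bin")))
    print(" ".join(prediction_command([], "./outputs/scibert_ddi.bin")))
```

The printed strings drop shell quoting, so pass the list itself to `subprocess.run` rather than pasting the printed line into a shell. CUDA_VISIBLE_DEVICES is an environment variable, not a flag, so it would go in the `env` argument of `subprocess.run`.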

Process Results Example

python3 src/process_results.py ./outputs/ddi_results.log ./datasets/ddi_corpus/test.tsv ddi_results.tsv

Reference

Diana Sousa and Francisco M. Couto. 2022. K-RET: Knowledgeable Biomedical Relation Extraction System. Bioinformatics.


Apache License 2.0
