- Processing
- Training
- Inference
python processing/label_output.py \
--input {location of the label-studio output json files} \
--label_config {configuration used to set up label-studio; xml file} \
--label all OR --keep goals_or care
--hpi \
--stratified_split 0.3 \
--test
- Without the `--test` argument, the data is stratified and split into train/valid at 0.7/0.3.
- With the `--test` argument, the data is stratified and split into train/valid/test at 0.7/0.15/0.15.
- It takes around 17s to load the spaCy `en_core_sci_lg` model, please wait.
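The split ratios above can be illustrated with a minimal, standard-library sketch of a stratified split. It is not the code in `processing/label_output.py`. The function name `stratified_split`, the per-example labels, and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, seed=0):
    """Split example indices into len(fractions) parts, keeping label proportions."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    parts = [[] for _ in fractions]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        start = 0
        cumulative = 0.0
        for part_no, fraction in enumerate(fractions):
            cumulative += fraction
            if part_no == len(fractions) - 1:
                end = len(indices)  # last part takes the remainder
            else:
                end = round(cumulative * len(indices))
            parts[part_no].extend(indices[start:end])
            start = end
    return parts


# Hypothetical toy data: 40 examples, two labels of 20 each.
labels = ["goals"] * 20 + ["other"] * 20

# Without --test: train/valid at 0.7/0.3.
train, valid = stratified_split(labels, [0.7, 0.3])

# With --test: train/valid/test at 0.7/0.15/0.15.
train3, valid3, test3 = stratified_split(labels, [0.7, 0.15, 0.15])
```

Splitting within each label group, rather than over the whole dataset, is what keeps rare labels represented in every part.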
- Transformer model choices: 'bert', 'xlnet', 'roberta', 'xlm-roberta', 'camembert', 'distilbert', 'electra'
conda activate transformers
python ner.py \
--dset {location of the data that has been converted to CoNLL format} \
--model_class electra \
--pretrained_model google/electra-base-discriminator \
--lr 6e-5 \
--decay 0.02 \
--warmups 500
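To show how `--lr` and `--warmups` might interact, here is a minimal sketch of a linear warmup followed by linear decay, a common schedule when fine-tuning transformers. It assumes `--warmups` is a number of warmup steps. The scheduler actually used in `ner.py` is not shown in this README, and `total_steps=5000` is a hypothetical value.

```python
def linear_warmup_then_decay(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step: ramp up linearly, then decay linearly to 0."""
    if step < warmup_steps:
        # Warmup phase: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the command above, plus a hypothetical total step count.
BASE_LR = 6e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 5000  # assumption, not taken from the repository

lr_at_start = linear_warmup_then_decay(0, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
lr_mid_warmup = linear_warmup_then_decay(250, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
lr_at_peak = linear_warmup_then_decay(500, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
lr_at_end = linear_warmup_then_decay(5000, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
```

Under these assumptions the learning rate reaches its peak of 6e-5 at step 500 and falls back to zero at the final step.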
- Bayesian optimization with Gaussian processes
- Please open the interactive plots (contour_plot, slice_plot, cv_plot, etc.) in a browser
python optimization.py \
--model bert \
--lr 1e-6 1e-4 \
--decay 0.01 0.1 \
--warmups 0 3000 \
--eps 1e-9 1e-7
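Each hyperparameter flag in the command above takes two values, which appear to be the lower and upper bounds of the search range. The following is a hypothetical sketch of how such flags could be parsed with `argparse`. It is not the parser in `optimization.py`, and the names `build_parser` and `parse_ranges` are invented for illustration.

```python
import argparse

RANGE_FLAGS = ("lr", "decay", "warmups", "eps")


def build_parser():
    """Build a parser where every range flag takes exactly two floats."""
    parser = argparse.ArgumentParser(description="Hypothetical search-range parser")
    parser.add_argument("--model", default="bert")
    for name in RANGE_FLAGS:
        parser.add_argument(f"--{name}", type=float, nargs=2, metavar=("LOW", "HIGH"))
    return parser


def parse_ranges(argv):
    """Return the model name and a dict of (low, high) bounds per flag."""
    args = build_parser().parse_args(argv)
    ranges = {}
    for name in RANGE_FLAGS:
        bounds = getattr(args, name)
        if bounds is None:
            continue  # flag not supplied on the command line
        low, high = bounds
        if low > high:
            raise ValueError(f"--{name}: lower bound {low} exceeds upper bound {high}")
        ranges[name] = (low, high)
    return args.model, ranges


model, ranges = parse_ranges(
    ["--model", "bert", "--lr", "1e-6", "1e-4", "--decay", "0.01", "0.1",
     "--warmups", "0", "3000", "--eps", "1e-9", "1e-7"]
)
```

Ranges that span several orders of magnitude, such as the learning rate and epsilon here, are often searched on a log scale, though whether this repository does so is not stated.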
python processing/model_output.py \
--model_output processing/output/symptoms_hpi_all/prediction_test.txt \
--label_output_dir symptoms/storage/label-studio/project/completions/ \
--label_config symptoms/storage/label-studio/project/config.xml
Use raw CSV files with a column containing the clinical note; there is no need to convert them into CoNLL format.
python inference/run_and_predict.py \
-ipf {location of the input file} \
-opf {location of dummy output file} \
-cn {name of the column containing the clinical note}
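As a rough illustration of the expected input, the sketch below reads one named column of notes from a CSV file using only the standard library. It is not the loader in `inference/run_and_predict.py`. The function `read_notes`, the column name `note`, and the sample rows are hypothetical.

```python
import csv
import io


def read_notes(csv_file, column):
    """Return the text in `column` for every row of an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    return [row[column] for row in reader]


# Hypothetical in-memory CSV standing in for the file passed with -ipf.
sample = io.StringIO(
    "id,note\n"
    "1,Patient reports chest pain.\n"
    '2,"No fever, no cough."\n'
)

# "note" plays the role of the value passed with -cn.
notes = read_notes(sample, "note")
```

Checking the header up front gives a clear error when the column name passed on the command line does not match the file.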
All code is modified from