BERTtextclassifier is a text classifier based on pre-trained BERT model.
python3
torch==1.6.0
pandas
matplotlib
transformers==3.0.0
sklearn
seaborn
./BERTtextclassifier.sh [-i infile]
[-m pretrained_model]
<-o model_outdir>
<-l max_length>
<-b batch_size>
<-n num_epochs>
<-r learning_rate>
-i: filename of raw input text. Input text shoud be in TSV format with 4 colomns as follows:
IDtitlecontentlabel
labels should be intergers.
-m: pre-trained BERT model.
-o: directory for output post-trained model. Default=./model
-l: max length of sequences. Default=150
-b: batch size for training, valid and test data. Default=4
-n: number of epochs to use. Default=5
-r: learning rate of model. Default=2e-5
Pre-trained BERT model should be download first for training.
Example
To download bioBERT pretrained model. use folloing command:
git lfs install
git clone https://huggingface.co/dmis-lab/biobert-v1.1
Then pre-trained bioBERT model will be download in ./biobert-v1.1.
With downloaded bioBERT model, use following command to test:
./BERTtextclassifier.sh -i testdata/test.tsv -m biobert-v1.1