BERTtextclassifier

BERTtextclassifier is a text classifier based on pre-trained BERT model.

Requirement

python3
torch==1.6.0
pandas
matplotlib
transformers==3.0.0
sklearn
seaborn

Usage

./BERTtextclassifier.sh [-i infile]
                        [-m pretrained_model]
                        <-o model_outdir>
                        <-l max_length>
                        <-b batch_size>
                        <-n num_epochs>
                        <-r learning_rate>

-i: filename of raw input text. Input text shoud be in TSV format with 4 colomns as follows:
IDtitlecontentlabel
labels should be intergers.
-m: pre-trained BERT model.
-o: directory for output post-trained model. Default=./model
-l: max length of sequences. Default=150
-b: batch size for training, valid and test data. Default=4
-n: number of epochs to use. Default=5
-r: learning rate of model. Default=2e-5

Pre-trained Model

Pre-trained BERT model should be download first for training.
Example
To download bioBERT pretrained model. use folloing command:

git lfs install
git clone https://huggingface.co/dmis-lab/biobert-v1.1

Then pre-trained bioBERT model will be download in ./biobert-v1.1.

Example

With downloaded bioBERT model, use following command to test:

./BERTtextclassifier.sh -i testdata/test.tsv -m biobert-v1.1

About

Do text classification using pre-trained BERT model.

GNU General Public License v3.0

Languages

Language:Python 88.8%Language:Shell 11.2%