bert2dnn
Large Scale BERT Distillation
Code for the paper "BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search"
TODOs
- BERT2DNN model implementation
- SST/Amazon data pipeline
- BERT/ERNIE fine-tuning
Requirements
- Python 3
- TensorFlow 1.15
Quickstart
Training data
The SST-2 dataset is in a tab-separated format:
sentence | Label |
---|---|
hide new secretions from the parental units | 0 |
After fine-tuning BERT/ERNIE on this data, we obtain the teacher model, which is then used to predict scores on the transfer dataset.
sentence | Label | logits | prob | prob_t2 |
---|---|---|---|---|
hide new secretions from the parental units | 0 | -1.2881309986114502 | 0.024137031017202534 | 0.13589785133992555 |
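The `prob` and `prob_t2` columns are consistent with a softmax over the two class logits at temperature T=1 and T=2, respectively. Below is a minimal sketch of temperature-scaled softmax; only the positive-class logit appears in the table, so the second logit (~2.41, which reproduces both probabilities above) is inferred here for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over class logits, softened by dividing by a temperature T."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Illustrative two-class logits for the row above. The positive-class logit
# (-1.288...) comes from the table; the negative-class logit is inferred.
logits = [2.413, -1.2881309986114502]  # [negative class, positive class]

print(softmax_with_temperature(logits, 1.0)[1])  # ~0.0241  (prob)
print(softmax_with_temperature(logits, 2.0)[1])  # ~0.1359  (prob_t2)
```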
The following script generates TF examples containing text/label pairs for training. The text is tokenized with a unigram and bigram tokenizer, and the label is a soft target produced with a selected temperature.
```
python gen_tfrecord.py \
  --input_file INPUT_TSV_FILE \
  --output_file OUTPUT_TFRECORD \
  --idx_text 0 --idx_label 3
```
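Conceptually, the script tokenizes each sentence into unigrams and bigrams and serializes text/label pairs as `tf.train.Example` records. A minimal sketch of that idea follows; the feature names, bigram joining scheme, and output path are assumptions for illustration, not the script's exact format:

```python
import tensorflow as tf  # TensorFlow 1.15

def unigrams_and_bigrams(sentence):
    """Split into unigram and bigram tokens,
    e.g. 'a b c' -> ['a', 'b', 'c', 'a_b', 'b_c']."""
    words = sentence.lower().split()
    bigrams = ["_".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

def make_example(sentence, soft_label):
    """Build a tf.train.Example; 'tokens'/'label' feature names are illustrative."""
    tokens = unigrams_and_bigrams(sentence)
    return tf.train.Example(features=tf.train.Features(feature={
        "tokens": tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[t.encode("utf-8") for t in tokens])),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[soft_label])),
    }))

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    example = make_example(
        "hide new secretions from the parental units", 0.13589785133992555)
    writer.write(example.SerializeToString())
```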
Model training
```
python run.py --do_train True --do_eval True
```
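Training of this kind minimizes cross-entropy against the teacher's temperature-softened probabilities rather than the hard 0/1 labels. A minimal TF 1.x sketch of such a soft-target loss, assuming a bag-of-tokens student network; the feature size, layer shapes, and names are placeholders, not `run.py`'s actual graph:

```python
import tensorflow as tf  # TensorFlow 1.15, graph mode

# Bag-of-tokens input standing in for the unigram/bigram features;
# the vocabulary size of 10000 is an illustrative placeholder.
features = tf.placeholder(tf.float32, [None, 10000])
teacher_prob = tf.placeholder(tf.float32, [None, 1])  # e.g. the prob_t2 column

# A small feed-forward student: hidden layer plus a single output logit.
hidden = tf.layers.dense(features, 256, activation=tf.nn.relu)
student_logit = tf.layers.dense(hidden, 1)

# Soft-target binary cross-entropy: the student's sigmoid output is
# pushed toward the teacher's temperature-softened probability.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=teacher_prob, logits=student_logit))

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```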
Transfer Dataset
Our experiments use two public datasets: