bert2dnn

Large Scale BERT Distillation

Code for the paper "BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search"

TODOs

  • BERT2DNN model implementation
  • SST/amazon data pipeline
  • BERT/ERNIE finetune

Requirements

  • Python 3
  • TensorFlow 1.15

Quickstart

Training data

The SST-2 dataset is in a tab-separated format:

sentence                                        Label
hide new secretions from the parental units     0

After fine-tuning BERT/ERNIE on this data, we obtain the teacher model, which is then used to predict scores on the transfer dataset.

sentence                                        Label  logits               prob                  prob_t2
hide new secretions from the parental units     0      -1.2881309986114502  0.024137031017202534  0.13589785133992555
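Here prob appears to be the teacher's positive-class probability and prob_t2 its temperature-softened version (the values above are consistent with a temperature of T = 2 applied to the logit margin). The snippet below is an illustrative sketch of that relationship, not code from this repo:

import math

# Illustrative only: recover a temperature-softened target from the teacher's
# positive-class probability, assuming a binary output and temperature T = 2.
def soften(prob, temperature=2.0):
    logit = math.log(prob / (1.0 - prob))           # logit margin z_pos - z_neg
    return 1.0 / (1.0 + math.exp(-logit / temperature))

print(soften(0.024137031017202534))                 # ~0.1359, i.e. prob_t2 above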

The following script generates TF examples containing pairs of text and label for training. The text is tokenized into unigrams and bigrams, and the label is a soft target produced with a selected temperature.

python gen_tfrecord.py \
--input_file INPUT_TSV_FILE \
--output_file OUTPUT_TFRECORD \
--idx_text 0 --idx_label 3
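The --idx_text and --idx_label flags select which TSV columns hold the text and the soft label. As a rough sketch of what the conversion looks like (field names are illustrative, not the repo's actual schema):

import tensorflow as tf  # TensorFlow 1.15

# Illustrative sketch: build a tf.train.Example with unigram + bigram tokens
# and the teacher's soft score as the label.
def to_example(text, soft_label):
    unigrams = text.lower().split()
    bigrams = ["_".join(pair) for pair in zip(unigrams, unigrams[1:])]
    tokens = unigrams + bigrams
    return tf.train.Example(features=tf.train.Features(feature={
        "tokens": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[t.encode() for t in tokens])),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[float(soft_label)])),
    }))

with tf.io.TFRecordWriter("OUTPUT_TFRECORD") as writer:
    example = to_example("hide new secretions from the parental units",
                         0.024137031017202534)
    writer.write(example.SerializeToString())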

Model training

python run.py --do_train True --do_eval True
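The student itself is a plain feed-forward network over pooled unigram/bigram embeddings, trained against the teacher's soft targets. Below is a minimal TensorFlow 1.x sketch of that idea; the vocabulary size, layer widths, and variable names are illustrative, not the repo's actual configuration:

import tensorflow as tf

# Illustrative graph-mode sketch of a BERT2DNN-style student.
token_ids   = tf.placeholder(tf.int32, [None, None])    # padded unigram/bigram ids
soft_labels = tf.placeholder(tf.float32, [None])        # teacher soft targets

embeddings = tf.get_variable("token_emb", [500000, 128])           # hashed vocab
pooled = tf.reduce_mean(tf.nn.embedding_lookup(embeddings, token_ids), axis=1)
# (padding masking omitted for brevity)

hidden = tf.layers.dense(pooled, 256, activation=tf.nn.relu)
logit  = tf.squeeze(tf.layers.dense(hidden, 1), axis=-1)

# Distillation objective: cross entropy against the teacher's soft scores.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=soft_labels, logits=logit))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)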

Transfer Dataset

Our experiments use two public datasets:

  1. Stanford Sentiment Treebank: SST-2 download
  2. Amazon review dataset: download
