How to use the pre-trained model provided by the Google BERT repository?
tripbnb66 opened this issue · comments
On the https://github.com/google-research/bert#pre-trained-models page, you can download
BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
After extracting the zip archive, there are 5 files inside.
I expected to use this pre-trained model, provided by Google, for multilingual training.
But the file names are completely different from the files provided in this repository:
bert_config.json
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.index
bert_model.ckpt.meta
vocab.txt
In this repository, there are 3 files:
bert-base-chinese-pytorch_model.bin
bert-base-chinese-vocab.txt
bert-base-chinese-config.json
I tried to use bert_model.ckpt.data-00000-of-00001 as a replacement for bert-base-chinese-pytorch_model.bin, but it doesn't work and I got this error message:
_pickle.UnpicklingError: invalid load key, '\x27'.
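That error is expected: bert_model.ckpt.data-00000-of-00001 is a TensorFlow checkpoint shard in TensorFlow's own binary format, while torch.load unpickles whatever file it is handed, so the very first byte is rejected as an "invalid load key". A minimal sketch of the failure mode, using only the standard pickle module (no torch, and the bytes below are just illustrative non-pickle data):

```python
import io
import pickle

# A TensorFlow checkpoint data file is raw binary data, not a pickle
# stream, so the unpickler rejects its first byte as an invalid opcode.
fake_ckpt_bytes = b"'\x00not-a-pickle-stream"  # first byte is 0x27 ("'")

try:
    pickle.load(io.BytesIO(fake_ckpt_bytes))
except pickle.UnpicklingError as err:
    print("UnpicklingError:", err)
```

The fix is not to rename the TF checkpoint but to convert it, as in the steps below.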
The version of this repository that I used is 14caa98 (I ran "git reset --hard 14caa98" to make sure I was on the right commit).
Check https://modelzoo.co/model/pytorch-pretrained-bert and you will find the answer.
The complete steps are listed below:
- wget "https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip"
- unzip multi_cased_L-12_H-768_A-12.zip
- pip3 install pytorch-pretrained-bert
- export BERT_BASE_DIR=/home/david/bot/cron/bert/multi_cased_L-12_H-768_A-12
- pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch $BERT_BASE_DIR/bert_model.ckpt $BERT_BASE_DIR/bert_config.json $BERT_BASE_DIR/pytorch_model.bin
- cp -f $BERT_BASE_DIR/pytorch_model.bin bert_pytorch/pybert/pretrain/bert/base-uncased/pytorch_model.bin
- cp -f $BERT_BASE_DIR/vocab.txt bert_pytorch/pybert/pretrain/bert/base-uncased/bert_vocab.txt
- cp -f $BERT_BASE_DIR/bert_config.json bert_pytorch/pybert/pretrain/bert/base-uncased/config.json