How to use the pre-trained model provided by the Google BERT repository?
tripbnb66 opened this issue · comments
On the https://github.com/google-research/bert#pre-trained-models page, you can download
BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
After extracting the zip archive, there are 5 files inside.
I expected to use this pre-trained model, provided by Google, for multilingual training.
But the file names are completely different from the files provided in this repository:
bert_config.json
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.index
bert_model.ckpt.meta
vocab.txt
In this repository, there are 3 files:
bert-base-chinese-pytorch_model.bin
bert-base-chinese-vocab.txt
bert-base-chinese-config.json
I tried to use bert_model.ckpt.data-00000-of-00001 as a replacement for bert-base-chinese-pytorch_model.bin, but it doesn't work and I got this error message:
_pickle.UnpicklingError: invalid load key, '\x27'.
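That error is expected: bert_model.ckpt.data-00000-of-00001 is a TensorFlow checkpoint shard in TensorFlow's own binary format, while torch.load unpickles whatever file it is handed, so the very first byte is rejected as an "invalid load key". A minimal sketch of the failure mode, using only the standard pickle module (no torch, and the bytes below are just illustrative non-pickle data):

```python
import io
import pickle

# A TensorFlow checkpoint data file is raw binary data, not a pickle
# stream, so the unpickler rejects its first byte as an invalid opcode.
fake_ckpt_bytes = b"'\x00not-a-pickle-stream"  # first byte is 0x27 ("'")

try:
    pickle.load(io.BytesIO(fake_ckpt_bytes))
except pickle.UnpicklingError as err:
    print("UnpicklingError:", err)
```

The fix is not to rename the TF checkpoint but to convert it, as in the steps below.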
The version of this repository that I used is 14caa98 (I ran "git reset --hard 14caa98" to make sure I was on the right commit).
Check https://modelzoo.co/model/pytorch-pretrained-bert and you will find the answer.
The complete steps are listed below:
- wget "https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip"
- unzip multi_cased_L-12_H-768_A-12.zip
- pip3 install pytorch-pretrained-bert
- export BERT_BASE_DIR=/home/david/bot/cron/bert/multi_cased_L-12_H-768_A-12
- pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch $BERT_BASE_DIR/bert_model.ckpt $BERT_BASE_DIR/bert_config.json $BERT_BASE_DIR/pytorch_model.bin
- cp -f $BERT_BASE_DIR/pytorch_model.bin bert_pytorch/pybert/pretrain/bert/base-uncased/pytorch_model.bin
- cp -f $BERT_BASE_DIR/vocab.txt bert_pytorch/pybert/pretrain/bert/base-uncased/bert_vocab.txt
- cp -f $BERT_BASE_DIR/bert_config.json bert_pytorch/pybert/pretrain/bert/base-uncased/config.json