lonePatient / Bert-Multi-Label-Text-Classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

How to use the pre-trained model provided in the Google BERT repository?

tripbnb66 opened this issue

The https://github.com/google-research/bert#pre-trained-models page offers this download:
BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
After extracting the zip file, I get 5 files.
I want to use this pre-trained model provided by Google for multilingual training.

But the file names are completely different from the files provided in this repository:

bert_config.json
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.index
bert_model.ckpt.meta
vocab.txt

This repository uses 3 files:

bert-base-chinese-pytorch_model.bin
bert-base-chinese-vocab.txt
bert-base-chinese-config.json

I tried replacing bert-base-chinese-pytorch_model.bin with bert_model.ckpt.data-00000-of-00001, but it did not work and I got this error message:
_pickle.UnpicklingError: invalid load key, '\x27'.
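
For context, the error is raised by the pickle layer: the PyTorch loader tries to unpickle the file, and a TensorFlow checkpoint data shard is not a pickle stream. Below is a rough sketch of a pre-flight check, assuming that files written by `torch.save` start either with the zip signature (newer format) or with a pickle protocol opcode (legacy format). `looks_like_torch_checkpoint` is a hypothetical helper written for illustration. It is not part of this repository or of PyTorch, and it is only a heuristic.

```python
ZIP_MAGIC = b"PK\x03\x04"  # assumption: newer torch.save output is a zip archive
PICKLE_PROTO = b"\x80"     # assumption: legacy torch.save output starts with a pickle PROTO opcode


def looks_like_torch_checkpoint(path):
    """Heuristic: True if the file starts like a zip archive or a pickle stream."""
    with open(path, "rb") as f:
        head = f.read(4)
    return head.startswith(ZIP_MAGIC) or head.startswith(PICKLE_PROTO)
```

A file that fails this check, such as the renamed TensorFlow data shard, would need to be converted rather than renamed.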

I am using commit "14caa98" of this repository, and I ran "git reset --hard 14caa98" to make sure I was on the right version.

Check https://modelzoo.co/model/pytorch-pretrained-bert and you will find the answer.
The complete steps are listed below:

  1. wget "https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip"
  2. unzip multi_cased_L-12_H-768_A-12.zip
  3. pip3 install pytorch-pretrained-bert
  4. export BERT_BASE_DIR=/home/david/bot/cron/bert/multi_cased_L-12_H-768_A-12
  5. pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch $BERT_BASE_DIR/bert_model.ckpt $BERT_BASE_DIR/bert_config.json $BERT_BASE_DIR/pytorch_model.bin
  6. cp -f $BERT_BASE_DIR/pytorch_model.bin bert_pytorch/pybert/pretrain/bert/base-uncased/pytorch_model.bin
  7. cp -f $BERT_BASE_DIR/vocab.txt bert_pytorch/pybert/pretrain/bert/base-uncased/bert_vocab.txt
  8. cp -f $BERT_BASE_DIR/bert_config.json bert_pytorch/pybert/pretrain/bert/base-uncased/config.json
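
After the copy steps, a quick sanity check of the target directory can catch a missed file or a mismatched vocabulary before training starts. The sketch below assumes the three target file names from steps 6 to 8 and assumes that the config JSON stores the vocabulary size under the `vocab_size` key, with one token per line in the vocab file. `check_pretrain_dir` is a hypothetical helper, not something this repository provides.

```python
import json
from pathlib import Path

# Target file names taken from steps 6-8 above.
EXPECTED_FILES = ("pytorch_model.bin", "bert_vocab.txt", "config.json")


def check_pretrain_dir(directory):
    """Return a list of problems found; an empty list means the checks passed."""
    directory = Path(directory)
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (directory / name).is_file()
    ]
    if problems:
        return problems

    with open(directory / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    with open(directory / "bert_vocab.txt", encoding="utf-8") as f:
        vocab_lines = sum(1 for _ in f)  # one token per line

    # Assumption: the config records the vocabulary size as "vocab_size".
    if config.get("vocab_size") != vocab_lines:
        problems.append(
            f"vocab_size in config.json is {config.get('vocab_size')}, "
            f"but bert_vocab.txt has {vocab_lines} lines"
        )
    return problems
```

For example, `check_pretrain_dir("bert_pytorch/pybert/pretrain/bert/base-uncased")` would return an empty list if all three files are present and the vocabulary sizes agree.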