lonePatient / Bert-Multi-Label-Text-Classification

This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.

How to support Multilingual?

tripbnb66 opened this issue

Currently, this model supports Chinese only.
Is it possible to support multilingual text?
If so, can anyone tell me how to do it?
(If you can provide detailed steps for using a multilingual pretrained model, I would much appreciate it.)

Thank you. Best regards

I found the solution and am sharing it for anyone who has the same problem.

  1. Download the pretrained model files from the BERT repository: https://github.com/google-research/bert
  2. Unzip the zip file.
  3. Install pytorch-pretrained-bert with pip or pip3.
  4. Use the command "pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch ..." to convert the TensorFlow checkpoint to a PyTorch model file.

A sample is listed below:

  1. wget "https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip"
  2. unzip multi_cased_L-12_H-768_A-12.zip
  3. pip3 install pytorch-pretrained-bert
  4. export BERT_BASE_DIR=/home/david/bot/cron/bert/multi_cased_L-12_H-768_A-12
  5. pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch $BERT_BASE_DIR/bert_model.ckpt $BERT_BASE_DIR/bert_config.json $BERT_BASE_DIR/pytorch_model.bin
  6. cp -f $BERT_BASE_DIR/pytorch_model.bin bert_pytorch/pybert/pretrain/bert/base-uncased/pytorch_model.bin
  7. cp -f $BERT_BASE_DIR/vocab.txt bert_pytorch/pybert/pretrain/bert/base-uncased/bert_vocab.txt
  8. cp -f $BERT_BASE_DIR/bert_config.json bert_pytorch/pybert/pretrain/bert/base-uncased/config.json
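
After the copy steps above, it may help to sanity-check the target directory before training. The following is only a minimal sketch, not part of the repository: the function name check_bert_dir is hypothetical, the three file names are taken from the cp commands above, and it assumes the config file carries a "vocab_size" key (as the BERT config files do) and that the vocabulary file has one token per line.

```python
import json
from pathlib import Path

# File names that the cp commands above write into the target directory.
EXPECTED_FILES = ("pytorch_model.bin", "bert_vocab.txt", "config.json")


def check_bert_dir(model_dir):
    """Return a list of problems found in model_dir; an empty list means no problem was found."""
    model_dir = Path(model_dir)

    # 1. All three files must be present.
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (model_dir / name).is_file()
    ]
    if problems:
        return problems

    # 2. The vocabulary size declared in the config should match the number
    #    of lines in the vocabulary file (assumption: one token per line).
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    with open(model_dir / "bert_vocab.txt", encoding="utf-8") as vocab_file:
        vocab_lines = sum(1 for _ in vocab_file)

    if config.get("vocab_size") != vocab_lines:
        problems.append(
            f"vocab_size in config.json is {config.get('vocab_size')}, "
            f"but bert_vocab.txt has {vocab_lines} lines"
        )
    return problems
```

For the sample above, you could call check_bert_dir("bert_pytorch/pybert/pretrain/bert/base-uncased") and print the returned list. A mismatch between the config and the vocabulary usually means files from two different models were mixed, for example a multilingual checkpoint left next to an older vocabulary file.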