nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.

Home Page: https://parser.kitaev.io/

Using huggingface/transformers as the BERT pre-trained language model?

HiiamCong opened this issue

Dear authors,

I currently want to use PhoBERT (Pre-trained language models for Vietnamese) as the BERT pre-trained language model for this repo. PhoBERT was built and released in the huggingface/transformers library.
As far as I know, self-attentive-parser uses pytorch_pretrained_bert to load the BERT model. I have tried changing the get_bert function in parse_nk.py to use PhoBERT:

def get_bert(bert_model, bert_do_lower_case):
    from transformers import AutoModel, AutoTokenizer
    # Load the model and tokenizer via the transformers Auto classes
    # instead of pytorch_pretrained_bert
    phobert = AutoModel.from_pretrained(bert_model)
    tokenizer = AutoTokenizer.from_pretrained(bert_model, do_lower_case=bert_do_lower_case)
    return tokenizer, phobert

But I get this error:

Traceback (most recent call last):
  File "src/main.py", line 612, in <module>
    main()
  File "src/main.py", line 608, in main
    args.callback(args)
  File "src/main.py", line 564, in <lambda>
    subparser.set_defaults(callback=lambda args: run_train(args, hparams))
  File "src/main.py", line 312, in run_train
    _, loss = parser.parse_batch(subbatch_sentences, subbatch_trees)
  File "/home/kynh/codes/self-attentive-parser/src/parse_nk.py", line 1026, in parse_batch
    features_packed = features.masked_select(all_word_end_mask.to(torch.bool).unsqueeze(-1)).reshape(-1, features.shape[-1])
AttributeError: 'str' object has no attribute 'masked_select'

I have read a paper that used PhoBERT for training with this repo, so I am fairly sure this can be done, but I do not know how.
Any solution? Thanks!
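
A likely cause, judging from the traceback: parse_nk.py was written against pytorch_pretrained_bert, where the model's forward pass returns a plain tuple, so it extracts the features by tuple unpacking (something like features, _ = self.bert(...)). Recent versions of transformers instead return a ModelOutput object by default, and tuple-unpacking one of those yields its string keys (e.g. "last_hidden_state"), which would explain features ending up as a str. A minimal sketch of one possible fix, assuming a transformers version that supports the return_dict flag (3.0 or later):

def get_bert(bert_model, bert_do_lower_case):
    from transformers import AutoModel, AutoTokenizer
    # return_dict=False makes the forward pass return a plain tuple
    # (last_hidden_state, pooler_output), matching the tuple unpacking
    # that parse_nk.py does on the pytorch_pretrained_bert output.
    phobert = AutoModel.from_pretrained(bert_model, return_dict=False)
    tokenizer = AutoTokenizer.from_pretrained(bert_model, do_lower_case=bert_do_lower_case)
    return tokenizer, phobert

Alternatively, the call site in parse_nk.py could index the output explicitly, e.g. features = self.bert(all_input_ids, attention_mask=all_input_mask)[0], which returns the last hidden state under either return convention. Note also that parse_nk.py assumes a WordPiece tokenizer (it detects subword continuations by the "##" prefix), while PhoBERT uses BPE, so the word-boundary masks may need adapting as well.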