songyouwei / ABSA-PyTorch

Aspect Based Sentiment Analysis, PyTorch Implementations.


Asking for the cause: error when training on a non-English dataset (ValueError: index can't contain negative values)

alian921 opened this issue · comments

Hello,
Training with your default datasets works without any problem.
I am currently testing whether the code also works for languages other than English, starting with Japanese.

① I prepared a Japanese Twitter training set and changed the spaCy language model in dependency_graph.py so that the dependency trees, and hence the 0/1 adjacency matrices, are built for Japanese:
#nlp = spacy.load('en_core_web_sm')
nlp = spacy.load('ja_core_news_sm')
The graph files were generated successfully:
$ du -sh datasets/my-twitter/*
16K datasets/my-twitter/test.raw
340K datasets/my-twitter/test.raw.graph
32K datasets/my-twitter/train.raw
1.1M datasets/my-twitter/train.raw.graph
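For reference, the .graph files hold one 0/1 adjacency matrix per sentence, built from the spaCy dependency parse. The sketch below illustrates that construction with plain NumPy so it runs without the Japanese spaCy model installed; the edge list is a hypothetical parse, and the real dependency_graph.py derives the edges from nlp(text) instead (the exact repo code may differ slightly).

```python
import numpy as np

def dependency_adj_matrix(n_tokens, edges):
    """Build the 0/1 adjacency matrix the .graph files store:
    a self-loop for every token, plus each dependency edge
    marked in both directions (the matrix is symmetric)."""
    matrix = np.zeros((n_tokens, n_tokens), dtype='float32')
    for i in range(n_tokens):
        matrix[i][i] = 1                      # self-loop per token
    for head, child in edges:
        matrix[head][child] = 1               # head -> child
        matrix[child][head] = 1               # child -> head
    return matrix

# Hypothetical head->child pairs for a 5-token sentence.
edges = [(1, 0), (1, 2), (3, 1), (3, 4)]
m = dependency_adj_matrix(5, edges)
print(m.sum())  # 5 self-loops + 2*4 directed edge entries = 13.0
```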

② I then modified train.py, replacing the pretrained BERT model with a Japanese one:
parser.add_argument('--pretrained_bert_name', default='cl-tohoku/bert-base-japanese-whole-word-masking', type=str)

③ Finally, the following error occurred during training. Could you give me a hint as to the cause?
$ python train.py --model_name bert_spc --dataset my-twitter
Traceback (most recent call last):
  File "train.py", line 307, in <module>
    main()
  File "train.py", line 302, in main
    ins = Instructor(opt)
  File "train.py", line 52, in __init__
    self.trainset = ABSADataset(opt.dataset_file['train'], tokenizer)
  File "/usr/local/src/ABSA-PyTorch/data_utils.py", line 161, in __init__
    dependency_graph = np.pad(idx2graph[i],
  File "<__array_function__ internals>", line 5, in pad
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 743, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/arraypad.py", line 514, in _as_pairs
    raise ValueError("index can't contain negative values")
ValueError: index can't contain negative values

After I increased max_seq_len from the default 85 to 200, the problem was solved. The cause was apparently that some of the texts I prepared were longer than max_seq_len.