Error In Evaluation

Question

mhbashari opened this issue 8 years ago

Hi!

Running on my data causes this error (after the first epoch):

Traceback (most recent call last):
  File "./train.py", line 220, in <module>
    dev_data, id_to_tag, dico_tags)
  File "/home/tagger/utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range

The data has this form:

<sent0_unicode_word><space><iob_tag>
<sent0_unicode_word><space><iob_tag>
<sent0_unicode_word><space><iob_tag>

<sent1_unicode_word><space><iob_tag>
<sent1_unicode_word><space><iob_tag>
<sent1_unicode_word><space><iob_tag>

The IOB tags are in the set {B-PER, I-PER}, and the data is validated by this script:

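def validate(conll):
    """Return True if every non-blank line ends in an allowed tag."""
    for line in conll:
        if line.strip():  # blank lines separate sentences
            spl = line.strip().split()
            if spl[-1] not in ("B-PER", "I-PER", "O"):
                return False
    return True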

Would you help me to find out where and why my work raised this exception?

Guillaume Lample · Answer 1 · Thu Jun 16 2016 09:33:05 GMT+0800 (China Standard Time)

Hi,

If training gets through one epoch, your data should be in the correct format. The error is coming from this line:
float(eval_lines[1].strip().split()[-1])

But eval_lines are the lines extracted from the output of the evaluation script. The Python code calls an external Perl script to evaluate the sentences and stores the result in a file. Maybe this file has not been created properly. Can you check whether there is anything in the evaluation folder of your experiment? This kind of issue can happen when the Python code tries to write into a folder where it doesn't have permission.
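
A minimal sketch of a defensive version of that line, assuming the score file is read into a list of lines and the value wanted is the last field of its second line. The function name read_fb1_score and its path argument are hypothetical and are not part of the tagger code:

```python
import os


def read_fb1_score(scores_path):
    """Return the last field of the second line of the score file as a float.

    Raises RuntimeError with a readable message, instead of IndexError,
    when the evaluation script produced no (or too little) output.
    """
    if not os.path.isfile(scores_path):
        raise RuntimeError("score file was not created: %s" % scores_path)
    with open(scores_path) as f:
        eval_lines = [line.rstrip() for line in f]
    # The original line indexes eval_lines[1], so at least two lines are needed.
    if len(eval_lines) < 2 or not eval_lines[1].split():
        raise RuntimeError(
            "score file %s has %d line(s); the evaluation script "
            "probably failed to run" % (scores_path, len(eval_lines))
        )
    return float(eval_lines[1].strip().split()[-1])
```

With a check like this, an empty or missing score file is reported as such, which points at the evaluation step rather than at the training data.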

Minh-Son Cao · Answer 2 · Fri Mar 03 2017 14:43:39 GMT+0800 (China Standard Time)

Hi glample,
I'm getting the same error, but right in the first epoch, and I cannot fix it.
Traceback (most recent call last):
  File "./train.py", line 222, in <module>
    test_data, id_to_tag, dico_tags)
  File "/home/vuong/Documents/thang/tagger/utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range
And I also have
WARNING (theano.tensor.blas): We did not found a dynamic library into the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library.
unexpected number of features: 5 (3)
Can you please tell me how to fix this error?
Thank you!
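
The message "unexpected number of features: 5 (3)" looks like it comes from the evaluation script rather than from Python, and it would be consistent with some lines having a different number of whitespace-separated fields than others (for example, a token that itself contains a space). That reading is an assumption. A minimal sketch for locating such lines; the function name is hypothetical:

```python
def find_field_count_mismatches(lines):
    """Yield (line_number, n_fields, expected) for every non-blank line whose
    number of whitespace-separated fields differs from the first non-blank line."""
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:  # blank line: sentence separator
            continue
        if expected is None:
            expected = len(fields)
        elif len(fields) != expected:
            yield number, len(fields), expected
```

It can be run over an open data file, e.g. list(find_field_count_mismatches(open(path))), where path is whichever file is being checked.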

Le Duc Thang · Answer 3 · Wed Mar 08 2017 22:11:36 GMT+0800 (China Standard Time)

Hi Glample,
It works with the data you provided, but when I run it on my own data I get the same error as detuvoldo. Can you help us please?
Thank you.

Rabia-Noureen · Answer 4 · Fri Oct 06 2017 05:33:07 GMT+0800 (China Standard Time)

Hi @svensy @detuvoldo, were you able to solve that error? Can you please guide me? I am stuck.

Minh-Son Cao · Answer 5 · Fri Oct 06 2017 22:27:33 GMT+0800 (China Standard Time)

@Rabia-Noureen can you post your data format here? A few examples would make it easier for us to help you.

Rabia-Noureen · Answer 6 · Fri Oct 06 2017 22:59:35 GMT+0800 (China Standard Time)

@detuvoldo thanks for your response. I am using the dataset provided by @glample; the link is below:
https://github.com/glample/tagger/tree/master/dataset

I am using Windows 10 64-bit with Python 2.7. When I tried to train the model I got an error:

(env_name27) C:\Users\Acer\tagger-master>python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: ./models
Found 23624 unique words (203621 in total)
Found 84 unique characters
Found 17 unique named entity tags
14041 / 3250 / 3453 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 15.406189
100, cost average: 11.704297
150, cost average: 10.767459
200, cost average: 13.812738
250, cost average: 11.460194
300, cost average: 13.207466
350, cost average: 12.146099
400, cost average: 12.428576
450, cost average: 10.977689
500, cost average: 12.830771
550, cost average: 10.062991
600, cost average: 9.834551
650, cost average: 11.481623
700, cost average: 9.460655
750, cost average: 9.907359
800, cost average: 10.251657
850, cost average: 10.405848
900, cost average: 14.113665
950, cost average: 10.436158
'.' is not recognized as an internal or external command,
operable program or batch file.
ID NE Total O S-LOC B-PER E-PER S-ORG S-MISC B-ORG E-ORG S-PER I-ORG B-LOC E-LOC B-MISC E-MISC I-MISC I-PER I-LOC Percent
0 O 42759 42759 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100.000
1 S-LOC 1603 1603 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
2 B-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
3 E-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
4 S-ORG 891 891 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
5 S-MISC 665 665 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
6 B-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
7 E-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
8 S-PER 608 608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9 I-ORG 301 301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
10 B-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
11 E-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
12 B-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
13 E-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
14 I-MISC 89 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
15 I-PER 73 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
16 I-LOC 23 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
42759/51362 (83.25026%)
Traceback (most recent call last):
  File "train.py", line 220, in <module>
    dev_data, id_to_tag, dico_tags)
  File "C:\Users\Acer\tagger-master\utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range

Am I doing something wrong? I have been stuck on this issue for about 2 months and haven't been able to resolve it. Thanks in advance.
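
The line "'.' is not recognized as an internal or external command" in the log above is the message cmd.exe prints when asked to run a command that starts with "./", which suggests the evaluation script never ran and left the score file empty. A minimal sketch of one way around this, assuming the evaluation step is an external script that reads stdin and writes stdout: call the interpreter explicitly with an argument list instead of a shell string. The function name and the paths in the usage note are hypothetical and are not taken from utils.py:

```python
import subprocess


def run_eval_script(command, input_path, scores_path):
    """Run `command` (a list of arguments) without a shell, feeding it
    input_path on stdin and writing its stdout to scores_path."""
    with open(input_path, "rb") as fin, open(scores_path, "wb") as fout:
        returncode = subprocess.call(command, stdin=fin, stdout=fout)
    if returncode != 0:
        raise RuntimeError("evaluation command exited with code %d" % returncode)
```

A call might look like run_eval_script(["perl", "evaluation/conlleval"], "eval.output", "eval.scores"), which requires Perl to be installed and on the PATH; all three paths here are placeholders.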

Minh-Son Cao · Answer 7 · Fri Oct 06 2017 23:08:47 GMT+0800 (China Standard Time)

According to the text you provided, I think there are errors in the data set. You should check the data set carefully. Maybe an unnecessary "." appears somewhere.

Additionally, Theano will no longer be supported, so you should switch to another framework.

Rabia-Noureen · Answer 8 · Fri Oct 06 2017 23:13:18 GMT+0800 (China Standard Time)

@detuvoldo so should I try to run the script on the CPU instead of the GPU? Theano is a requirement for running the NER Tagger, as mentioned in the README file. Also, can you please provide a link to any other dataset that is in the required format? I am new to Python, so I don't know much about this.
Thanks

Minh-Son Cao · Answer 9 · Fri Oct 06 2017 23:16:06 GMT+0800 (China Standard Time)

https://github.com/detuvoldo/tagger/tree/master/lstm/fold1

You can look here to see the correct format.

Rabia-Noureen · Answer 10 · Fri Oct 06 2017 23:20:30 GMT+0800 (China Standard Time)

Okay, thanks a lot. I will use the dataset you provided; I hope it solves the issue. I also want to train the model using the GoogleNews word embeddings, with this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

It's a .bin file; is that fine?

Minh-Son Cao · Answer 11 · Fri Oct 06 2017 23:21:45 GMT+0800 (China Standard Time)

I think you don't need "=", just a space.
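
For reference, Python's standard-library option parsers accept both "--name value" and "--name=value" for long options, so the "=" form is not by itself an error if train.py uses one of them (an assumption; check the script). A small illustration with argparse; the parser below is a stand-in that only borrows two option names from the command above:

```python
import argparse

# Stand-in parser for illustration only; it is not train.py's actual parser.
parser = argparse.ArgumentParser()
parser.add_argument("--lr_method")
parser.add_argument("--tag_scheme")

with_equals = parser.parse_args(["--lr_method=adam", "--tag_scheme=iob"])
with_spaces = parser.parse_args(["--lr_method", "adam", "--tag_scheme", "iob"])

# Both spellings produce the same parsed values.
print(with_equals == with_spaces)
```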

Rabia-Noureen · Answer 12 · Fri Oct 06 2017 23:32:28 GMT+0800 (China Standard Time)

Oh, I got it. Thanks for your help, sir.

Rabia-Noureen · Answer 13 · Mon Oct 09 2017 00:27:51 GMT+0800 (China Standard Time)

@detuvoldo sorry for disturbing you again. I tried to run your dataset and script. It solved the error in the dataset, but the IndexError is still there. I replaced 2 lines in utils.py because I am using Windows 10 and there were some path-related issues.

The error is
run train.py --train lstm/fold1/train --dev lstm/fold1/dev --test lstm/fold1/test
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: \?\E:\New-Code\tagger-master\tagger-master\models\tag_scheme=iob,lower=False,zeros=False,char_dim=25,char_lstm_dim=25,char_bidirect=True,word_dim=100,word_lstm_dim=100,word_bidirect=True,pre_emb=,all_emb=False,cap_dim=0,crf=True,dropout=0.3,lr_method=sgd-lr_.005
Found 2573 unique words (48986 in total)
Found 64 unique characters
Found 27 unique named entity tags
858 / 289 / 286 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 101.645935
100, cost average: 83.234520
150, cost average: 82.757523
200, cost average: 69.019493
250, cost average: 64.411346
300, cost average: 62.836563
350, cost average: 60.969635
400, cost average: 58.851826
450, cost average: 49.994457
ID NE Total O I-LOC B-CTT B-OBJ B-LOC B-ACR B-INT B-PRC I-FACE I-PRC I-ACR I-OBJ B-FNUM I-FNUM I-DDIR B-FACE I-BEDNUM I-CTT B-DDIR I-INT B-BEDNUM B-BATHNUM I-BATHNUM I-FPOS B-FPOS I-BDIR B-BDIR Percent
0 O 9314 9175 0 63 14 0 0 62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 98.508
1 I-LOC 2604 2602 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
2 B-CTT 478 245 0 233 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48.745
3 B-OBJ 464 282 0 0 177 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38.147
4 B-LOC 439 439 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
5 B-ACR 346 334 0 1 1 0 7 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.023
6 B-INT 339 126 0 0 32 0 0 181 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53.392
7 B-PRC 233 232 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
8 I-FACE 218 218 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9 I-PRC 232 225 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
10 I-ACR 214 203 0 0 2 0 1 0 0 0 0 7 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3.271
11 I-OBJ 201 198 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
12 B-FNUM 170 156 0 0 5 0 0 8 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
13 I-FNUM 166 157 0 0 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
14 I-DDIR 170 169 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
15 B-FACE 120 120 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
16 I-BEDNUM 103 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
17 I-CTT 103 98 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
18 B-DDIR 83 83 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
19 I-INT 57 56 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
20 B-BEDNUM 57 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
21 B-BATHNUM 44 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
22 I-BATHNUM 45 44 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
23 I-FPOS 42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
24 B-FPOS 37 36 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
25 I-BDIR 22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
26 B-BDIR 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9780/16307 (59.97424%)
Traceback (most recent call last):

  File "E:\New-Code\tagger-master\tagger-master\train.py", line 221, in <module>
    dev_data, id_to_tag, dico_tags, epoch)

  File "utils.py", line 284, in evaluate
    return float(eval_lines[1].strip().split()[-1])

IndexError: list index out of range

Can you please suggest something that can help me solve the error?
Thanks in advance

Rabia-Noureen · Answer 14 · Wed Oct 11 2017 03:38:53 GMT+0800 (China Standard Time)

@detuvoldo sorry to disturb you again, but I am still waiting for your response. Please reply if you can help me with this. Thanks