ValueError: max() arg is an empty sequence

Question

victoriastuart opened this issue 7 years ago

Two issues:

1. Others (e.g. issues #20, #41) asked what a 'tokenized sentence' is; that puzzled me too. Answer: any sentence is 'tokenized'; e.g.

Victoria was born in 1961 in Halifax, Nova Scotia, Canada.

2. If your input file contains blank lines, e.g.

Victoria was born in 1961 in Halifax, Nova Scotia, Canada.

Victoria used to work at NIEHS in North Carolina.

then tagger.py (via utils.py) throws an error:

...
    max_length = max([len(word) for word in words])
ValueError: max() arg is an empty sequence
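
For anyone who wants to see the failure in isolation, here is a minimal sketch in modern Python. It reuses the split from tagger.py and the max() expression from the traceback; the function name max_word_length is made up for this illustration and is not part of the project.

```python
def max_word_length(line):
    """Length of the longest whitespace-separated token in a line.

    Hypothetical helper: it only combines the split used in tagger.py
    with the max() expression shown in the traceback.
    """
    words = line.rstrip().split()
    return max([len(word) for word in words])


# A normal sentence works: the longest token here is "Victoria".
print(max_word_length("Victoria was born in 1961 .\n"))

# A blank line splits into an empty list, so max() has nothing to compare.
try:
    max_word_length("\n")
except ValueError as err:
    print("ValueError:", err)
```

The exact wording of the ValueError message can differ between Python versions, but the cause is the same: the word list for a blank line is empty.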

You can solve that simply by changing the following lines in tagger.py:

Original:

print 'Tagging...'
with codecs.open(opts.input, 'r', 'utf-8') as f_input:
    count = 0
    for line in f_input:
        words = line.rstrip().split()

Modified:

print 'Tagging...'
with codecs.open(opts.input, 'r', 'utf-8') as f_input:
    count = 0
    for line in f_input:
        if len(line) <= 1:
            line = ''
        words = line.rstrip().split()

Added lines:

        if len(line) <= 1:
            line = ''
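
One caveat: the len(line) <= 1 check only catches a line that is exactly "\n". A line such as "\r\n", or one holding only spaces or tabs, is longer than one character and would still pass the check. Below is a rough sketch of a stricter variant, written for modern Python rather than as a drop-in patch for tagger.py. The generator name iter_sentences and the sample text are assumptions made for this example.

```python
import io


def iter_sentences(f_input):
    """Yield one list of tokens per input line (hypothetical helper).

    strip() removes "\n", "\r\n", spaces and tabs, so whitespace-only
    lines come back as an empty list instead of reaching max().
    """
    for line in f_input:
        yield line.strip().split()


# Sample input with three kinds of blank line between two sentences.
sample = io.StringIO(
    "Victoria was born in 1961 .\n"
    "\n"
    "\r\n"
    "   \n"
    "Victoria used to work at NIEHS .\n"
)

for words in iter_sentences(sample):
    if not words:
        # Blank line: print an empty line so output stays aligned with input.
        print("")
        continue
    print(max(len(word) for word in words))
```

Checking the token list rather than the raw line length means the guard does not depend on which line ending the file uses.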

Nikolai Kruglikov · Answer 1 · Fri Aug 04 2017 14:39:32 GMT+0800 (China Standard Time)

@victoriastuart Thanks a lot, you just saved me a lot of time!

Rabia-Noureen · Answer 2 · Wed Oct 04 2017 18:49:36 GMT+0800 (China Standard Time)

Hi @victoriastuart @nkruglikov, I am new to Python. Can you please help me train the model using GoogleNews word embeddings? I am trying to train using this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

I got this error:

I have been stuck on this issue for about 2 months and haven't been able to resolve it. Thanks in advance.