ValueError: you must specify a corpus reader

Question


sbrugman opened this issue 6 years ago

The following command works perfectly fine:
python train_chunker.py conll2002 --filename ~/nltk_data/chunkers/conll2002_chunker.pickle --classifier NaiveBayes

Then I copy ~/nltk_data/conll2002/ to ~/nltk_data/conlltest/ and run the command:
python train_chunker.py conlltest --filename ~/nltk_data/chunkers/conlltest_chunker.pickle --classifier NaiveBayes

The output is:

loading conlltest
Traceback (most recent call last):
  File "train_chunker.py", line 80, in <module>
    chunked_corpus = load_corpus_reader(args.corpus, reader=args.reader, fileids=args.fileids)
  File "/mnt/3E6227E362279F21/scriptie/external/nltk-trainer/nltk_trainer/__init__.py", line 64, in load_corpus_reader
    raise ValueError('you must specify a corpus reader')
ValueError: you must specify a corpus reader

What am I missing? I'm using NLTK 3.2.5.

Jacob · Answer · Fri Mar 23 2018 00:44:57 GMT+0800

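Take a look at https://nltk-trainer.readthedocs.io/en/latest/train_chunker.html and the examples for the --reader option. conll2002 works because NLTK already ships a predefined reader for that corpus name. A copy under a new name such as conlltest has no reader attached, so if you're using your own corpus, you have to tell the script how to open and read the data by passing --reader.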
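One way to see why copying the directory under a new name triggers the error: a known corpus name resolves to a reader automatically, and an unknown name only works if a reader is supplied explicitly. The sketch below is a hypothetical, simplified model of that lookup. The names `KNOWN_CORPORA` and `resolve_reader` are made up for illustration, and this is not nltk-trainer's actual implementation. The real script also has to locate the files on disk, which this sketch ignores.

```python
# Hypothetical, simplified model of the corpus-name lookup behind the error.
# KNOWN_CORPORA stands in for the corpora NLTK predefines (conll2002 is one of
# them, read by ConllChunkCorpusReader). This is NOT nltk-trainer's real code.
KNOWN_CORPORA = {"conll2002": "ConllChunkCorpusReader"}


def resolve_reader(corpus, reader=None):
    """Return the name of the reader to use for `corpus`."""
    if corpus in KNOWN_CORPORA:
        # A predefined corpus name already knows how to read its own files.
        return KNOWN_CORPORA[corpus]
    if reader is None:
        # A new name (e.g. a copied directory) carries no reader information.
        raise ValueError("you must specify a corpus reader")
    # An explicitly supplied reader (the role --reader plays) is used as-is.
    return reader


if __name__ == "__main__":
    print(resolve_reader("conll2002"))
    try:
        resolve_reader("conlltest")
    except ValueError as err:
        print("conlltest:", err)
```

In the real script, the --reader option supplies the missing piece. The linked documentation lists which reader classes it accepts and how to combine it with --fileids.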