paulbricman / lexiscore

A nutritional label for food for thought.

Home Page:https://paulbricman.com/thoughtware/lexiscore


NLTK downloader problem when deploying with Docker.

issmirnov opened this issue · comments

I have deployed the Docker image to my personal server. After importing my blog's RSS feed (https://ivans.io/rss/) as an OPML file, I click on "start labelling", which triggers the following stack trace:

LookupError: **********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/root/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
Traceback:
File "/usr/local/lib/python3.8/site-packages/streamlit/script_runner.py", line 354, in _run_script
    exec(code, module.__dict__)
File "/app/main.py", line 30, in <module>
    cart_section(col2)
File "/app/components.py", line 110, in cart_section
    content_paragraphs = get_paragraphs(row['text'])
File "/app/processing.py", line 19, in get_paragraphs
    sents = sent_tokenize(line)
File "/usr/local/lib/python3.8/site-packages/nltk/tokenize/__init__.py", line 107, in sent_tokenize
    tokenizer = load("tokenizers/punkt/{0}.pickle".format(language))
File "/usr/local/lib/python3.8/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
File "/usr/local/lib/python3.8/site-packages/nltk/data.py", line 875, in _open
    return find(path_, path + [""]).open()
File "/usr/local/lib/python3.8/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)

Oops, the Dockerfile was missing the download step for the punkt data. I've just pushed a new version to Docker Hub; could you try again?
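For reference, the missing step is presumably something along these lines in the Dockerfile (a sketch only; the base image and layout of the actual lexiscore Dockerfile may differ):

```dockerfile
FROM python:3.8-slim

RUN pip install nltk && \
    # Fetch the punkt sentence tokenizer at build time so it is baked
    # into the image instead of being looked up (and missed) at runtime.
    python -c "import nltk; nltk.download('punkt')"
```

Baking the data into the image avoids a network call on every container start and makes the lookup paths deterministic.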

Also, I saw you just set up your conceptarium. Make sure you have a handful of thoughts saved before using the lexiscore, so that the results make sense; they'll improve as you save more.

Thanks! That resolved the issue. I notice that although the OPML file points at the RSS feed for my full blog, only the most recent entry gets pulled. Would you like me to open a separate issue for that?
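To help narrow that down: the OPML side can be checked with nothing but the standard library. This is a hypothetical sketch (the `feed_urls` helper and the embedded sample OPML are illustrative, not lexiscore's actual code); it just confirms which feed URLs the OPML file declares, so the missing entries can be attributed to feed fetching rather than OPML parsing:

```python
import xml.etree.ElementTree as ET

# Minimal OPML export listing a single RSS feed, as an RSS reader might produce.
OPML_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <body>
    <outline type="rss" text="ivans.io" xmlUrl="https://ivans.io/rss/"/>
  </body>
</opml>"""

def feed_urls(opml_text):
    """Return every xmlUrl attribute found on <outline> elements."""
    root = ET.fromstring(opml_text)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

print(feed_urls(OPML_SAMPLE))  # → ['https://ivans.io/rss/']
```

If the URLs come out correctly here, the "only the latest entry" behaviour would point at how the fetched feed itself is consumed, not at the OPML import.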

A new issue would be nice, let's continue there!