This repo includes the notebooks, source data, and other materials for: Get Started with Natural Language Processing in Python.
It's a good idea to use virtualenv to manage your Python 3 virtual environment:
virtualenv -p /usr/bin/python3 ~/venv
Then run:
source ~/venv/bin/activate
To install the required Python libraries and related data sets:
pip install -r requirements.txt
python -m nltk.downloader punkt
python -m nltk.downloader wordnet
python -m textblob.download_corpora
python -m spacy download en
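To confirm the installation worked before opening the notebooks, a quick check like the following may help. It is only a minimal sketch: the list of package names is an assumption based on the libraries mentioned above (the actual contents of requirements.txt may differ), and the helper name missing_packages is hypothetical.

```python
import importlib.util

# Assumed package list, taken from the libraries named in this README;
# adjust it to match your requirements.txt.
REQUIRED = ("nltk", "textblob", "spacy")


def missing_packages(names=REQUIRED):
    """Return the top-level packages in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

This only checks that the packages can be located in the active virtual environment; it does not verify that the NLTK, TextBlob, or spaCy data downloads succeeded.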
The GitHub page for textblob-aptagger says the package is no longer needed as of TextBlob 0.11.0, which uses the NLTK perceptron tagger instead. The source code for nltk.tag.perceptron claims that it's a port of the TextBlob code. That's not exactly the case, however, and there may be issues on some versions of Windows.
If you run pip install textblob and DO NOT install textblob-aptagger, that should work fine with only minor changes to the Exercise 3 code and the pynlp.py module:
- change
import textblob_aptagger as tag
to
import nltk
- change
tag.PerceptronTagger()
to
nltk.tag.PerceptronTagger()
- change
.tag(sent)
to
.tag(nltk.word_tokenize(sent))
(or the equivalent)
Results should look very much like the original results, although in general the NLTK perceptron tagger has problems; for example, it doesn't handle punctuation properly.
- kudos @blue_slacker
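Taken together, the three changes listed above might look something like the sketch below. The function names tag_sentence and build_nltk_tagger are hypothetical and are not taken from pynlp.py or the Exercise 3 code. The tagger and tokenizer are passed in as arguments so the same function works with any object that offers a tag() method. Depending on your NLTK version, the perceptron tagger model may also need to be downloaded with nltk.downloader before PerceptronTagger() will load.

```python
def tag_sentence(sent, tagger, tokenize):
    """Tokenize the sentence string first, then tag the tokens.

    The NLTK perceptron tagger expects a list of tokens rather than a raw
    string, which is why .tag(sent) becomes .tag(nltk.word_tokenize(sent)).
    """
    return tagger.tag(tokenize(sent))


def build_nltk_tagger():
    """Return an NLTK perceptron tagger and NLTK's word tokenizer.

    nltk is imported here, not at module level, so this file can still be
    imported on a machine where NLTK is not installed.
    """
    import nltk

    return nltk.tag.PerceptronTagger(), nltk.word_tokenize
```

With NLTK and its data installed, calling tagger, tokenize = build_nltk_tagger() and then tag_sentence("The quick brown fox jumps.", tagger, tokenize) should return a list of (token, tag) pairs.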
NB: these course materials will soon shift from TextBlob to spaCy, although the latter still has a few rough edges.
A Docker container, courtesy of @montyz and @ashapochka, was defined for an instance of this course a few months ago. It may need updates.