s2t2 / learning-nlp-py

Learning NLP

Setup

Fork this repo and clone your fork onto your local machine, then navigate into it from the command line:

cd learning-nlp-py/

Create and/or activate a Python 3.7 virtual environment:

conda create -n learning-nlp-env python=3.7 # (first time only)
conda activate learning-nlp-env

Install package dependencies:

pip install -r requirements.txt # (first time only)

Download the data:

  • Mod 1: Download the "amazon_reviews.csv" file and move it into the "data" directory of this repository.
  • Mod 2: Download the "bbc_docs" directory of text files, and move it into the "data" directory of this repository.
  • Mod 3: The data from this Kaggle Competition belongs in the "data/whiskey" directory of this repository. (FYI: already included in this repo, so no download is needed.)
  • Mod 4: Text snippets from novels belong in the "data/novels" directory of this repository. (FYI: already included in this repo.)
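Before running the examples, it can help to confirm that the data landed where the steps above say it should. The following is a minimal sketch, not part of the repo: the helper name `missing_data_paths` is hypothetical, and the paths are taken from the list above on the assumption that the script runs from the repository root.

```python
from pathlib import Path

# Paths taken from the download steps above, relative to the repository root.
EXPECTED_DATA_PATHS = [
    "data/amazon_reviews.csv",  # Mod 1
    "data/bbc_docs",            # Mod 2
    "data/whiskey",             # Mod 3
    "data/novels",              # Mod 4
]


def missing_data_paths(repo_root="."):
    """Return the expected data paths that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks for existence,
    not for the contents of each file or directory.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_DATA_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing data:", ", ".join(missing))
    else:
        print("All expected data paths are present.")
```

An empty list means every expected file or directory exists; anything returned still needs to be downloaded or moved.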

Download the spaCy language models:

python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
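spaCy models are installed as ordinary Python packages, so one way to check that both downloads succeeded is to ask the import system whether it can find them. This is a rough sketch using only the standard library; the helper name `missing_models` is hypothetical, and it does not load the models or verify their versions.

```python
import importlib.util

# Model package names from the download commands above.
MODEL_NAMES = ["en_core_web_md", "en_core_web_lg"]


def missing_models(names=MODEL_NAMES):
    """Return the package names that the import system cannot locate.

    Hypothetical helper for illustration; it only checks that each
    package is discoverable, without importing or loading it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    print("Missing models:", ", ".join(missing) if missing else "none")
```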

Download NLTK data, like stopwords:

python

> import nltk
> nltk.download() # opens the interactive downloader (optional)
> nltk.download("stopwords")
> nltk.download("movie_reviews")
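The same downloads can be scripted instead of typed into the interpreter. The sketch below is an assumption about how one might do that, not something the repo provides: `download_corpora` is a hypothetical helper that calls `nltk.download(name, quiet=True)` for each resource and collects the names for which the call returned a falsy result. The `downloader` parameter exists only so the helper can be exercised without NLTK or a network connection.

```python
def download_corpora(names, downloader=None):
    """Download each named NLTK resource and return the names that failed.

    Hypothetical helper for illustration. By default it uses
    nltk.download(name, quiet=True), which returns True on success.
    """
    if downloader is None:
        # Imported lazily so the function can be called with a stand-in
        # downloader on machines where NLTK is not installed.
        import nltk

        def downloader(name):
            return nltk.download(name, quiet=True)

    return [name for name in names if not downloader(name)]
```

Calling `download_corpora(["stopwords", "movie_reviews"])` fetches the two resources named above and returns an empty list if both succeed.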

Usage

Run some example code:

# MOD 1:
python -m app.tokenizer

# MOD 2:
python -m app.vectorizer
python -m app.word_distances

# MOD 3:
python -m app.grid_searcher
python -m app.amzn_reviews_classifier
python -m app.imdb_reviews_classifier
python -m app.whiskey_reviews_classifier

# MOD 4:
python -m app.novels

Start working from scratch in your own clean space:

python -m app.playground # MOD 1
python -m app.playground2 # MOD 2
python -m app.playground3 # MOD 3
python -m app.playground4 # MOD 4

Testing

pip install pytest # (first time only)
pytest
# pytest --disable-pytest-warnings -s
# pytest test/parser_test.py --disable-pytest-warnings -s
# pytest test/parser_test.py --disable-pytest-warnings -s -k 'test_tokenize'
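The `-k` option in the last command selects tests whose names match the given expression. The sketch below shows what a small pytest file could look like; it is not the contents of test/parser_test.py, and the `tokenize` function here is a hypothetical stand-in rather than the repo's tokenizer.

```python
import re


def tokenize(text):
    """Hypothetical stand-in: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def test_tokenize():
    # Punctuation is dropped and case is normalized.
    assert tokenize("Hello, NLP world!") == ["hello", "nlp", "world"]


def test_tokenize_empty():
    # An empty string yields no tokens.
    assert tokenize("") == []
```

Because `-k` matches substrings, `-k 'test_tokenize'` would run both tests in this sketch, since both names contain `test_tokenize`.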
