NLP Information Extraction for the easily bored

NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)

Please fork this repository and follow along.

If this repository changes after you fork it, you'll want to sync your fork.

If you clone your forked repo locally, here's how to keep your forked clone up-to-date:

git remote add upstream https://github.com/myedibleenso/nlp-for-the-easily-bored
# check for updates in myedibleenso/nlp...bored
git fetch upstream  
# checkout your own local master branch
git checkout master
# pull in latest changes from myedibleenso/nlp...bored to your local master
git merge upstream/master

NOTE: this is a work in progress. Check back later for updates...

NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow

~~NLP~~ Information Extraction for the easily bored

slides / notebook
How do we get useful things out of a sea of text?
Learn about finding people, places, organizations, etc.

Introduction to py-processors

slides / notebook
An overview of the natural language processing (NLP) library we'll be using in the examples

Examples

Here you'll find a few use cases illustrating the concepts covered in the intros.

Who, what, when, and where? Making sense of web-based news

slides / notebook
Go from HTML to people, places, etc.
Learn how to do basic IE on an article from The Guardian that you may have read
Challenge: How do we disambiguate organizations and people?
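
The first step of the "HTML to people, places, etc." pipeline is getting plain text out of the markup. The sketch below is one minimal way to do that with Python's standard library only; it is not taken from the workshop notebook, and the names `TextExtractor`, `html_to_text`, and `sample_html` are hypothetical. The resulting text would then be handed to an NLP library for entity recognition.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up snippet standing in for a downloaded article (assumption).
sample_html = (
    "<html><head><style>p {color: red;}</style></head>"
    "<body><h1>Headline</h1><p>Some <b>bold</b> claim.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample_html))
```

Real news pages also carry navigation, ads, and comments, so a dedicated article extractor will usually do better than this sketch.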

Getting structured information out of Wikipedia pages

slides / notebook
You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do these interact in text?
Challenge: Try to populate a Wikipedia infobox for Barack Obama.
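
To make the infobox challenge concrete, here is a toy, pattern-based sketch that fills a few infobox-style fields from a single sentence. It uses a hand-written regular expression rather than the approach in the workshop notebook; `BORN`, `extract_birth`, and the example sentence are all illustrative assumptions, and the pattern only covers this one phrasing.

```python
import re

# Hypothetical pattern: "<Name> was born on <Month D, YYYY>, in <Place>".
BORN = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+) was born on "
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}),? in "
    r"(?P<place>[A-Z][A-Za-z]+(?:, [A-Z][A-Za-z]+)*)"
)


def extract_birth(text):
    """Return a dict with name, date, and place, or None if nothing matches."""
    match = BORN.search(text)
    return match.groupdict() if match else None


sentence = "Barack Obama was born on August 4, 1961, in Honolulu, Hawaii."
print(extract_birth(sentence))
```

A single regular expression breaks as soon as the wording changes, which is why the named entities and syntactic structure produced by an NLP library are a sturdier basis for this kind of rule.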

Movie reviews

slides / notebook
Is it a positive or negative review? If we don't have a score, can we identify the sentiment and assign a score based on the review text?
NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key
Challenge: Predict critics' consensus scores based only on the review text
- Use whatever method you want
  - feature-based classifier, latent feature model, etc.
- What works and why?
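
As a starting point for the challenge, the sketch below scores a review with a tiny word-list baseline. It is only a toy: the `POSITIVE` and `NEGATIVE` sets are made up for illustration, `sentiment_score` is a hypothetical helper, and a feature-based classifier trained on real reviews should beat it easily.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"great", "funny", "moving", "brilliant", "charming"}
NEGATIVE = {"boring", "dull", "awful", "clumsy", "tedious"}


def sentiment_score(review):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no listed word occurs."""
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(sentiment_score("A funny, charming film with a dull ending."))
```

This baseline ignores negation ("not funny") and word order, so comparing its mistakes against a trained model is one way to approach the "What works and why?" question.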

Installation

There are a couple of things you'll need to run the notebooks in this repository...

Requirements

Java 8
2 or 3 GB of RAM available for running the NLP server

Python dependencies via `conda`

conda create -n bored python=3
source activate bored
# assuming you're in the "nlp-for-the-easily-bored" directory
pip install -r requirements.txt

Running the notebooks

The notebooks are all under /notebooks

If you want to run/alter them locally after installing the project dependencies, simply run this command:

jupyter notebook

Resources

See resources.md for links to NLP datasets, free courses, etc.

Questions

Have a question? See the FAQ. It may have already been asked/answered.

Contributing

Thanks for the help! Take a look at contributing.md
