jmowen / nlp-for-the-easily-bored

NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)

NLP Information Extraction for the easily bored

Please fork this repository and follow along.

If this repository changes after you fork it, you'll want to sync your fork.

If you've cloned your fork locally, here's how to keep that clone up to date:

git remote add upstream https://github.com/myedibleenso/nlp-for-the-easily-bored
# check for updates in myedibleenso/nlp...bored
git fetch upstream  
# checkout your own local master branch
git checkout master
# pull in latest changes from myedibleenso/nlp...bored to your local master
git merge upstream/master

NOTE: this is a work in progress. Check back later for updates...

Table of Contents

NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow

  1. NLP Information Extraction for the easily bored
  • slides / notebook
  • How do we get useful things out of a sea of text?
  • Learn about finding people, places, organizations, etc.
  1. Introduction to py-processors
  • slides / notebook
  • An overview of the natural language processing (NLP) library we'll be using in the examples

Examples

Here you'll find a few use cases illustrating the concepts covered in the intros.

  1. Who, what, when, and where? Making sense of web-based news
  1. Getting structured information out of Wikipedia pages
  • slides / notebook
  • You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do those entities relate to one another?
  • Challenge: Try to populate a Wikipedia infobox for Barack Obama.
  1. Movie reviews
  • slides / notebook
  • Is it a positive or negative review? If we don't have a score, can we identify the sentiment and assign a score based on the review text?
  • NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key
  • Challenge: Predict critics' consensus scores based only on the review text
    • Use whatever method you want
      • feature-based classifier, latent feature model, etc.
    • What works and why?
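As a starting point for the movie review challenge, here is a minimal sketch of one feature-based approach: a bag-of-words Naive Bayes classifier with add-one smoothing, written with only the standard library. It is not taken from the workshop notebooks. The class name `NaiveBayesSentiment`, the tokenizer, and the four training reviews are all made up for illustration, and it predicts a positive/negative label rather than a numeric consensus score.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words features (add-one smoothing)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training reviews
        self.vocab = set()

    def train(self, labeled_reviews):
        for text, label in labeled_reviews:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def log_scores(self, text):
        """Return the unnormalized log probability of each label for the text."""
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        scores = {}
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data; real reviews would come from your own dataset.
train = [
    ("a wonderful, moving film with great performances", "positive"),
    ("great fun and a wonderful cast", "positive"),
    ("a dull, boring mess with terrible pacing", "negative"),
    ("terrible script and boring characters", "negative"),
]

clf = NaiveBayesSentiment()
clf.train(train)
print(clf.predict("a great and moving story"))
print(clf.predict("boring and terrible"))
```

To move from labels toward scores, one option is to compare the per-label values returned by `log_scores`, or to swap the classifier for a regression model trained on reviews that do have scores.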

Installation

There are a couple of things you'll need to run the notebooks in this repository...

Requirements

  • Java 8
  • 2-3 GB of RAM available for running the NLP server
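If you're not sure which Java version is installed, a small check like the following may help. It is only a sketch and not part of this repository: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and it assumes the `java` executable is on your PATH and prints its version in the usual `version "..."` form.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from the text printed by `java -version`.

    Handles the legacy numbering ("1.8.0_91" means Java 8) as well as the
    newer one ("11.0.2" means Java 11). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" scheme: x is the major version
    return major


def installed_java_major():
    """Run `java -version`; return the major version, or None if Java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    # `java -version` writes its report to stderr rather than stdout
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```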

Python dependencies via conda

conda create -n bored python=3
source activate bored
# assuming you're in the "nlp-for-the-easily-bored" directory
pip install -r requirements.txt

Running the notebooks

The notebooks are all under /notebooks

If you want to run/alter them locally after installing the project dependencies, simply run this command:

jupyter notebook

Resources

See resources.md for links to NLP datasets, free courses, etc.

Questions

Have a question? See the FAQ. It may have already been asked/answered.

Contributing

Thanks for the help! Take a look at contributing.md

License: MIT License
