oroszgy / hungarian-text-mining-workshop

Materials for the Text Mining workshop held in the HuNLP meetup, June 2017

Text mining workshop

Preparation for the workshop

Please be prepared with

  • basic knowledge of Python
  • experience in using Jupyter notebooks

During the course we will use a little bit of pandas (10-minute intro) and scikit-learn to build simple machine learning models.

Install dependencies and run the notebooks

The easy way: using Docker

Get the docker image: docker pull oroszgy/hungarian-text-mining-workshop

Start Jupyter Notebook: make start

The hard way: installing the packages manually

  1. Make sure you have Python 3.5+ installed (preferably a conda distribution)
  2. Clone this repository: git clone http://github.com/oroszgy/hungarian-text-mining-workshop && cd hungarian-text-mining-workshop
  3. Install the necessary packages: pip install -r requirements.txt
  4. Download the English and the Hungarian NLP models for spaCy:
    • python -m spacy download en
    • pip install https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_tagger_web_md-0.1.0/hu_tagger_web_md-0.1.0.tar.gz
  5. Install HuNlpy
    • pip install https://github.com/oroszgy/hunlp/releases/download/0.2/hunlp-0.2.0.tar.gz
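After a manual install, it can help to confirm that the environment is usable before opening the notebooks. The script below is a minimal sketch, not part of the repository: it checks the interpreter version against the 3.5+ requirement stated above and reports which packages cannot be found. The module names in `REQUIRED_MODULES` are assumptions based on the libraries mentioned in this README; `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

# Assumed import names, taken from the libraries this README mentions.
# The authoritative dependency list is requirements.txt in the repository.
REQUIRED_MODULES = ["pandas", "sklearn", "spacy", "textacy"]
MIN_PYTHON = (3, 5)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python {}.{} or newer is required.".format(*MIN_PYTHON))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages: {}".format(", ".join(missing)))
    else:
        print("All checked packages were found.")
```

The check uses `importlib.util.find_spec` rather than importing each package, so it only confirms that a package is installed, not that it loads without errors. It also avoids f-strings so that it still runs on Python 3.5.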

Start Jupyter Notebook: jupyter notebook

Table of Contents

  1. Practical NLP in Python: spaCy and textacy, Describing documents with words
  2. Document categorization, Sentiment analysis
  3. Extracting named entities and concepts

Software used


(c) Gyorgy Orosz, 2017

License: MIT License

