motacapla / natural-language-preprocessings

Some recipes for natural language pre-processing

Natural Language Pre-processing

This repository includes some recipes for natural language pre-processing.

The recipes are as follows:

  • Data cleaner
  • Word normalization
  • Stopwords remover
  • Tokenizer
  • Word Vector
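As a rough illustration of what these recipes do, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, the tiny `STOP_WORDS` set, and the regex tokenizer are assumptions. The tokenizer only suits whitespace-delimited languages; Japanese text such as the livedoor corpus needs a morphological analyzer (for example MeCab) instead. The last function shows one common way to use word vectors, averaging them into a document vector, given a hypothetical token-to-vector mapping.

```python
import re
import unicodedata

# Hypothetical stop-word list for illustration; real projects load a curated list.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def clean_text(text: str) -> str:
    """Data cleaner: strip HTML tags first, then URLs."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"https?://\S+", " ", text)


def normalize(text: str) -> str:
    """Word normalization: Unicode NFKC, lowercasing, and mapping every digit to 0."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\d", "0", text)


def tokenize(text: str) -> list[str]:
    """Tokenizer: runs of word characters (whitespace-delimited languages only)."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Stopwords remover: drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text: str) -> list[str]:
    """Chain the steps: clean -> normalize -> tokenize -> remove stop words."""
    return remove_stopwords(tokenize(normalize(clean_text(text))))


def average_vector(tokens: list[str], vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Word vector: average the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]
```

For example, `preprocess("Visit <b>https://example.com</b> The ＡＢＣ News in 2014!")` returns `["visit", "abc", "news", "0000"]`: the tag and URL are removed, NFKC folds the full-width letters to ASCII, digits become zeros, and "the" and "in" are dropped.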

Install

To install the required modules, run:

$ pip install -r requirements.txt

Setup

First, download and extract the livedoor news corpus. To download it, run the following commands:

$ cd src/data
$ python make_dataset.py
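For readers curious about what this setup step involves, here is a minimal standard-library sketch of downloading and extracting a .tar.gz corpus. It is an illustration under assumptions, not the contents of make_dataset.py: the function names and destination paths are hypothetical, and the `CORPUS_URL` value (the commonly cited RONDHUIT download location) is an assumption that the script may not use.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: commonly cited download location of the livedoor news corpus.
# src/data/make_dataset.py may use a different URL or file name.
CORPUS_URL = "https://www.rondhuit.com/download/ldcc-20140209.tar.gz"


def download(url: str, archive: Path) -> Path:
    """Download the archive unless it is already on disk."""
    if not archive.exists():
        archive.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, archive)
    return archive


def extract_corpus(archive: Path, dest: Path) -> list[Path]:
    """Extract a .tar.gz archive into dest and return the extracted .txt files."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: refuse members that would escape dest
            # and other unsafe entries.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(dest.rglob("*.txt"))
```

Typical use would be `extract_corpus(download(CORPUS_URL, Path("ldcc.tar.gz")), Path("livedoor"))`. Skipping the download when the archive already exists keeps reruns cheap.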

Now you are ready for classification!

Start Jupyter Notebook:

$ jupyter notebook

Then open and run notebooks/document_classification.ipynb.
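This README does not say which model the notebook uses. As a hedged, self-contained illustration of the classification step, the sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing on already tokenized documents. The `NaiveBayes` class and the toy documents in the usage note are hypothetical and are not taken from the notebook.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)           # documents per class
        self.word_counts = defaultdict(Counter)       # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_docs = len(labels)
        return self

    def predict(self, tokens: list[str]) -> str:
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(class) + sum of log P(token | class); unseen tokens are skipped.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For instance, after fitting on `[["goal", "match", "team"], ["phone", "app", "battery"]]` with labels `["sports", "tech"]`, `predict(["goal", "team"])` returns `"sports"`. Working in log space avoids underflow when documents are long.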

Good NLP Life!

Licence

MIT

Author

Hironsan