rogerioacp/classify-text

Hello

Text Classification with Python

Second work assignment for the MAP-I KDD class.

The dataset used was the "Reuters Newswire" dataset.

When you run main.py, it asks you for the root directory of the dataset. You can supply your own dataset, as long as it has a similar directory structure.
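The exact directory structure is not spelled out here. The sketch below assumes a common layout where each sub-directory of the root is one category and holds that category's plain-text documents (root/<category>/<document>). The helper name `load_dataset` is hypothetical and is not taken from main.py.

```python
from pathlib import Path


def load_dataset(root):
    """Return a list of (category, text) pairs.

    Hypothetical helper. It assumes the layout root/<category>/<document>,
    with one sub-directory per class containing plain-text files.
    """
    samples = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # ignore stray files at the top level of the root
        for doc in sorted(category_dir.iterdir()):
            if doc.is_file():
                # Strict UTF-8 decoding: a file that is not valid UTF-8
                # raises UnicodeDecodeError here.
                text = doc.read_text(encoding="utf-8")
                samples.append((category_dir.name, text))
    return samples
```

For example, `load_dataset("reuters_test")` would return one `(category, text)` pair per document, using each folder name as the label.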

Some of the supplied text files are not valid UTF-8.

Even TextEdit.app can't open those files, and they caused problems in the code, so I delete them as part of the preprocessing.
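The following is a minimal sketch of how such files could be detected: it tries to decode each file strictly as UTF-8 and flags any file that raises `UnicodeDecodeError`. The function name `remove_non_utf8_files` and its `dry_run` flag are hypothetical, and this is not necessarily how main.py does it. By default it only reports the files and does not delete them.

```python
from pathlib import Path


def remove_non_utf8_files(root, dry_run=True):
    """Find (and optionally delete) files under root that are not valid UTF-8.

    Hypothetical helper. Returns the list of offending paths. Files are
    only deleted when dry_run is False.
    """
    bad = []
    # Materialize the sorted listing first so deleting files is safe.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")  # strict decoding
        except UnicodeDecodeError:
            bad.append(path)
            if not dry_run:
                path.unlink()
    return bad
```

Running it once with `dry_run=True` before deleting anything lets you review which documents would be dropped.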

The code is straightforward and well documented. To run it:

python main.py

For the experiments I used a subset of the dataset: the coffee and interest categories, in the reuters_test folder.
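This README does not say which classifier main.py uses; report.pdf is the place to check. As an illustration only, here is a minimal two-class setup like the coffee/interest experiment, built with a swapped-in technique: a multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. The tokenizer, which keeps lower-cased alphabetic tokens, and the names `tokenize`, `train_nb` and `predict_nb` are assumptions and do not come from the repository.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lower-case, keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def train_nb(samples):
    """Train a multinomial Naive Bayes model from (category, text) pairs."""
    doc_counts = Counter()            # documents per category (for priors)
    word_counts = defaultdict(Counter)  # token counts per category
    vocab = set()
    for category, text in samples:
        doc_counts[category] += 1
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the category with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # unseen words give no evidence for any class
            count = word_counts[category][token]
            # Laplace (add-one) smoothed log likelihood of the token
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best
```

With pairs from a loader such as the hypothetical `load_dataset` sketched above, you would train on part of the coffee and interest documents and call `predict_nb` on the rest to measure accuracy.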

The report with the results is available in the repository as report.pdf.

