crncck / WebsiteClassification

Website classification using text processing techniques on DMOZ dataset


WebsiteClassification

To build a website classification model, we used the DMOZ dataset, a large, community-maintained open directory that categorizes web content. DMOZ closed in 2017 because AOL no longer wished to support the project. We found a split version of the dataset in a GitHub repository.

Dataset


Ayşe Ceren Çiçek

  • Text Preprocessing with spaCy
  • Logistic Regression Model
  • Decision Tree Model
  • Multinomial Naive Bayes Model
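
The preprocessing and Multinomial Naive Bayes steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the notebook code from this repository: the regex tokenizer and the tiny `STOP_WORDS` set are simplified stand-ins for spaCy preprocessing, and the `preprocess` function, the `NaiveBayes` class, and the four training snippets are hypothetical names and data chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; spaCy and NLTK ship fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in preprocess(doc):
                if token in self.vocab:            # ignore unseen words
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-ins for DMOZ page descriptions and top-level categories.
train_docs = [
    "Latest football scores and league tables",
    "Tennis tournament results and player rankings",
    "Python programming tutorials and code examples",
    "Linux server administration and software guides",
]
train_labels = ["Sports", "Sports", "Computers", "Computers"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("football league results"))    # Sports
print(model.predict("python software tutorials"))  # Computers
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and add-one smoothing keeps a word seen in only one class from zeroing out the other class.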

Gizem Kurnaz

  • Text Preprocessing with NLTK
  • Cross-Validation
  • Multinomial Naive Bayes Classifier Model
  • Logistic Regression Model
  • Decision Tree Classifier Model
  • Random Forest Classifier Model
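
As a rough sketch of how the four classifiers listed above could be compared with cross-validation, the example below assumes scikit-learn and TF-IDF features. Neither is confirmed by this README. The toy texts, the `models` dictionary, the `evaluate` helper, and all hyperparameters are hypothetical and are not taken from the notebooks.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for DMOZ page text and categories.
texts = [
    "football scores and league tables",
    "tennis tournament results and rankings",
    "basketball playoffs schedule and scores",
    "olympic swimming results and medals",
    "cycling race results and team rankings",
    "golf tournament leaderboard and scores",
    "python programming tutorials and code",
    "linux server administration guides",
    "javascript web development tutorials",
    "database software and query guides",
    "open source code hosting and software",
    "network security software and guides",
]
labels = ["Sports"] * 6 + ["Computers"] * 6

# Assumed model settings; fixed seeds keep the tree-based models repeatable.
models = {
    "multinomial_nb": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def evaluate(texts, labels, folds=3):
    """Return the mean cross-validated accuracy for each model."""
    results = {}
    for name, model in models.items():
        # The vectorizer sits inside the pipeline, so it is refit on the
        # training part of every fold and never sees the held-out texts.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        scores = cross_val_score(pipeline, texts, labels, cv=folds)
        results[name] = scores.mean()
    return results


results = evaluate(texts, labels)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

Keeping the vectorizer inside the pipeline is the main design choice here: fitting TF-IDF on the full dataset before splitting would leak vocabulary statistics from the validation folds into training.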



Languages

Jupyter Notebook 100.0%