hammadkhann / Kmeans-Clustering-Doc20.

The task is to cluster a given collection of documents into well-defined, identifiable clusters. Clustering is an unsupervised and challenging problem, and you need to choose several parameters for the task yourself. You are already aware of the two basic types of clustering algorithm, partitional and hierarchical, and you may apply either of them. The choice of features is also open. To evaluate the clustering results, apply one internal and one external clustering evaluation measure.

Dataset

The dataset is a subset of the well-known NEWS20 dataset and contains 50 text documents. In unsupervised learning, the input is the only thing available for learning. You can set a baseline for this dataset by using tf*idf-based features from the text.
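The baseline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's actual implementation: it uses only the Python standard library, the six toy documents and their "space"/"hockey" labels are invented for the example and are not taken from the dataset, the idf variant (log(N/df) + 1) and the farthest-first initialisation are choices made here for determinism, and all function names are hypothetical. The silhouette coefficient stands in for the internal measure and purity for the external one; other measures would serve equally well.

```python
import math
from collections import Counter
from statistics import mean


def tfidf_vectors(docs):
    """L2-normalised tf*idf vectors, one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed idf variant: log(N / df) + 1.
        vec = {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = {}
                for v in members:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


def silhouette(vectors, labels):
    """Internal measure: mean silhouette coefficient over all documents."""
    scores = []
    for i, v in enumerate(vectors):
        by_cluster = {}
        for j, w in enumerate(vectors):
            if i != j:
                dist = math.sqrt(sq_dist(v, w))
                by_cluster.setdefault(labels[j], []).append(dist)
        own = by_cluster.pop(labels[i], None)
        if own is None or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = mean(own)                                   # cohesion
        b = min(mean(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def purity(labels, truth):
    """External measure: share of documents in their cluster's majority class."""
    hits = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        hits += Counter(classes).most_common(1)[0][1]
    return hits / len(labels)


# Toy data, invented for illustration only (not from the dataset).
docs = [
    "rocket launch orbit nasa",
    "nasa shuttle orbit mission",
    "orbit satellite launch nasa",
    "hockey goal team season",
    "team hockey playoff goal",
    "goal hockey league team",
]
truth = ["space"] * 3 + ["hockey"] * 3

vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print("cluster labels:", labels)
print("silhouette (internal):", round(silhouette(vectors, labels), 3))
print("purity (external):", purity(labels, truth))
```

On the real 50-document subset, the same functions could be called with the file contents as `docs` and the newsgroup names as `truth`. The number of clusters `k`, the tokenisation, and the initialisation are among the parameters the task leaves open, so the values here are placeholders rather than recommendations.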

