akashkthkr / SWM573_G18_P7_Document_Clustering_Summarization_and_Visualization

Explore different clustering algorithms and implement them on the 20 Newsgroups dataset

SWM573 - Document Clustering, Summarization, and Visualization

Project Abstract & Aim

The aim of this project is to:

  • Explore different clustering algorithms and implement them on the dataset
  • Cluster the news articles
  • Recommend articles that are similar to a given document
  • Extract keywords from the articles and provide a short summary
  • Apply visualization techniques to the text documents to showcase relevancy
  • Identify anomalies in the dataset
  • Identify popular words in each cluster, for example with a word cloud
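
To make the keyword-extraction and recommendation goals above concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the project's actual code: the toy documents, the small stop-word set, and the helper names (tokenize, tfidf_vectors, top_keywords, most_similar) are all assumptions made for this example. It weights terms with a smoothed TF-IDF and compares documents by cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOP_WORDS = {"the", "a", "an", "to", "into", "and", "of", "in"}


def tokenize(text):
    """Lowercase a text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_keywords(vector, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def most_similar(index, vectors):
    """Return the index of the document most similar to vectors[index]."""
    scores = [(cosine(vectors[index], v), j)
              for j, v in enumerate(vectors) if j != index]
    return max(scores)[1]


# Toy corpus standing in for news articles (assumption, not project data).
docs = [
    "The rocket launch sent a satellite into orbit",
    "NASA plans another rocket launch to orbit the moon",
    "The hockey team won the playoff game",
    "A late goal decided the hockey playoff game",
]
vectors = tfidf_vectors(docs)
print(top_keywords(vectors[0]))
print(most_similar(0, vectors))
```

On a real corpus the same idea is usually expressed with a library vectorizer and a sparse similarity matrix; the plain-dict version here only shows the mechanics.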

DataSet Used

Dataset used: the 20 Newsgroups dataset (http://qwone.com/~jason/20Newsgroups/), loaded through scikit-learn with from sklearn.datasets import fetch_20newsgroups
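
As a rough sketch of how the dataset can be loaded, the wrapper below calls scikit-learn's fetch_20newsgroups. The function name load_newsgroups and the chosen options (stripping headers, footers, and quotes, and the fixed random_state) are assumptions for illustration; the README does not say which options the project used.

```python
from sklearn.datasets import fetch_20newsgroups


def load_newsgroups(subset="train", categories=None):
    """Return (texts, labels, label_names) for the requested split.

    Hypothetical helper: it downloads the data on first use and caches it
    locally, so the first call needs network access.
    """
    bunch = fetch_20newsgroups(
        subset=subset,              # "train", "test", or "all"
        categories=categories,      # None loads all 20 newsgroups
        remove=("headers", "footers", "quotes"),  # keep only the message body
        shuffle=True,
        random_state=42,
    )
    return bunch.data, bunch.target, bunch.target_names
```

Calling load_newsgroups() returns the raw article texts, their integer newsgroup labels, and the list of newsgroup names. Removing headers, footers, and quotes is a common choice for clustering because those fields otherwise leak the newsgroup identity.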

Run time and Instances

We used Google Colab and Jupyter notebooks in PyCharm for additional CPU and GPU resources and for dataset storage

Algorithms and Visualization Techniques used

Clustering

  • LDA
  • HDBSCAN
  • Agglomerative clustering
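
The sketch below shows how two of the listed methods might be applied with scikit-learn on a toy corpus; it is an assumption-laden illustration, not the project's notebook code. The example documents, the number of clusters and topics, and the variable names are all made up for this example. HDBSCAN is left out because it typically relies on a separate package and needs more data points than a toy corpus offers.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus standing in for news articles (assumption, not project data).
docs = [
    "The rocket launch sent a satellite into orbit",
    "NASA plans another rocket launch to orbit the moon",
    "The hockey team won the playoff game last night",
    "A late goal decided the hockey playoff game",
]

# Agglomerative clustering on TF-IDF vectors. The estimator needs a dense
# array, and the default Ward linkage uses Euclidean distance; because
# TF-IDF rows are L2-normalized, that ordering matches cosine distance.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
agglo = AgglomerativeClustering(n_clusters=2)
cluster_labels = agglo.fit_predict(tfidf.toarray())

# LDA works on raw term counts and returns, for each document, a
# probability distribution over topics (each row sums to 1).
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

print(cluster_labels)
print(doc_topics.round(2))
```

Agglomerative clustering assigns each document to exactly one cluster, while LDA gives a soft assignment, so a document's dominant topic can be read off with doc_topics.argmax(axis=1) when a hard label is needed.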

Visualization Techniques

  • t-SNE
  • UMAP
  • Compression-VAE
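
As an illustration of the first technique in this list, the following sketch projects TF-IDF document vectors to two dimensions with scikit-learn's TSNE. The toy documents and the parameter values (a very low perplexity, random initialization, fixed seed) are assumptions chosen so the example runs on six documents; they are not the project's settings. UMAP and Compression-VAE are omitted because they need additional packages.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy corpus standing in for news articles (assumption, not project data).
docs = [
    "The rocket launch sent a satellite into orbit",
    "NASA plans another rocket launch to orbit the moon",
    "Astronauts trained for the next orbit mission",
    "The hockey team won the playoff game last night",
    "A late goal decided the hockey playoff game",
    "The goalie made forty saves in the hockey final",
]

# Dense TF-IDF matrix: one row per document, one column per term.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

# Perplexity must be smaller than the number of samples, hence the small
# value here; real corpora usually use values in the range 5 to 50.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
coords = tsne.fit_transform(tfidf)  # one (x, y) row per document

print(coords.shape)
```

The resulting coordinates can be passed to any scatter-plot routine and colored by cluster label to check visually whether the clusters separate.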

Team Members

About

License: MIT License


Languages

Language: Jupyter Notebook 100.0%