abhi40308 / News-Documents-Clustering

News documents clustering using latent semantic analysis

News-Documents-Clustering

News documents clustering using latent semantic analysis. Used LSA and K-means algorithms to cluster news documents and visualized the results using UMAP (Uniform Manifold Approximation and Projection).

The news documents are clustered on the tf-idf weights of their important words. Related documents are shown in the same color, as seen in the screenshots at the end. The color comes from k-means, which is run on the data separately and assigns each document an integer cluster label. The position of each document (one dot per document on the graph) comes from LSA. Because color and position are computed independently, same-colored dots landing close together serve as a visual check on the k-means results.
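The approach described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the toy corpus, the function name cluster_and_project, and all parameter values are assumptions made for the example. For simplicity the 2-D positions are taken directly from LSA (truncated SVD of the tf-idf matrix); the UMAP step used for the final visualization is left out.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the news documents.
docs = [
    "stock market shares fall as investors sell",
    "investors buy shares as stock market rallies",
    "market analysts expect stock prices to rise",
    "football team wins the league match",
    "the team scored twice in the football match",
    "coach praises team after league football win",
]


def cluster_and_project(texts, n_clusters=2, seed=0):
    """Return (k-means labels, 2-D LSA coordinates) for a list of texts."""
    # tf-idf weights of the words in each document (sparse matrix).
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # Color: k-means is run on the tf-idf vectors on its own and
    # gives every document an integer cluster label.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(tfidf)

    # Position: LSA, i.e. a truncated SVD of the same tf-idf matrix,
    # computed independently of the k-means labels.
    lsa = TruncatedSVD(n_components=2, random_state=seed)
    coords = lsa.fit_transform(tfidf)
    return labels, coords


labels, coords = cluster_and_project(docs)
for text, label, (x, y) in zip(docs, labels, coords):
    print(f"cluster={label} x={x:.3f} y={y:.3f} {text}")
```

Plotting coords as a scatter plot colored by labels gives the kind of picture described above: if documents with the same label also sit near each other, the two independent views agree.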

This code is part of a Medium blog post
This post was published on mc.ai
Link to Google Colab

Results on 10000 documents

[Screenshot: result]

Languages

Jupyter Notebook 97.5%, Python 2.5%