genekogan / wiki-tSNE

IPython notebook for clustering and visualizing documents using tf-idf analysis and t-SNE, example of Wikipedia articles

Home Page: http://www.genekogan.com/works/wiki-tSNE

wiki-tSNE

This IPython notebook shows how to cluster and visualize a set of documents, articles, or texts, as in this demo. The included example clusters a set of Wikipedia articles taken from this list of political ideologies.

The notebook derives a clustering by first converting the set of documents into a tf-idf matrix, which represents each document as a vector whose elements give the relative importance of each unique term to that document. It then uses t-SNE to reduce that representation to two dimensions and saves the resulting coordinates to a JSON file.
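The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn's `TfidfVectorizer` and `TSNE`, and the function name `embed_documents`, the toy texts, the output file name `tsne_points.json`, and the JSON layout (`title`, `x`, `y`) are all hypothetical choices made for the example.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy placeholder documents (hypothetical; the notebook uses Wikipedia articles).
TOY_TITLES = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e", "doc_f"]
TOY_TEXTS = [
    "markets trade prices property competition",
    "markets property trade taxes competition",
    "workers unions labor collective ownership",
    "labor workers collective planning ownership",
    "tradition monarchy church authority order",
    "authority order tradition nation monarchy",
]


def embed_documents(titles, texts, perplexity=2.0, seed=0):
    """Return one {"title", "x", "y"} record per document."""
    # Rows are documents, columns are unique terms, values are tf-idf weights.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # t-SNE maps each tf-idf row to a 2-D point.
    # The perplexity must be smaller than the number of documents.
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="random", random_state=seed)
    coords = tsne.fit_transform(tfidf.toarray())

    # Cast to plain floats so the records are JSON-serializable.
    return [{"title": t, "x": float(x), "y": float(y)}
            for t, (x, y) in zip(titles, coords)]


if __name__ == "__main__":
    records = embed_documents(TOY_TITLES, TOY_TEXTS)
    with open("tsne_points.json", "w") as f:  # hypothetical file name
        json.dump(records, f, indent=2)
```

With a real corpus, the perplexity would normally be raised well above the tiny value used here, which is only low because the toy set has six documents.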

The folder visualize contains a p5.js sketch that displays the results in a browser, after adjusting the t-SNE coordinates slightly to avoid overlapping or colliding words. More information is in the scripts.
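The README does not say how the sketch resolves collisions, so the following is only one plausible approach, written in Python rather than p5.js to match the notebook. It treats each word as an axis-aligned box centered on its t-SNE point and repeatedly pushes overlapping pairs apart along the axis of least overlap. The names `boxes_overlap` and `separate_labels`, and the box sizes, are hypothetical.

```python
def boxes_overlap(p, q, size_p, size_q):
    """True if two axis-aligned boxes centered at p and q intersect."""
    return (abs(p[0] - q[0]) < (size_p[0] + size_q[0]) / 2
            and abs(p[1] - q[1]) < (size_p[1] + size_q[1]) / 2)


def separate_labels(points, sizes, max_iters=200, eps=1e-9):
    """Nudge box centers apart until no pair overlaps (or max_iters is hit).

    points: list of (x, y) centers; sizes: list of (width, height) per box.
    """
    pts = [list(p) for p in points]
    for _ in range(max_iters):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                # Overlap depth on each axis; both positive means a collision.
                ox = (sizes[i][0] + sizes[j][0]) / 2 - abs(dx)
                oy = (sizes[i][1] + sizes[j][1]) / 2 - abs(dy)
                if ox > 0 and oy > 0:
                    moved = True
                    # Push along the axis that needs the smaller shift.
                    axis, depth, delta = (0, ox, dx) if ox < oy else (1, oy, dy)
                    sign = 1.0 if delta >= 0 else -1.0
                    shift = depth / 2 + eps
                    pts[i][axis] -= sign * shift
                    pts[j][axis] += sign * shift
        if not moved:
            break
    return [tuple(p) for p in pts]
```

Because each pair moves only as far as needed, the layout stays close to the original t-SNE positions; in dense layouts the loop may stop at `max_iters` with some overlaps remaining.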


Languages

HTML 80.6%, Jupyter Notebook 19.4%