stocyr / wiki-tSNE

IPython notebook for clustering and visualizing documents using tf-idf analysis and t-SNE, example of Wikipedia articles

Home Page: http://www.genekogan.com/works/wiki-tSNE

wiki-tSNE

This IPython notebook shows how to cluster and visualize a set of documents, articles, or texts, as in this demo. The included example clusters the Wikipedia articles visited in the WikiGame. To reproduce it, copy the LAST GAME RESULTS table shown on the left side of the WikiGame website and save it as plain text in snapshot.txt. Then, in the Python file, specify the lines of snapshot.txt that belong to the users whose link paths you want to display.
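A minimal sketch of the selection step follows. It assumes, hypothetically, that each line of snapshot.txt holds one user's link path with article titles separated by " -> "; the real layout of the copied table may differ. The names parse_paths, unique_articles, and read_snapshot are illustrative and are not taken from the notebook.

```python
from pathlib import Path

# Hypothetical separator: adjust it to match how your copy of the
# LAST GAME RESULTS table actually delimits article titles.
SEP = " -> "


def parse_paths(text, selected_lines, sep=SEP):
    """Return the link path (a list of article titles) for each selected line.

    selected_lines uses 1-based line numbers, as shown in a text editor.
    """
    lines = text.splitlines()
    paths = []
    for n in selected_lines:
        line = lines[n - 1].strip()
        # Split the line into titles and drop empty fragments.
        paths.append([t.strip() for t in line.split(sep) if t.strip()])
    return paths


def unique_articles(paths):
    """List every article once, in order of first appearance across paths."""
    seen = {}
    for path in paths:
        for title in path:
            seen.setdefault(title, len(seen))
    return list(seen)  # dicts preserve insertion order


def read_snapshot(filename="snapshot.txt", selected_lines=(1,)):
    """Read snapshot.txt from disk and parse the selected users' paths."""
    text = Path(filename).read_text(encoding="utf-8")
    return parse_paths(text, selected_lines)
```

The unique articles are the documents to cluster, while the paths themselves keep the visiting order needed later for drawing.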

The notebook derives a clustering by first converting the set of documents into a tf-idf matrix, which represents each document as a vector whose elements give the relative importance of each unique term to that document. It then reduces these vectors to two dimensions with t-SNE and saves the resulting coordinates to a JSON file, along with the order of all the link paths.
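The tf-idf step can be sketched in plain Python. This uses one common weighting (term frequency as a share of the document, idf = log(N / df), each row L2-normalized); the notebook may instead rely on a library implementation with a different variant, such as scikit-learn's TfidfVectorizer, whose default idf is smoothed. The tokenize and tfidf_matrix names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_matrix(documents):
    """Return (vocabulary, rows) where rows[i][j] is the tf-idf weight of
    vocabulary[j] in documents[i]; each non-empty row has unit L2 norm."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Terms found in every document get idf = log(1) = 0.
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Term frequency (share of the document) times inverse document frequency.
        row = [counts[term] / total * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return vocabulary, rows
```

The resulting rows could then be passed to a t-SNE implementation, for example scikit-learn's TSNE(n_components=2).fit_transform(...), to obtain one 2D point per article. In recent scikit-learn versions, its perplexity parameter must be smaller than the number of documents, which matters for small games.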

The visualize folder contains a p5.js sketch that displays the results in a browser.
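Before the sketch can draw anything, the 2D coordinates and link paths have to reach it as JSON. A minimal export sketch follows. It assumes a hypothetical layout ("points" with title, x, y and "paths" as lists of point indices) and rescales coordinates to [0, 1] so a sketch could map them onto any canvas size. The field names the actual p5.js sketch expects are not shown here, so match them to its code.

```python
import json


def normalize(coords):
    """Rescale 2D points so that both axes span [0, 1]."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def scale(v, lo, hi):
        # Map v from [lo, hi] to [0, 1]; center it if the axis has no spread.
        return (v - lo) / (hi - lo) if hi > lo else 0.5

    return [(scale(x, x_lo, x_hi), scale(y, y_lo, y_hi)) for x, y in coords]


def build_export(titles, coords, paths):
    """Bundle normalized points with link paths expressed as point indices."""
    index = {title: i for i, title in enumerate(titles)}
    points = [
        {"title": t, "x": x, "y": y}
        for t, (x, y) in zip(titles, normalize(coords))
    ]
    return {"points": points, "paths": [[index[t] for t in p] for p in paths]}


def write_export(filename, titles, coords, paths):
    """Write the export dictionary to a JSON file (hypothetical layout)."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(build_export(titles, coords, paths), f, indent=2)
```

Storing paths as indices rather than repeated titles keeps the file small and lets the sketch draw each path by connecting already-placed points in visiting order.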

Languages: JavaScript 98.9%, Python 1.1%, HTML 0.0%