This IPython notebook shows how to cluster and visualize a collection of documents (articles or other texts), as in this demo. The included example clusters a set of Wikipedia articles taken from this list of political ideologies.
The notebook derives the clustering by first converting the documents into a tf-idf matrix, which represents each document as a vector whose elements measure how important each unique term is to that document relative to the rest of the collection. It then reduces these vectors to 2 dimensions with t-SNE and saves the resulting coordinates to a JSON file.
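To make the tf-idf step concrete, here is a minimal pure-Python sketch. It is not the notebook's own code: the crude regex tokenizer, the unsmoothed weighting idf = log(N / df), and the L2 normalization of each row are assumptions chosen for illustration (scikit-learn's TfidfVectorizer, for example, uses a smoothed idf by default). Rows produced this way could then be handed to a t-SNE implementation such as scikit-learn's sklearn.manifold.TSNE with n_components=2.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters/digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_matrix(docs):
    """Return (vocab, rows): one L2-normalized tf-idf vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Unsmoothed idf: a term present in every document gets weight 0.
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        # Term frequency (count / document length) times idf.
        row = [counts[t] / total * idf[t] if total else 0.0 for t in vocab]
        # Normalize to unit length so long and short documents are comparable.
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row] if norm else row)
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = tfidf_matrix(["the cat sat", "the dog sat", "the cat ran"])
    print(vocab)
    for row in rows:
        print([round(x, 3) for x in row])
```

In this toy corpus "the" occurs in every document, so its weight is zero everywhere, while rarer terms such as "dog" dominate the vectors of the documents that contain them.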
The visualize folder contains a p5.js sketch that displays the results in a browser, after nudging the t-SNE coordinates slightly so that words do not overlap. See the scripts for more details.
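One simple way to remove overlaps from a 2-D layout is iterative pairwise repulsion: any two points closer than a minimum distance are pushed apart along the line joining them until no pair is too close. The sketch below illustrates that idea in Python; it is an assumption about the general technique, not a port of the p5.js sketch's actual method. The function name separate and the single min_dist parameter are hypothetical, and a real word layout would more likely test text bounding boxes than a fixed distance.

```python
import math


def separate(points, min_dist, iterations=100):
    """Nudge 2-D points apart until no pair is closer than min_dist.

    Hypothetical helper: stops early once a full pass moves nothing,
    otherwise gives up after `iterations` passes.
    """
    pts = [list(p) for p in points]
    for _ in range(iterations):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                d = math.hypot(dx, dy)
                if d < min_dist - 1e-9:
                    if d == 0:
                        # Coincident points: pick an arbitrary fixed direction.
                        ux, uy = 1.0, 0.0
                    else:
                        ux, uy = dx / d, dy / d
                    # Move each point half the overlap, in opposite directions.
                    push = (min_dist - d) / 2
                    pts[i][0] -= ux * push
                    pts[i][1] -= uy * push
                    pts[j][0] += ux * push
                    pts[j][1] += uy * push
                    moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]


if __name__ == "__main__":
    print(separate([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)], min_dist=1.0))
```

Moving both points by half the overlap keeps the pair's midpoint fixed, so the adjusted layout stays close to the original t-SNE positions; resolving one pair can create a new overlap with a third point, which is why the loop repeats.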