insectatorious / hn_kaggle

Simple clustering of HN posts from this Kaggle dataset: https://www.kaggle.com/hacker-news/hacker-news-posts

Clustering Hacker News post titles

A simple method of clustering and viewing Hacker News posts.

Data obtained from https://www.kaggle.com/hacker-news/hacker-news-posts.
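The README does not say which algorithm the notebook uses, so the following is only a minimal sketch of one common way to cluster short titles: TF-IDF weighting followed by spherical k-means. It uses only the standard library. The helper names (`tokenize`, `tfidf_vectors`, `kmeans`), the sample titles, and the choice of `k` are all assumptions made for illustration, not the repository's actual method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    # Iterating a Counter yields each distinct term once: document frequency.
    df = Counter(term for doc in docs for term in doc)
    n = len(docs)
    vectors = []
    for doc in docs:
        vec = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means; the first k vectors seed the centroids."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assign every vector to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the normalised sum of its members.
        for c in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == c:
                    for term, w in v.items():
                        total[term] += w
            if total:
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[c] = {term: w / norm for term, w in total.items()}
    return labels


# Hypothetical sample titles; the real data comes from the Kaggle CSV.
titles = [
    "Show HN: A Python library for data analysis",
    "Ask HN: How do you learn Rust?",
    "Python data analysis with pandas",
    "Ask HN: How do you learn Go?",
]
labels = kmeans(tfidf_vectors(titles), k=2)
print(labels)  # titles 0 and 2 share one cluster, titles 1 and 3 the other
```

Seeding the centroids with the first `k` titles keeps the sketch deterministic. On the full dataset, a library implementation with randomised initialisation, such as the one in scikit-learn, would likely be a better fit.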

Example plot

This screenshot shows the first 1,000 titles clustered.

[Figure: Clustered HN Post Titles]

An interactive version of the plot is available on Plotly.

Requirements

Pip install

pip install cython numpy pandas scikit-learn gensim plotly

Alternatively, see the requirements.txt file.
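After installing, a quick check like the one below can confirm that each package is importable. It is a small sketch using only the standard library. The `PACKAGES` mapping and the `missing_packages` helper are illustrative names, not part of the repository. Note that two of the pip names differ from their import names: `scikit-learn` is imported as `sklearn`, and `cython` as `Cython`.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in import statements
PACKAGES = {
    "cython": "Cython",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "gensim": "gensim",
    "plotly": "plotly",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All requirements found.")
```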

License: GNU General Public License v3.0