frieds / hacker_news_nlp

Natural Language Processing on Latest 250K Hacker News Comments

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This repo contains a natural language processing project completed on the latest 250K Hacker News posts.

  • hn_analysis.ipynb is data analysis using Python, Pandas, Numpy and Matplotlib; cleaned data, performed text pre-processing, used LDA and KMeans clustering methods, and ultimately foudn a word2vec model revelead the best context for the data
  • hn_api_crawler.py is code to crawl the Hacker News API and dump the data into a MongoDB database

About

Natural Language Processing on Latest 250K Hacker News Comments


Languages

Language:Jupyter Notebook 99.7%Language:Python 0.3%