Jun-Tam / article_analysis_word2vec

NLP Article Trend Analysis (Oil & Gas)

Article Trend Analysis by word2vec

Summary

This Jupyter Notebook demonstrates an example application of NLP to energy-industry articles in PDF format.

Part1: Preprocessing

The preprocessing part covers conversion from PDF to text, tokenization, and deletion of duplicate files.
About 600 articles were collected and converted into text files.
https://github.com/Jun-Tam/article_analysis_word2vec/blob/master/NLP_Articles_Preprocess.ipynb
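The tokenization and duplicate-deletion steps can be sketched as follows, assuming the PDFs have already been converted to plain-text files. The function names `tokenize` and `find_duplicates`, the regex tokenizer, and the SHA-256 content hashing are illustrative stand-ins, not necessarily what the notebook does.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical tokenizer: lowercase the text and keep runs of letters/digits,
# allowing internal apostrophes and hyphens (e.g. "non-OPEC" -> "non-opec").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text):
    """Split raw article text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def find_duplicates(paths):
    """Return files whose token sequence matches an earlier file's.

    Hashing the normalized token sequence (an assumption) means files that
    differ only in case, whitespace or punctuation count as duplicates.
    """
    seen = {}
    duplicates = []
    for path in sorted(paths):
        path = Path(path)
        text = path.read_text(encoding="utf-8", errors="ignore")
        digest = hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates
```

The first file in sorted order is kept and later copies are reported; deleting them is then a matter of calling `unlink()` on each returned path after reviewing the list.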

Part2: Trend Analysis

The analysis part covers bag-of-words (BoW), TF-IDF, word2vec, and WordCloud.
https://github.com/Jun-Tam/article_analysis_word2vec/blob/master/NLP_Articles_Word2Vec.ipynb
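To illustrate the BoW and TF-IDF steps, here is a plain-Python sketch that builds per-article term counts and TF-IDF weights from tokenized articles. It uses length-normalized term frequency and the unsmoothed IDF `log(N / df)`; scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF and L2 normalization by default, and the notebook may use yet another variant.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Count how often each token occurs in each tokenized document."""
    return [Counter(tokens) for tokens in docs]


def tf_idf(docs):
    """Weight each term by tf * log(N / df).

    tf: term count divided by document length.
    df: number of documents that contain the term.
    N:  number of documents.
    """
    bows = bag_of_words(docs)
    n_docs = len(docs)
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for tokens, bow in zip(docs, bows):
        length = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights
```

A term that appears in every article gets an IDF of `log(1) = 0`, so TF-IDF surfaces vocabulary that distinguishes one article from the rest.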

The word counts across all articles are shown below.

[Figure: word counts across all articles]
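A corpus-wide word count like this can be sketched by summing token counts over all articles and dropping stop words. The tiny `STOP_WORDS` set and the `top_words` function are hypothetical placeholders; a real run would use a fuller published stop-word list.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on"}


def top_words(docs, n=20, stop_words=STOP_WORDS):
    """Return the n most frequent tokens across all tokenized documents.

    Stop words and single-character tokens are skipped.
    """
    totals = Counter()
    for tokens in docs:
        totals.update(t for t in tokens if t not in stop_words and len(t) > 1)
    return totals.most_common(n)
```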

A word cloud is a useful tool for visualizing what people were interested in each year.

[Figure: word clouds for each year]
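One way to produce one word cloud per year, assuming each article carries a publication year, is to aggregate token counts by year and hand each year's counts to the `wordcloud` package's `WordCloud.generate_from_frequencies`. The `(year, tokens)` input format and the function name below are assumptions for illustration.

```python
from collections import Counter, defaultdict


def frequencies_by_year(articles, stop_words=frozenset()):
    """Aggregate token counts per publication year.

    articles: iterable of (year, tokens) pairs (hypothetical input format).
    Returns {year: Counter}, one frequency table per word cloud.
    """
    by_year = defaultdict(Counter)
    for year, tokens in articles:
        by_year[year].update(t for t in tokens if t not in stop_words)
    return dict(by_year)


# Rendering needs the third-party `wordcloud` package, for example:
#   from wordcloud import WordCloud
#   freqs = frequencies_by_year(articles)
#   for year, counts in freqs.items():
#       WordCloud(width=800, height=400).generate_from_frequencies(counts).to_file(f"wordcloud_{year}.png")
```

Counting first and rendering second keeps the per-year statistics available for other plots, such as the word-count chart.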

Using word2vec, a set of "ness" vectors is defined, and each article is represented in terms of these ness vectors.
The time-series plots below show recent article trends for each ness vector alongside the oil price history.

[Figure: time series of article trends per ness vector, with oil price history]
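The README does not spell out how the ness vectors are built, so the following is only a sketch under stated assumptions: each ness vector is the mean of the word2vec vectors of a few hand-picked seed words, an article is represented by the mean vector of its in-vocabulary tokens, and its score on each ness dimension is the cosine similarity between the two. The seed words, the 3-dimensional toy embeddings, and all function names are hypothetical; a real run would take vectors from a trained word2vec model (for example gensim's `KeyedVectors`).

```python
import math
from collections import defaultdict


def mean_vector(words, vectors):
    """Average the vectors of the words present in the vocabulary (None if none are)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]


def cosine(u, v):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ness_scores(tokens, ness_vectors, vectors):
    """Score one article against each ness vector; 0.0 if no token is known."""
    doc_vec = mean_vector(tokens, vectors)
    if doc_vec is None:
        return {name: 0.0 for name in ness_vectors}
    return {name: cosine(doc_vec, nv) for name, nv in ness_vectors.items()}


def yearly_mean_scores(articles, ness_vectors, vectors):
    """Average ness scores per year, in year order, for a time-series plot."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, tokens in articles:
        for name, score in ness_scores(tokens, ness_vectors, vectors).items():
            sums[year][name] += score
        counts[year] += 1
    return {year: {name: total / counts[year] for name, total in sums[year].items()}
            for year in sorted(sums)}


# Hypothetical toy embeddings; real ones would come from a trained word2vec model.
VECTORS = {
    "oil": [1.0, 0.0, 0.0], "crude": [0.9, 0.1, 0.0],
    "solar": [0.0, 1.0, 0.0], "wind": [0.0, 0.9, 0.1],
    "merger": [0.0, 0.0, 1.0],
}
# Hypothetical seed words defining two ness vectors.
NESS = {
    "fossilness": mean_vector(["oil", "crude"], VECTORS),
    "renewableness": mean_vector(["solar", "wind"], VECTORS),
}
```

The per-year averages could then be plotted on the same time axis as an oil price series; the price data itself is outside this sketch.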

Reference

Hobson Lane, Cole Howard, Hannes Max Hapke. Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Manning.
