sarahm44 / crypto-sentiment-analysis

Analysis of the sentiment of the latest news articles on Bitcoin and Ethereum using sentiment analysis, natural language processing and named entity recognition.

Crypto Sentiment Analysis

Overview

In this repository I applied natural language processing to gauge the sentiment of the latest news articles featuring Bitcoin and Ethereum. I also applied fundamental NLP techniques to explore other factors that may relate to the coin prices, such as common words and phrases and the organizations and other entities mentioned in the articles.

I completed the following tasks:

  1. Sentiment Analysis
  2. Natural Language Processing
  3. Named Entity Recognition

The work is contained in this repository's Jupyter Lab notebook.

Sentiment Analysis

I used the newsapi to pull the latest news articles for Bitcoin and Ethereum and created a DataFrame of sentiment scores for each coin.
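The notebook's code is not reproduced in this README, so the following is only a minimal sketch of this step. It assumes the `newsapi-python` client and NLTK's VADER analyzer. The positive and compound scores discussed in this section suggest VADER, but the README does not name the scorer. `fetch_articles`, `vader_scorer`, `sentiment_rows`, the column names and the `NEWS_API_KEY` environment variable are hypothetical.

```python
import os


def fetch_articles(query, api_key=None):
    """Return a list of article dicts for `query` from News API.

    Assumes the `newsapi-python` package and an API key in NEWS_API_KEY.
    """
    from newsapi import NewsApiClient  # lazy import: third-party dependency

    client = NewsApiClient(api_key=api_key or os.environ["NEWS_API_KEY"])
    response = client.get_everything(q=query, language="en", sort_by="relevancy")
    return response["articles"]


def vader_scorer():
    """Return VADER's scoring function (assumed scorer; needs NLTK's 'vader_lexicon')."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores


def sentiment_rows(articles, score):
    """Score each article's text and return one row (dict) per article."""
    rows = []
    for article in articles:
        text = article.get("content") or article.get("description") or ""
        if not text:
            continue  # skip articles with no usable text
        scores = score(text)  # expected keys: compound, pos, neu, neg
        rows.append({
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "neutral": scores["neu"],
            "negative": scores["neg"],
        })
    return rows
```

With the dependencies installed, `pandas.DataFrame(sentiment_rows(fetch_articles("bitcoin"), vader_scorer()))` would give a sentiment DataFrame for one coin; the same call with `"ethereum"` would give the other.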

Bitcoin Sentiment

I created the Bitcoin sentiment scores dataframe:

See Bitcoin sentiment below:

Ethereum Sentiment

I created the Ethereum sentiment scores dataframe:

See Ethereum sentiment as follows:

Some observations:

  • Ethereum had the higher mean positive score.
  • Ethereum had the higher mean compound score.
  • Bitcoin had the higher maximum compound score.
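Comparisons like these come from summary statistics over each coin's score columns. A minimal sketch, assuming each coin's scores are held as a list of row dicts with `compound` and `positive` keys (the function names and row layout are illustrative, not the notebook's). On a pandas DataFrame, `df.describe()` reports the same mean and max figures.

```python
from statistics import mean


def summarize(rows, columns=("compound", "positive")):
    """Return {column: {"mean": ..., "max": ...}} for the given score columns."""
    summary = {}
    for column in columns:
        values = [row[column] for row in rows]
        summary[column] = {"mean": mean(values), "max": max(values)}
    return summary


def higher(summary_a, summary_b, column, stat, names=("Bitcoin", "Ethereum")):
    """Name the coin whose `stat` for `column` is larger (ties go to the first name)."""
    a = summary_a[column][stat]
    b = summary_b[column][stat]
    return names[0] if a >= b else names[1]
```

For example, `higher(summarize(btc_rows), summarize(eth_rows), "compound", "mean")` would return the coin with the larger mean compound score, where `btc_rows` and `eth_rows` are placeholders for the two coins' score rows.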

Natural Language Processing

In this section, I used NLTK and Python to tokenize text, find n-gram counts, and create word clouds for both coins.

Tokenize

I used NLTK and Python to tokenize the text for each coin. I completed the following:

  1. Changed each word to lowercase.
  2. Removed punctuation.
  3. Removed stop words.

See relevant code below:

I then added the "Tokens" column of the tokenized text to the dataframe:
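The original code cell is not preserved in this text. As a minimal sketch of the three steps and the "Tokens" column, the version below uses a regular expression for the punctuation step and takes the stop-word set as an argument. `load_stop_words`, `tokenize`, `add_tokens_column` and the `text` column name are assumptions, not the notebook's names.

```python
import re


def load_stop_words():
    """Return NLTK's English stop words (needs nltk and its 'stopwords' corpus)."""
    from nltk.corpus import stopwords  # lazy import: third-party dependency

    return set(stopwords.words("english"))


def tokenize(text, stop_words):
    """Lowercase `text`, drop punctuation, and remove stop words."""
    lowered = text.lower()                     # 1. lowercase
    words = re.findall(r"[a-z0-9]+", lowered)  # 2. keep alphanumeric runs only
    return [w for w in words if w not in stop_words]  # 3. remove stop words


def add_tokens_column(df, stop_words, text_column="text"):
    """Add a "Tokens" column to a pandas DataFrame, one token list per row."""
    df["Tokens"] = df[text_column].apply(lambda t: tokenize(t, stop_words))
    return df
```

Passing the stop words in explicitly keeps `tokenize` usable without downloading NLTK data; `tokenize(text, load_stop_words())` would use NLTK's list.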

N-grams

Next, I looked at the n-grams and word frequencies for each coin. I completed the following:

  1. Used NLTK to produce the n-grams for N = 2.
  2. Listed the top 10 words for each coin.

See below the n-gram counts for N = 2:

See below the code and results for the top 10 words for each coin:
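The notebook's output is not preserved here. The two steps can be sketched roughly as follows, assuming a flat list of tokens per coin. `ngram_counts` and `top_words` are hypothetical helper names, and the fallback branch is only there so the sketch runs without NLTK installed.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count the n-grams in a token list (bigrams by default)."""
    try:
        from nltk import ngrams  # NLTK's n-gram helper
        grams = ngrams(tokens, n)
    except ImportError:
        # Equivalent sliding window when NLTK is not installed.
        grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


def top_words(tokens, k=10):
    """Return the k most frequent tokens with their counts."""
    return Counter(tokens).most_common(k)
```

`ngram_counts(tokens).most_common(10)` would then list the most frequent bigrams, and `top_words(tokens)` the ten most frequent words.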

Word Clouds

Finally, I generated a word cloud for each coin to summarize its news coverage.

See Bitcoin word cloud:

See Ethereum word cloud:
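The images are not included in this text. A word cloud of this kind could be produced along the following lines, assuming the third-party `wordcloud` package (the README does not say which library was used) and per-article token lists. The function names, image size and background color are illustrative.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Merge per-article token lists into one word -> count mapping."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def render_word_cloud(frequencies, path):
    """Draw a word cloud from word counts and save it as an image file."""
    from wordcloud import WordCloud  # lazy import: third-party dependency

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
    return cloud
```

Building the cloud from counts rather than raw text means it reflects the already tokenized, stop-word-free vocabulary.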

Named Entity Recognition

In this section, I built a named entity recognition (NER) model for both coins and visualized the tags using spaCy.

See Bitcoin NER:

See Ethereum NER:
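The rendered output is not included in this text. As a minimal sketch of entity extraction and visualization with spaCy, the code below assumes the pre-trained `en_core_web_sm` pipeline (the README does not state which model was used); the helper names are hypothetical.

```python
from collections import defaultdict


def load_model(name="en_core_web_sm"):
    """Load a pre-trained spaCy pipeline (assumed model; installed separately)."""
    import spacy  # lazy import: third-party dependency

    return spacy.load(name)


def extract_entities(text, nlp):
    """Run `nlp` over `text` and group entity strings by their label."""
    doc = nlp(text)
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def render_entities(text, nlp):
    """Return displaCy's HTML markup highlighting the entities in `text`."""
    from spacy import displacy

    return displacy.render(nlp(text), style="ent", jupyter=False)
```

In a notebook, calling `displacy.render(doc, style="ent")` directly displays the highlighted text inline; `extract_entities` is useful when the entities are needed as data rather than as a picture.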
