nlp natural-language-processing matplotlib ner named-entity-recognition sentiment-analysis

According To Headlines

Decrypting Cryptocurrencies through Natural Language Processing (NLP):

In this project I will apply natural language processing to understand the sentiment in the latest news articles featuring Bitcoin and Ethereum. I will also apply fundamental NLP techniques to better understand the other factors involved with the coin prices, focusing on common words, phrases, organizations, and entities mentioned. The questions behind the sentiment analysis are: What are articles saying about different cryptocurrencies? What is the current public sentiment surrounding these coins? The project uses NewsAPI and the NLTK (Natural Language Toolkit) library.

Tasks:

Sentiment Analysis
Natural Language Processing
Named Entity Recognition

Steps:

After setting my NewsAPI key, the first step was to fetch Bitcoin and Ethereum news articles.
Next, I created the sentiment scores DataFrame for Bitcoin using a for-loop.
The same process was performed for Ethereum.
After converting the results to a DataFrame with pd.DataFrame, the ".describe()" method shows summary statistics (count, mean, standard deviation, min, quartiles, max) for the sentiment scores.
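
The steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the `newsapi-python` client and NLTK's VADER analyzer (the README mentions compound and positive scores but does not name the analyzer), and the helper names `fetch_articles`, `build_sentiment_rows`, and `sentiment_frame` are hypothetical. Third-party imports sit inside the functions that need them so the row-building loop can be tried on its own.

```python
def fetch_articles(api_key, query):
    """Fetch English-language articles for `query` (assumes newsapi-python)."""
    from newsapi import NewsApiClient  # pip install newsapi-python

    client = NewsApiClient(api_key=api_key)
    response = client.get_everything(q=query, language="en", sort_by="relevancy")
    return response["articles"]


def build_sentiment_rows(articles, score_fn):
    """Score each article and return one dict per article.

    `score_fn` is any callable returning VADER-style keys:
    'compound', 'pos', 'neg', 'neu'.
    """
    rows = []
    for article in articles:
        # NewsAPI articles may have a null content or description field.
        text = article.get("content") or article.get("description") or ""
        if not text:
            continue  # nothing to score
        scores = score_fn(text)
        rows.append({
            "text": text,
            "date": (article.get("publishedAt") or "")[:10],  # keep YYYY-MM-DD
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


def sentiment_frame(articles):
    """Build the sentiment DataFrame (assumes pandas and NLTK's VADER lexicon)."""
    import pandas as pd
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return pd.DataFrame(build_sentiment_rows(articles, analyzer.polarity_scores))
```

With a valid key, `sentiment_frame(fetch_articles(api_key, "bitcoin")).describe()` would then print the summary statistics. The VADER lexicon has to be downloaded once with `nltk.download("vader_lexicon")`.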

Tokenizing:

Steps:

Import the NLTK (Natural Language Toolkit) modules nltk.tokenize, nltk.corpus, and nltk.stem
Import word_tokenize and sent_tokenize
Import the lemmatizer and stemmer (WordNetLemmatizer, PorterStemmer)
With the text tokenized, we can take a look at unique word counts
We can then use the imported Counter class to count the frequency of words in the articles
The top 3 most frequently used words in the Bitcoin news articles were "Char" (95x), "Bitcoin" (49x), and "Reuters" (23x).
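
A minimal sketch of the tokenizing and counting steps, under a few assumptions: the helper names `clean_tokens`, `top_words`, and `nltk_tokens` are hypothetical, and the stop-word list and lemmatizer are passed in so the core logic runs without downloading NLTK data. The "Char" token is probably not a real word from the articles: NewsAPI typically truncates the content field and appends a marker such as "[+1234 chars]", which would lemmatize to "char". That is an inference, not something the README confirms.

```python
from collections import Counter


def clean_tokens(words, stop_words, lemmatize):
    """Lowercase, drop non-alphabetic tokens and stop words, then lemmatize."""
    lowered = (w.lower() for w in words if w.isalpha())
    return [lemmatize(w) for w in lowered if w not in stop_words]


def top_words(tokens, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


def nltk_tokens(text):
    """Wire clean_tokens to NLTK (assumes punkt, stopwords, wordnet are downloaded)."""
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    lemmatizer = WordNetLemmatizer()
    return clean_tokens(
        word_tokenize(text),
        set(stopwords.words("english")),
        lemmatizer.lemmatize,
    )
```

Calling `top_words(nltk_tokens(bitcoin_text))` would then give the three most common words. Adding "char" to the stop-word set is one way to keep the truncation marker out of the counts.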

WordCloud

Word clouds are an intuitive way to visualize the frequency of different words in a set of news articles and to see quickly which words were used most prominently. They are also easy to generate with Python.
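
As a rough illustration, a word cloud can be built from token counts with the third-party `wordcloud` package and matplotlib. The README does not show its plotting code, so the helper names and the size, color, and `max_words` settings below are assumptions.

```python
from collections import Counter


def token_frequencies(tokens, max_words=50):
    """Map the most common tokens to their counts."""
    return dict(Counter(tokens).most_common(max_words))


def plot_wordcloud(tokens, title):
    """Draw a word cloud (assumes the wordcloud and matplotlib packages)."""
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(token_frequencies(tokens))

    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")  # axes carry no meaning for a word cloud
    plt.title(title)
    plt.show()
```

For example, `plot_wordcloud(bitcoin_tokens, "Bitcoin WordCloud")`, where `bitcoin_tokens` stands for the cleaned token list from the tokenizing step.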

Bitcoin WordCloud

Ethereum WordCloud

Named Entity Recognition (NER):

NER labels the important words in an article and shows which category each one belongs to: an organization, a currency, a person's name, and so on. Rendered with displaCy, the result is visually appealing highlighted text.

Steps

Import spaCy
Concatenate all Bitcoin/Ethereum text together using the ".join()" function.
Then run the NER processor on the text and render the visualization.
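
The steps above can be sketched as follows. The `en_core_web_sm` model and the helper names `join_article_text`, `entity_pairs`, and `render_entities` are assumptions, since the README does not say which spaCy model was loaded.

```python
def join_article_text(texts):
    """Concatenate article texts into one string, skipping empty entries."""
    return " ".join(t for t in texts if t)


def entity_pairs(doc):
    """List (entity text, entity label) pairs from a processed document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def render_entities(texts):
    """Run NER and render it (assumes spaCy and the en_core_web_sm model)."""
    import spacy
    from spacy import displacy

    # Install the model once: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(join_article_text(texts))

    # In a Jupyter notebook this displays the highlighted text inline;
    # elsewhere it returns the markup as a string.
    displacy.render(doc, style="ent")
    return entity_pairs(doc)
```

The returned pairs, such as an organization name with the label `ORG`, give a plain-text view of the same entities that displaCy highlights.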

Input:

bitcoin_doc = nlp(bitcoin_text)
displacy.render(bitcoin_doc, style='ent')

Output:

Summary:

While computing the sentiment scores for the Bitcoin and Ethereum articles, I found three questions to ask. First: which coin had the highest mean positive score? Next, looking over the output: which coin had the highest compound score? Finally: which coin had the highest positive score?

In my research, the data showed that Bitcoin had the highest mean score, 0.0519600. Looking at the compound scores, Bitcoin also had the highest, at 0.8834000. In the same data, Bitcoin had the highest positive score as well, leading the way with 0.2740000.
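
The three comparisons can be expressed as a small helper. This is an illustrative sketch using only the standard library: `compare_coins` is a hypothetical name, and it assumes each coin's scores are a list of dicts with "positive" and "compound" keys. With the pandas DataFrames from the earlier steps, the same values come from `df["positive"].mean()`, `df["compound"].max()`, and `df["positive"].max()`.

```python
from statistics import mean


def compare_coins(scores_by_coin):
    """Answer the three summary questions.

    `scores_by_coin` maps a coin name to a list of row dicts that each
    hold 'positive' and 'compound' sentiment scores.
    """
    def best(metric):
        # Coin whose rows give the largest value of `metric`.
        return max(scores_by_coin, key=lambda coin: metric(scores_by_coin[coin]))

    return {
        "highest mean positive": best(lambda rows: mean(r["positive"] for r in rows)),
        "highest compound": best(lambda rows: max(r["compound"] for r in rows)),
        "highest positive": best(lambda rows: max(r["positive"] for r in rows)),
    }
```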


jharvey09 / According_To_Headlines
