soneeee22000 / DataAnalysis-and-NLP-Abhinav

All the work done, code written, files created and visualizations made by me while interning at 50Hands.

Data Analysis and NLP

For Emotion Recognition:

ML & NLP

A Python program was written that detects emotion based on textual and facial features.

For textual emotion recognition, an LSTM network was used to train the model.

For facial emotion recognition, a CNN was used to train the model.

The models were trained on Google Colab, since it offers GPU acceleration and hence faster training.
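As an illustration of the text side of this pipeline, the sketch below shows the kind of preprocessing that typically feeds an LSTM emotion classifier: building a word index and padding sequences to a fixed length. The function names are hypothetical and not taken from the repository's code.

```python
# Minimal sketch of text preprocessing for an LSTM emotion classifier:
# build a word index, encode sentences as integer id sequences, and
# pad them to a fixed length. Names and data here are illustrative.

def build_vocab(sentences):
    """Map each word to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def encode_and_pad(sentences, vocab, max_len):
    """Encode sentences as id sequences, truncated/right-padded to max_len."""
    encoded = []
    for sentence in sentences:
        ids = [vocab.get(w, 0) for w in sentence.lower().split()][:max_len]
        ids += [0] * (max_len - len(ids))  # pad with the reserved 0 id
        encoded.append(ids)
    return encoded

texts = ["I am so happy today", "this is terrible news"]
vocab = build_vocab(texts)
padded = encode_and_pad(texts, vocab, max_len=6)
# Each row now has length 6 and could feed an Embedding + LSTM stack.
```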

For the 2020 US Elections:

Planning, Scraping & Visualization

Using Tableau, visualizations were made to show important trends and statistics in relation to the 2020 US elections.

Some of the trends shown are:

  • Voting with respect to geography i.e., state and counties.
  • Voting with respect to populations in all the states and counties.
  • A state's Democratic or Republican preference with voter demographic population as a factor.
  • A region's Democratic or Republican preference with median income and poverty level as a factor, for both states and counties.
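To make the last two bullets concrete, here is a small, hypothetical sketch (not the repository's code) of how county-level party preference can be joined with median income before being exported as a Tableau data source. The column names and numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sketch: join county vote totals with median income and
# label each county's party preference. All names/values are made up.
votes = pd.DataFrame({
    "county": ["Adams", "Baker", "Clark"],
    "dem_votes": [12000, 4000, 9000],
    "rep_votes": [8000, 9000, 9500],
})
income = pd.DataFrame({
    "county": ["Adams", "Baker", "Clark"],
    "median_income": [61000, 48000, 55000],
})

df = votes.merge(income, on="county")
df["preference"] = (df["dem_votes"] > df["rep_votes"]).map(
    {True: "Democratic", False: "Republican"}
)
df["dem_share"] = df["dem_votes"] / (df["dem_votes"] + df["rep_votes"])
# df can now be written to CSV and used as a Tableau data source.
```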

For Canada, India and USA:

Visualization

Using Tableau, visualizations were made to show important Coronavirus parameters.

For Google Mobility:

Scraping & Visualization

A visualization was made to show the mobility data for the USA at state and county level.

A Python script was written to scrape the mobility data for the USA and Canada in an orderly manner, which helps in the creation of various visualizations.

The main script, Index.py, can be found in the Mobility Data folder.
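As a rough sketch of the tidying step such a script performs, the snippet below filters a mobility table down to one country and the columns needed for state/county visualizations. The column names follow the public Google COVID-19 Community Mobility Report schema; the function name and sample data are assumptions.

```python
import pandas as pd

def tidy_mobility(df, country_code="US"):
    """Keep one country's rows and the columns used for visualization.

    Assumes the Google COVID-19 Community Mobility Report schema:
    'country_region_code', 'sub_region_1' (state), 'sub_region_2'
    (county), 'date', and the percent-change-from-baseline columns.
    """
    keep = [
        "sub_region_1", "sub_region_2", "date",
        "retail_and_recreation_percent_change_from_baseline",
        "workplaces_percent_change_from_baseline",
    ]
    out = df[df["country_region_code"] == country_code]
    return out[keep].reset_index(drop=True)

# Tiny in-memory sample standing in for the downloaded CSV:
sample = pd.DataFrame({
    "country_region_code": ["US", "US", "CA"],
    "sub_region_1": ["Ohio", "Ohio", "Ontario"],
    "sub_region_2": ["Franklin County", "Summit County", ""],
    "date": ["2020-04-01", "2020-04-01", "2020-04-01"],
    "retail_and_recreation_percent_change_from_baseline": [-40, -35, -50],
    "workplaces_percent_change_from_baseline": [-30, -28, -45],
})
us_only = tidy_mobility(sample)
```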

For Global Data:

Scraping & Visualization

A script was written to extract global Coronavirus data from Johns Hopkins University (available from late January 2020), create a DataFrame and insert it into the database.

The data that is inserted into the database is then used to create a Global Covid-19 Tracker, which shows the progression of cases, the change in various parameters, country-wise statistics and the relation between those parameters and constants like GDP, poverty rate, et cetera.

The main script, Index.py, can be found in the 'World' folder.

For Twitter Sentiment Analysis:

Visualization

For Canada, India and USA, a visualization was made for Twitter Sentiment Analysis.

For Reddit Sentiment Analysis:

NLP & Visualization

For various countries, posts were extracted from their designated Covid-19 subreddits and sentiment analysis was performed.

A visualization was made showing these sentiment statistics.

For Canada, India and USA, wordclouds and bigram-clouds were created using the same post extraction process as before.

Visualizations were created for these as well.
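As an illustration of the two NLP steps above (the repository's own code may differ), the sketch below scores posts with a toy lexicon, standing in for a real sentiment library such as VADER, and counts bigrams for the bigram-clouds.

```python
from collections import Counter

# Toy sentiment lexicon standing in for a real one (e.g. VADER);
# the words and weights here are illustrative only.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def sentiment_score(text):
    """Sum lexicon weights over the tokens of a post."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

def count_bigrams(texts):
    """Count adjacent word pairs across all posts (for a bigram-cloud)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

posts = ["great news today", "terrible lockdown news", "great news again"]
scores = [sentiment_score(p) for p in posts]
bigrams = count_bigrams(posts)
```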

For Quotes Generation:

NLP

GPT-2 was fine-tuned on a CSV of quotes, in the hope that the trained model would generate similar quotes that made sense.

Most of the code was already available in GPT-2's documentation.

A few modifications were made to write the generated quotes into a CSV file, so that a quote can be returned in negligible time when the corresponding API is called.

For reference: https://github.com/openai/gpt-2
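The CSV caching step described above can be sketched roughly as follows; the file name and function names are hypothetical, not the repository's.

```python
import csv
import random

# Hypothetical sketch of the caching step: generated quotes are written
# to a CSV once, so a later API call can return one instantly instead
# of running the model per request.

def save_quotes(path, quotes):
    """Write generated quotes to a one-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["quote"])
        for q in quotes:
            writer.writerow([q])

def random_quote(path):
    """Return one cached quote in negligible time."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return random.choice(rows)["quote"]

save_quotes("quotes.csv", ["Stay curious.", "Keep going."])
print(random_quote("quotes.csv"))
```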

