Arsh2k01 / UTrack

UTrack analyses a user's tweets to estimate their levels of loneliness, stress, and anxiety, and tracks how these levels change over time.


Project under Consulting and Analytics Club, IITG

1. Technologies Used

  1. Tweepy API
  2. NLTK
  3. BERT Model
  4. TensorFlow
  5. Seaborn
  6. Streamlit

2. Project Description

2.1 Data Extraction and Preprocessing

We scraped tweets for each category (loneliness, stress, and anxiety) using the Tweepy API, searching on keywords and phrases associated with that category. We also scraped tweets that did not contain any of these keywords; these served as the ‘neutral’ data. The data was cleaned using regular expressions and NLTK: links, emojis, emoticons, and symbols were removed.
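
The cleaning step described above can be sketched as follows. This is a minimal illustration using only Python's standard `re` module, not the contents of Cleaning Tweets.py; the function name `clean_tweet`, the regular expressions, and the choice to lowercase the text are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; the project's actual rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Common Western-style emoticons such as :) ;-( :D :-/
EMOTICON_RE = re.compile(r"(?<!\w)[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|]+(?!\w)")
# Anything that is not an ASCII letter, digit, whitespace, or apostrophe
# (this covers emojis, hashtags' '#', and other symbols).
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s']")


def clean_tweet(text: str) -> str:
    """Remove links, emoticons, emojis, and symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)       # links first, before their punctuation is stripped
    text = EMOTICON_RE.sub(" ", text)  # emoticons before symbols, so ':D' leaves no stray 'D'
    text = SYMBOL_RE.sub(" ", text)    # emojis and remaining symbols
    return " ".join(text.split()).lower()


if __name__ == "__main__":
    print(clean_tweet("Feeling so alone today :( 😔 https://t.co/abc123 #lonely"))
```

The order of the substitutions matters: stripping symbols before links or emoticons would leave fragments such as `https t co abc123` in the text.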

2.2 DL Model

We explored Transformer models and found that BERT (Bidirectional Encoder Representations from Transformers) was well suited to sentiment analysis. We took a pretrained BERT model and fine-tuned it on our training data, training a separate model for each category (loneliness, stress, and anxiety).
The output of the final layer was not passed through an activation function; instead, it was passed to a custom function that normalizes and standardizes the scores. The function is given below:
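
The project's own function is not reproduced in this text. Purely as an illustration of what "standardize and normalize" could mean for raw model outputs, the sketch below z-scores a batch of values and then min-max scales them into [0, 1]. It is not the authors' function; the name `standardize_and_normalize` and both steps are assumptions.

```python
from statistics import mean, pstdev


def standardize_and_normalize(raw_scores):
    """Illustrative stand-in: z-score the raw outputs, then min-max scale to [0, 1].

    `raw_scores` is a non-empty list of unactivated model outputs (floats).
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        # All outputs are identical; there is no spread to scale, so return the midpoint.
        return [0.5 for _ in raw_scores]
    # Standardize: zero mean, unit variance.
    z = [(s - mu) / sigma for s in raw_scores]
    # Normalize: map the smallest value to 0 and the largest to 1.
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo) for v in z]


if __name__ == "__main__":
    print(standardize_and_normalize([-2.0, 0.0, 2.0]))
```

Note that min-max scaling is unaffected by a preceding z-score, so the two steps are shown separately only to mirror the wording above.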


2.3 Visualisation and Deployment

We used Seaborn to plot the calculated levels of loneliness, stress, and anxiety for each user over time, which shows how the user's mental state varied. We also compute a weighted average for each category over the user's previous tweets, on a scale from 0 (low) to 1 (high). You can also view each individual tweet and its scores. Deployment was done using Streamlit.
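
The weighting scheme behind the per-category average is not specified above. As one possible reading, the sketch below weights recent tweets more heavily using an exponential decay; the function name `weighted_average` and the `decay` parameter are hypothetical and chosen only for this example.

```python
def weighted_average(scores, decay=0.8):
    """Recency-weighted mean of per-tweet scores.

    `scores` is ordered oldest to newest, each value in [0, 1] (0: low, 1: high).
    The newest tweet gets weight 1, the one before it `decay`, then `decay**2`, ...
    The decay scheme is an assumption, not the project's documented method.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Example: stress scores for a user's last three tweets, oldest first.
    print(weighted_average([0.2, 0.5, 0.9]))
```

Because the weights are normalized by their sum, the result stays in the same [0, 1] range as the individual tweet scores.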

3. Files

  • Cleaning Tweets.py - Script to clean scraped tweets
  • Extracting Targeted Tweets.py - Script to scrape a user's Twitter information
  • Streamlit Deployment.py - Script to deploy the project
  • Streamlit Deployment.ipynb - Jupyter Notebook to deploy the project
  • Extracted Tweets - Training Data
  • Training Models:
    • Anxiety Model.py
    • Lonely Model.py
    • Stress Model.py

4. Usage

To use UTrack, first add the project folder to your Google Drive.
Then run Streamlit Deployment.ipynb on Google Colab and click the ngrok link that the notebook produces.

Once the app opens in your browser, use the following video as a reference:

demo video

5. Team

6. References

7. License

MIT


