Project under Consulting and Analytics Club, IITG
- Tweepy API
- NLTK
- BERT Model
- TensorFlow
- Seaborn
- Streamlit
We scraped tweets for each illness using the Tweepy API, based on keywords and phrases for each category. We also scraped tweets that did not contain these keywords; these served as the ‘neutral’ data. The data was cleaned using regular expressions and NLTK: links, emojis, emoticons, and symbols were removed.
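The cleaning step described above can be sketched as follows. This is a minimal illustration, not the contents of Cleaning Tweets.py: the function name `clean_tweet` and all of the patterns are assumptions, it uses only Python's built-in `re` module, and it leaves out any NLTK-based processing.

```python
import re

# ASSUMPTION: these patterns are illustrative, not the project's actual rules.
# Links: anything starting with http://, https://, or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Common Western-style emoticons such as :) ;-( :D :'( =P
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][\-']?[)\](\[dDpP/\\|}{@]+(?!\w)")
# Anything that is not an ASCII letter, digit, or whitespace (this also drops emojis)
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove links, emoticons, emojis, and symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)       # links first, before their punctuation is stripped
    text = EMOTICON_RE.sub(" ", text)  # emoticons next, while ':' and ')' are still present
    text = SYMBOL_RE.sub(" ", text)    # remaining symbols, hashtags' '#', emojis
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Feeling so alone today :( https://t.co/abc123 #lonely"))
```

The order of the substitutions matters: links and emoticons are removed before the generic symbol pass, because that pass would otherwise leave fragments such as `https t co abc123` or a stray `D` from `:D`.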
We explored Transformer models and found BERT (Bidirectional Encoder Representations from Transformers) to be the best suited for sentiment analysis. We used a pretrained BERT model and fine-tuned it on our training data, training a separate model for each class.
The output given by the final layer was not fed to any activation function; it was instead given as input to a custom function to normalize and standardize the data. The function is given below:
We used Seaborn to plot the calculated levels of Loneliness, Stress, and Anxiety for each user over time, showing how the user's mental state varied. We also estimated a weighted average for each category over the user's previous tweets, on a scale from 0 (low) to 1 (high).
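As a rough illustration of a weighted average over previous tweets, the sketch below weights recent tweets more heavily. The weighting scheme is an assumption: the exponential `decay` parameter and the function name `weighted_average` are hypothetical, and the project may weight tweets differently.

```python
from typing import Sequence


def weighted_average(scores: Sequence[float], decay: float = 0.8) -> float:
    """Recency-weighted mean of per-tweet scores in [0, 1], ordered oldest first.

    ASSUMPTION: exponential decay is a hypothetical choice. The newest tweet
    gets weight 1, the one before it `decay`, then `decay ** 2`, and so on.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Per-tweet scores for one category, oldest first (made-up values).
    print(round(weighted_average([0.2, 0.4, 0.9]), 3))
```

Because the weights are normalized by their sum, the result stays within the same 0 to 1 range as the individual tweet scores; setting `decay=1.0` reduces it to a plain mean.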
You can also view each individual tweet and its scores.
Deployment was done using Streamlit.
- Cleaning Tweets.py - Script to clean scraped tweets
- Extracting Targeted Tweets.py - Script to scrape a user's Twitter information
- Streamlit Deployment.py - Script to deploy the project
- Streamlit Deployment.ipynb - Jupyter Notebook to deploy the project
- Extracted Tweets - Training Data
- Training Models:
  - Anxiety Model.py
  - Lonely Model.py
  - Stress Model.py
To use UTrack, first add this folder to your Google Drive.
Then run Streamlit Deployment.ipynb on Google Colab and click on the ngrok link produced by the notebook.
Once the app opens in your browser, use the following video as a reference: