Depression Detection


This project aims to detect early indicators of depression by analyzing data from a range of social media platforms, including images and texts.


Table of Contents
  1. Data collection
    1. Visual Data
    2. Textual Data
  2. Models
    1. Models for images
    2. Models for texts
  3. Software and technologies
  4. Hardware

Data collection

Data were collected from Pexels, Unsplash and Twitter.
Pexels and Unsplash are two platforms that provide freely usable images.
The tweets used are publicly available.

Visual Data:

The overall process of scraping images from Unsplash and Pexels is as follows:

Figure: image scraping workflow

Images were crawled from Pexels using Selenium and from Unsplash using the Unsplash API.
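The actual scraping scripts live in the linked code; the following is only a rough sketch of the two collection paths, where the search keyword, the Unsplash access key, and the Chrome driver setup are assumptions rather than the project's exact settings:

```python
# Minimal sketch of the two collection paths (keyword, paths, and API key are placeholders).
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

UNSPLASH_KEY = "YOUR_ACCESS_KEY"  # assumption: a registered Unsplash API application

def unsplash_urls(query, per_page=30):
    """Query the official Unsplash search endpoint and return image URLs."""
    r = requests.get(
        "https://api.unsplash.com/search/photos",
        params={"query": query, "per_page": per_page, "client_id": UNSPLASH_KEY},
        timeout=30,
    )
    r.raise_for_status()
    return [item["urls"]["small"] for item in r.json()["results"]]

def pexels_urls(query):
    """Open a Pexels search page with Selenium and collect the <img> sources."""
    driver = webdriver.Chrome()  # assumes a local chromedriver is available
    driver.get(f"https://www.pexels.com/search/{query}/")
    urls = [img.get_attribute("src") for img in driver.find_elements(By.TAG_NAME, "img")]
    driver.quit()
    return [u for u in urls if u]

if __name__ == "__main__":
    print(len(unsplash_urls("sad")), "Unsplash URLs")
    print(len(pexels_urls("sad")), "Pexels URLs")
```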

  • 6250 images labeled as "Depressed"
  • 5234 images labeled as "Not Depressed"

This is a sample of the dataset:

Figure: sample images from the dataset

Images can be loaded as shown in the Project Cheat Sheet, and the code is available here.
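For a quick start before opening the cheat sheet, here is one way the images could be loaded with Keras; the data/images directory layout with Depressed and Not_Depressed subfolders is an assumption:

```python
# Minimal loading sketch, assuming images are organised as data/images/<class>/.
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/images",
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(128, 128),   # matches the 128x128 input used for the best model
    batch_size=32,
    label_mode="binary",
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/images",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(128, 128),
    batch_size=32,
    label_mode="binary",
)
print(train_ds.class_names)  # e.g. ['Depressed', 'Not_Depressed']
```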

Textual Data:

The hashtags used were trending hashtags built from keywords inspired by the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders). Textual data were collected with twint from Twitter users who share their posts publicly.
Overall, 5460 tweets were collected. The process was as follows:

Figure: tweet collection workflow
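As a rough illustration of the twint collection step, here is a minimal sketch; the hashtag, tweet limit, and output file name are placeholders rather than the project's exact configuration:

```python
# Minimal twint collection sketch (hashtag, limit, and output path are placeholders).
import twint

c = twint.Config()
c.Search = "#depression"        # placeholder; the project used DSM-5-inspired trending hashtags
c.Lang = "en"
c.Limit = 1000                  # placeholder limit per hashtag
c.Store_csv = True
c.Output = "data/tweets.csv"
twint.run.Search(c)
```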

You can check the output of the text loader in the Project Cheat Sheet, and the code is available here. This is a sample of the dataset:

Figure: sample tweets from the dataset
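The linked text loader contains the project's actual preprocessing; below is only a minimal sketch of loading and lightly cleaning the collected tweets, assuming they were stored as a CSV with a tweet column (file name and column name are assumptions):

```python
# Minimal text-loading sketch (CSV path and column name are assumptions).
import pandas as pd

df = pd.read_csv("data/tweets.csv")                       # assumed output of the twint step
df = df.drop_duplicates(subset="tweet").dropna(subset=["tweet"])
df["tweet"] = (
    df["tweet"]
    .str.lower()
    .str.replace(r"http\S+|@\w+|#", "", regex=True)       # strip URLs, mentions, and '#'
    .str.strip()
)
print(df.shape)
print(df["tweet"].head())
```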

Models

Models for Images:

Five different types of models were trained:

  • Deep CNN
  • ResNet50
  • BiT-L(ResNet50x1)
  • BiT-L(ResNet50x3)
  • BiT-L (ResNet101x1): This was the best model in terms of accuracy (0.82), precision, recall, and F1-score, with the following hyperparameters: SGD (stochastic gradient descent) as the optimizer, 50 epochs, a variable learning rate, and an image size of 128×128 pixels (a minimal fine-tuning sketch follows this list).
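The notebooks linked below contain the real training code; the following is only a minimal fine-tuning sketch in Keras. It uses the publicly released BiT-M R101x1 module from TensorFlow Hub as a stand-in for BiT-L (ResNet101x1), with SGD and 128×128 inputs as described above, and a piecewise-constant schedule as one possible interpretation of the "variable learning rate":

```python
# Minimal BiT fine-tuning sketch (schedule boundaries and values are assumptions).
import tensorflow as tf
import tensorflow_hub as hub

# BiT-M R101x1 feature extractor from TF Hub, used here as a public stand-in for BiT-L.
bit_backbone = hub.KerasLayer("https://tfhub.dev/google/bit/m-r101x1/1", trainable=True)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(128, 128, 3)),
    bit_backbone,                                        # produces a feature vector per image
    tf.keras.layers.Dense(1, activation="sigmoid"),      # Depressed vs Not Depressed
])

# One possible "variable learning rate": a piecewise-constant schedule over training steps.
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[3000, 6000],                             # step boundaries are assumptions
    values=[3e-3, 3e-4, 3e-5],
)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=50)  # datasets from the loading sketch above
```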

Models for Texts:

Two different types of models were trained:

  • LSTM
  • GloVe+BiLSTM: This was the best model in terms of accuracy (0.7), precision, recall, and F1-score (a minimal architecture sketch follows this list).
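Again, the linked notebooks hold the actual model; below is a minimal GloVe+BiLSTM sketch, where the GloVe file, embedding dimension, vocabulary size, and LSTM width are assumptions rather than the project's exact settings:

```python
# Minimal GloVe + BiLSTM sketch (GloVe file, sizes, and example texts are placeholders).
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["i feel so empty today", "great day hiking with friends"]   # placeholder tweets
MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 50, 100

tokenizer = Tokenizer(num_words=MAX_WORDS, oov_token="<unk>")
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# Build an embedding matrix from pre-trained GloVe vectors.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.split()
        glove[word] = np.asarray(vec, dtype="float32")

vocab_size = min(MAX_WORDS, len(tokenizer.word_index) + 1)
emb_matrix = np.zeros((vocab_size, EMB_DIM))
for word, i in tokenizer.word_index.items():
    if i < vocab_size and word in glove:
        emb_matrix[i] = glove[word]

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, EMB_DIM, weights=[emb_matrix],
                              input_length=MAX_LEN, trainable=False),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```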

For the best models I chose, you can find three notebooks:

  • For images: this notebook presents the test of the BiT-L (ResNet101x1) model, which is the best model for classifying images.
  • For texts: this notebook presents the test of GloVe+BiLSTM, which is the best model for classifying texts.
  • For integrating the models: this notebook tests the integration of BiT-L (ResNet101x1) and GloVe+BiLSTM into a multimodal model.

You can find the saved weights for the best image model and the best text model here.
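As an illustration of how the two saved models could be combined, here is a minimal late-fusion sketch; the file names and the simple averaging of probabilities are assumptions, not necessarily the integration notebook's exact strategy:

```python
# Minimal late-fusion sketch (model file names and the averaging rule are assumptions).
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

image_model = tf.keras.models.load_model(
    "bit_r101x1.h5", custom_objects={"KerasLayer": hub.KerasLayer}
)
text_model = tf.keras.models.load_model("glove_bilstm.h5")

def predict_depression(image_batch, text_batch):
    """Average the image and text probabilities into a single multimodal score."""
    p_img = image_model.predict(image_batch)   # shape (n, 1), probability of 'Depressed'
    p_txt = text_model.predict(text_batch)     # shape (n, 1), same convention
    return (p_img + p_txt) / 2.0

# Example: one 128x128 RGB image and one padded tweet sequence of length 50.
dummy_image = np.random.rand(1, 128, 128, 3).astype("float32")
dummy_text = np.zeros((1, 50), dtype="int32")
print(predict_depression(dummy_image, dummy_text))
```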

Software and technologies:

  • Python (version 3.8.3)
  • Anaconda (Distribution 2020.02)
  • TensorFlow
  • Keras
  • Jupyter Notebook
  • Pycharm (Community Edition)

Hardware

Two main machines were used to implement this solution: a local machine for refactoring code, testing models, and research, and a virtual machine (VM) on Google Cloud Platform (GCP) for models and code that are heavy in computation and time. The specifications of these machines are as follows:

  • Local Machine: Lenovo E330:
    • Operating System: Kali Linux 2020.2
    • CPU: Intel Core i5-3230M 2.6 GHz
    • RAM: 8 GB DDR3
    • Disk: 320 GB HDD
  • Virtual Machine on GCP: mastermind
    I used two configurations:
    • For tasks that are not heavy in computation or time:
      • Operating System: Ubuntu 19.10
      • Machine Type: n1-highmem-8
      • CPU: 8 vCPUs
      • RAM: 16 GB
      • Disk: 100 GB SSD
    • For training models:
      • Operating System: Ubuntu 19.10
      • Machine Type: n1-highmem-8
      • CPU: 8 vCPUs
      • RAM: 52 GB
      • Disk: 100 GB SSD

Finally, I hope you enjoyed my work and got inspired to help people in need get noticed 🧐. If you want more details, you can find my report here.

Please contact me for more details; I would be really happy to share more information 😋.
