iyuge2 / M-SENA-Backend

Multimodal Sentiment Analysis System

Python 3.6 Torch 1.2 Flask 1.1.2 License

This project is the backend of the M-SENA Platform.

Installation

Docker

We provide a docker image of our platform. See the main repo for instructions.

From Source

1. Clone this Repository

$ git clone https://github.com/iyuge2/M-SENA-Backend.git
$ cd M-SENA-Backend

2. Install Requirements

  • Install system requirements
$ apt install mysql-server default-libmysqlclient-dev libsndfile1 ffmpeg
  • Install python requirements
$ conda create --name sena python=3.8
$ conda activate sena
$ pip install -r requirements.txt

3. Configure MySQL

  • Log in to MySQL as root
$ mysql -u root -p
  • Create a database for M-SENA
mysql> CREATE DATABASE sena;
  • Create a user for M-SENA and grant privileges
mysql> CREATE USER sena IDENTIFIED BY 'MyPassword';
mysql> GRANT ALL PRIVILEGES ON sena.* TO sena@`%`;
mysql> FLUSH PRIVILEGES;

4. Configs

  • Edit constants.py. Change DATASET_ROOT_DIR, DATASET_SERVER_IP, OPENFACE_FEATURE_PATH, MM_CODES_PATH, MODEL_TMP_SAVE, AL_CODES_PATH and LIVE_TMP_PATH to match your environment.
  • Edit config.sh. Find DATABASE_URL and change it to match your database settings.
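
As a rough illustration of the database setting, the sketch below assembles a connection URL from the MySQL user and database created in step 3. It assumes DATABASE_URL follows the SQLAlchemy-style `mysql://user:password@host/dbname` form, which this README does not confirm; check config.sh for the scheme the project actually expects. The helper name `build_database_url` and the host `localhost` are hypothetical.

```python
from urllib.parse import quote_plus


def build_database_url(user, password, host, database, scheme="mysql"):
    """Assemble a SQLAlchemy-style URL (assumed format, not confirmed by the project)."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password do not break the URL structure.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


if __name__ == "__main__":
    # User, password and database name from step 3; "localhost" is a placeholder.
    print(build_database_url("sena", "MyPassword", "localhost", "sena"))
```

The printed value is what would be pasted into the DATABASE_URL line of config.sh under these assumptions.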

5. Datasets

  • Download the datasets and place them under the DATASET_ROOT_DIR specified in constants.py.
  • Add an entry to the DATASET_ROOT_DIR/config.json file to register each new dataset.
  • Format the datasets with MM-Codes/data/DataPre.py (set --language to cn or en):
$ python MM-Codes/data/DataPre.py --working_dir $PATH_TO_DATASET --openface2Path $PATH_TO_OPENFACE2_FeatureExtraction_TOOL --language cn/en
  • For datasets that need labeling, the config file is located in the AL-Codes directory.
  • The structure of the DATASET_ROOT_DIR directory is described in the next section.

6. Run

$ source config.sh
$ flask run --host=0.0.0.0

Reference

Dataset Structure

The structure of the root dataset directory should look like this:

.
├── config.json
├── MOSEI
│   ├── label.csv
│   ├── Processed
│   └── Raw
├── MOSI
│   ├── label.csv
│   ├── Processed
│   └── Raw
└── SIMS
    ├── label.csv
    ├── Processed
    └── Raw
  • config.json: holds the necessary information for all datasets, for example language, label_path, and features. It is only read when scanning and updating datasets.
  • **/label.csv: stores detailed information for each video clip in the ** dataset, including video_id, clip_id, text, label value (float), annotation (string), and mode (training attributes). In addition, a label_by field indicates the label type, which is required for labeling based on active learning.

  • **/Processed: holds the feature files. Processed features are stored with pickle, organized in the following structure. These files are used by MM-Codes.
{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [], # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
        "regression_labels": []
    },
    "valid": {***}, # same as the "train"
    "test": {***}, # same as the "train"
}
  • **/Raw: holds the raw videos. The path of each clip should be consistent with label.csv.
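
To make the pickle layout above concrete, here is a minimal sketch that loads a processed feature file and reports any fields missing from the three splits. The split and field names come from the structure listed above; the helper names (`load_features`, `missing_fields`, `split_sample_id`, `polarity`) are hypothetical and not part of M-SENA. How `classification_labels` is encoded inside the file is not stated here, so `polarity` only mirrors the sign rule given in the comment.

```python
import pickle

SPLITS = ("train", "valid", "test")
FIELDS = (
    "raw_text", "audio", "vision", "id", "text", "text_bert",
    "audio_lengths", "vision_lengths", "annotations",
    "classification_labels", "regression_labels",
)


def load_features(path):
    """Load a processed feature file. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def missing_fields(data):
    """Return {split: [missing field names]} for the layout listed above."""
    report = {}
    for split in SPLITS:
        absent = [name for name in FIELDS if name not in data.get(split, {})]
        if absent:
            report[split] = absent
    return report


def split_sample_id(sample_id):
    """Split 'video_id$_$clip_id' into (video_id, clip_id)."""
    video_id, clip_id = sample_id.rsplit("$_$", 1)
    return video_id, clip_id


def polarity(regression_label):
    """Apply the sign rule: Negative (< 0), Neutral (0), Positive (> 0)."""
    if regression_label < 0:
        return "Negative"
    if regression_label > 0:
        return "Positive"
    return "Neutral"
```

An empty dictionary from `missing_fields(load_features(path))` suggests the file has every field listed above; it does not check array shapes or lengths.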

We provide a download link for the preprocessed SIMS dataset (code: 4aa6, md5: 3befed5d2f6ea63a8402f5875ecb220d), which follows the above requirements. More datasets are available from CMU-MultimodalSDK.
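
Since an md5 checksum is given for the preprocessed SIMS download, the sketch below shows one way to verify a downloaded file against it using Python's standard `hashlib`. The function names are hypothetical, and the path you pass in is whatever name the archive has on your machine; this README does not state the file name.

```python
import hashlib

# Checksum quoted above for the preprocessed SIMS download.
EXPECTED_MD5 = "3befed5d2f6ea63a8402f5875ecb220d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected` on the downloaded archive and getting `False` usually means the download is incomplete or corrupted and should be repeated.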

Code Structure

The source code is organized as follows:

.
├── AL-Codes                # Active learning codes
├── MM-Codes                # MSA algorithm codes
├── app.py                  # Flask main codes
├── config.py               # Basic config
├── config.sh               # Basic config
├── constants.py            # Global variable definition
├── database.py             # Database definition & initialization
├── httpServer.py           # Dataset server (for video previews)
└── requirements.txt        # Python requirements
  • MM-Codes

MSA Code Framework

The code is based on MMSA. All model and dataset parameters are saved in MM-Codes/config.json.

  • AL-Codes

Labeling based on Active Learning Code Framework

The code is based on MMSA. All model and dataset parameters are saved in AL-Codes/config.json.

License: GNU General Public License v3.0

