
MMSA

PyTorch implementation of models for multimodal sentiment analysis.

Note: We strongly recommend that you browse the overall structure of our code first. If you have any questions, feel free to contact us.

Supported Models

In this framework, we support the following methods:

Type         Model Name          From
Single-Task  EF_LSTM             MultimodalDNN
Single-Task  LF_DNN              -
Single-Task  TFN                 Tensor-Fusion-Network
Single-Task  LMF                 Low-rank-Multimodal-Fusion
Single-Task  MFN                 Memory-Fusion-Network
Single-Task  Graph-MFN           Graph-Memory-Fusion-Network
Single-Task  MulT (without CTC)  Multimodal-Transformer
Single-Task  MISA                MISA
Multi-Task   MLF_DNN             MMSA
Multi-Task   MTFN                MMSA
Multi-Task   MLMF                MMSA
Multi-Task   SELF_MM             Self-MM

Results

Detailed results are shown in results/result-stat.md.

Usage

Clone the code

  • Clone this repo and install the requirements (Python 3.6 is recommended):
git clone https://github.com/thuiar/MMSA
cd MMSA
pip install -r requirements.txt

Datasets and pre-trained BERTs

Download the dataset features and pre-trained BERT models from the following links.

For all feature files, you can check integrity against the SHA-1 hash values below (a verification sketch follows the list).

MOSI/unaligned_50.pkl: 5da0b8440fc5a7c3a457859af27458beb993e088
MOSI/aligned_50.pkl: 5c62b896619a334a7104c8bef05d82b05272c71c
MOSEI/unaligned_50.pkl: db3e2cff4d706a88ee156981c2100975513d4610
MOSEI/aligned_50.pkl: ef49589349bc1c2bc252ccc0d4657a755c92a056
SIMS/unaligned_39.pkl: a00c73e92f66896403c09dbad63e242d5af756f8
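
As a convenience, this check can be scripted. Below is a minimal sketch (not part of the repo; it assumes the feature files sit under the relative paths listed above) that streams each file through Python's hashlib and compares the digests:

import hashlib

# Expected SHA-1 digests, copied from the list above.
EXPECTED = {
    "MOSI/unaligned_50.pkl": "5da0b8440fc5a7c3a457859af27458beb993e088",
    "MOSI/aligned_50.pkl": "5c62b896619a334a7104c8bef05d82b05272c71c",
    "MOSEI/unaligned_50.pkl": "db3e2cff4d706a88ee156981c2100975513d4610",
    "MOSEI/aligned_50.pkl": "ef49589349bc1c2bc252ccc0d4657a755c92a056",
    "SIMS/unaligned_39.pkl": "a00c73e92f66896403c09dbad63e242d5af756f8",
}

def sha1_of(path, chunk_size=1 << 20):
    # Read in chunks so large feature files never load fully into memory.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for rel_path, expected in EXPECTED.items():
    status = "OK" if sha1_of(rel_path) == expected else "MISMATCH"
    print(f"{rel_path}: {status}")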

Due to size limitations, the MOSEI features and SIMS raw videos are available on Baidu Cloud Drive only. All dataset features are organized as follows:

{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [], # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
        "regression_labels": []
    },
    "valid": {***}, # same as the "train" 
    "test": {***}, # same as the "train"
}
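
For example, a feature file can be loaded and inspected with Python's pickle module (a minimal sketch; the path and the printed fields assume the layout above):

import pickle

with open("MOSI/unaligned_50.pkl", "rb") as f:
    data = pickle.load(f)

train = data["train"]
print(train.keys())                    # the fields listed above
print(len(train["raw_text"]))          # number of training clips
print(train["regression_labels"][:5])  # continuous sentiment scores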

For MOSI and MOSEI, the pre-extracted text features come from BERT rather than the original GloVe features used in the CMU-Multimodal-SDK.

For SIMS, if you want to extract features from the raw videos, install the OpenFace toolkit first and then refer to our code in data/DataPre.py:

python data/DataPre.py --data_dir [path_to_Dataset] --language ** --openface2Path  [path_to_FeatureExtraction]

For BERT models, you can also download BERT-Base, Chinese from Google-Bert, then convert the TensorFlow checkpoint into PyTorch format with transformers-cli.
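
For reference, the conversion command typically looks like the following (paths are placeholders for wherever you unpacked the checkpoint; check the flags against your installed transformers version):

transformers-cli convert --model_type bert \
  --tf_checkpoint chinese_L-12_H-768_A-12/bert_model.ckpt \
  --config chinese_L-12_H-768_A-12/bert_config.json \
  --pytorch_dump_output chinese_L-12_H-768_A-12/pytorch_model.bin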

Then, modify config/config_*.py to update the dataset paths.
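
The exact variable names live in the config files themselves; the lines you edit will look roughly like this (hypothetical names, for illustration only):

# Hypothetical example -- use the actual variable names in config/config_*.py.
data_path = "/path/to/datasets/MOSI/unaligned_50.pkl"
pretrained_bert_path = "/path/to/bert-base-chinese"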

Run

python run.py --modelName *** --datasetName ***
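
For example (illustrative values; modelName must be one of the supported models listed above and datasetName one of the three datasets, in the spelling run.py expects):

python run.py --modelName tfn --datasetName mosi
python run.py --modelName self_mm --datasetName sims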

Paper

Please cite our paper if you find our work useful for your research:

@inproceedings{yu2020ch,
  title={CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality},
  author={Yu, Wenmeng and Xu, Hua and Meng, Fanyang and Zhu, Yilin and Ma, Yixiao and Wu, Jiele and Zou, Jiyun and Yang, Kaicheng},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={3718--3727},
  year={2020}
}
@article{yu2021learning,
  title={Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis},
  author={Yu, Wenmeng and Xu, Hua and Yuan, Ziqi and Wu, Jiele},
  journal={arXiv preprint arXiv:2102.04830},
  year={2021}
}


License

MIT License

