What makes the difference? An Empirical Comparison of Fusion Strategies for Multimodal Language Analysis
This repository includes SOTA modality fusion approaches for sentiment analysis and emotion recognition tasks. All models are implemented in a unified PyTorch framework so that the fusion approaches can be compared empirically under the same conditions.
- Monologue Datasets: https://www.dropbox.com/s/7z56hf9szw4f8m8/cmumosi_cmumosei_iemocap.zip?dl=0
- Contains the CMUMOSI, CMUMOSEI and IEMOCAP datasets
- Each dataset has the CMU-SDK version and Multimodal-Transformer version (with different input dimensionalities)
- Set up the configurations in config/run.ini
- python run.py -config config/run.ini
- Monologue
- mode = run
- dataset_type = multimodal
- pickle_dir_path = /path/to/datasets/. The absolute path of the folder storing the datasets.
- dataset_name in `{'cmumosei','cmumosi','iemocap'}`. Name of the dataset.
- features in `{'acoustic', 'visual', 'textual'}`. Multiple modality names should be joined by ','.
- label in `{'sentiment','emotion'}`. Multiple labels should be joined by ','.
- wordvec_path. The relative path of the pre-trained word embedding file.
- dialogue_format = False. Disable the dialogue format.
- dialogue_context = False. Disable the use of dialogue context.
- embedding_trainable in `{'True','False'}`. Whether to train the word embeddings for the textual modality. Usually set to True.
- case_study in `{'True','False'}`. Whether to write per-sample model predictions to files.
  - model_prediction in `{'True','False'}`. Whether the model prediction for each sample is exported to a file. Requires case_study = True.
  - true_labels in `{'True','False'}`. Whether the true label for each sample is exported to a file. Requires case_study = True.
  - per_sample_analysis in `{'True','False'}`. Whether the true label and the model prediction for each sample are exported together to a file. Requires case_study = True.
- seed. The random seed for the experiment.
- load_model_from_dir in `{'True','False'}`. Whether the model is loaded from a saved file.
  - dir_name. The directory storing the model configurations and model parameters. Requires load_model_from_dir = True.
- fine_tune in `{'True','False'}`. Whether to train the model on the data.
- Model-specific parameters. To run a model on the dataset, uncomment that model's section and comment out the sections for the other models. Refer to the model implementations in /models/monologue/ for the meaning of each model-specific parameter.
- supported models include but are not limited to:
- EF-LSTM
- LF-LSTM
- RMFN
- TFN
- LMF
- MARN
- Multimodal-Transformer (only for word-aligned data)
- MMUU-BA
- RAVEN (only for word-aligned data)
- MFN
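As a rough illustration of the single-run fields listed above, the following sketch builds a configuration with Python's standard `configparser` and prints it in `.ini` form. The section name `COMMON` is an assumption borrowed from the grid-parameter file format; the actual `config/run.ini` may use a different section name. The helper `build_run_config` is hypothetical and not part of this repository, and model-specific parameters and `wordvec_path` are left out.

```python
import configparser
import io

# Assumption: the section name is borrowed from the grid-parameter file format;
# the real config/run.ini may organise its fields differently.
SECTION = "COMMON"

# Allowed values as documented in the field list above.
ALLOWED = {
    "dataset_name": {"cmumosei", "cmumosi", "iemocap"},
    "features": {"acoustic", "visual", "textual"},
    "label": {"sentiment", "emotion"},
}


def build_run_config(dataset_name, features, labels, pickle_dir_path, seed):
    """Hypothetical helper: return a ConfigParser with the monologue single-run fields."""
    for key, values in (("dataset_name", [dataset_name]),
                        ("features", features),
                        ("label", labels)):
        unknown = set(values) - ALLOWED[key]
        if unknown:
            raise ValueError(f"unsupported {key}: {sorted(unknown)}")

    config = configparser.ConfigParser()
    config[SECTION] = {
        "mode": "run",
        "dataset_type": "multimodal",
        "pickle_dir_path": pickle_dir_path,
        "dataset_name": dataset_name,
        # Multiple modality names and labels are joined by ','.
        "features": ",".join(features),
        "label": ",".join(labels),
        "dialogue_format": "False",
        "dialogue_context": "False",
        "embedding_trainable": "True",
        "case_study": "False",
        "seed": str(seed),
        "load_model_from_dir": "False",
    }
    return config


config = build_run_config(
    "cmumosi", ["textual", "acoustic", "visual"], ["sentiment"],
    "/path/to/datasets/", seed=123,
)
buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

Validating the values before writing the file catches a misspelled dataset or modality name early, before a training run is launched.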
- Set up the configurations in config/grid_search.ini. The fields match the single-run configuration, except for the changes listed below.
- Write up the hyperparameter pool in config/grid_parameters/.
- python run.py -config config/grid_search.ini
- mode = run_grid_search
- grid_parameters_file. The name of the file storing the parameters to be searched, located under the folder /config/grid_parameters.
- the format of a file is:
- [COMMON]
- var_1 = val_1;val_2;val_3
- var_2 = val_1;val_2;val_3
- search_times. The number of times the program searches in the pool of parameters.
- output_file. The file storing the performances for each search in the pool of parameters. By default, it is `eval/grid_search_{dataset_name}_{network_type}.csv`.
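The grid-search settings above can be pictured with the following minimal sketch. It parses a parameter file in the documented `[COMMON]` format, draws `search_times` configurations from the pool, and builds the default output file name. The helper names are hypothetical, and uniform random sampling is an assumption; the repository's own search routine may select combinations differently. The `network_type` value `lmf` is only a placeholder.

```python
import configparser
import random

# Example content of a file under config/grid_parameters/, in the documented format.
GRID_FILE_TEXT = """
[COMMON]
var_1 = val_1;val_2;val_3
var_2 = val_1;val_2;val_3
"""


def load_parameter_pool(text):
    """Parse the [COMMON] section into {parameter name: list of candidate values}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: [value.strip() for value in raw.split(";")]
        for name, raw in parser["COMMON"].items()
    }


def sample_configurations(pool, search_times, seed=0):
    """Draw `search_times` configurations from the pool.

    Assumption: one value per parameter is chosen uniformly at random for
    each search; the actual implementation may differ.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in pool.items()}
        for _ in range(search_times)
    ]


def default_output_file(dataset_name, network_type):
    """Build the documented default path for the grid-search results."""
    return f"eval/grid_search_{dataset_name}_{network_type}.csv"


pool = load_parameter_pool(GRID_FILE_TEXT)
for params in sample_configurations(pool, search_times=3):
    print(params)
print(default_output_file("cmumosi", "lmf"))  # "lmf" is a placeholder value
```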
Gkoumas, D., Li, Q., Lioma, C., Yu, Y., & Song, D. (2021). What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis. Information Fusion, 66, 184-197.