graph-neural-networks multimodal-deep-learning pytorch recommender-system reproducibility multimedia-recommendation multimedia-systems multimodal-retrieval

Formalizing Multimedia Recommendation through Multimodal Deep Learning

Official repository for the paper Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted in ACM Transactions on Recommender Systems.

Authors

Daniele Malitesta* (daniele.malitesta@centralesupelec.fr)
Giandomenico Cornacchia (giandomenico.cornacchia@poliba.it)
Claudio Pomo (claudio.pomo@poliba.it)
Felice Antonio Merra** (felmerra@amazon.de)
Tommaso Di Noia (tommaso.dinoia@poliba.it)
Eugenio Di Sciascio (eugenio.disciascio@poliba.it)

* Work done while at Politecnico di Bari as a PhD student.

** Work done while at Politecnico di Bari before joining Amazon.

If you wish to cite our paper, please use the following BibTeX entry:

@article{DBLP:journals/corr/abs-2309-05273,
  author       = {Daniele Malitesta and
                  Giandomenico Cornacchia and
                  Claudio Pomo and
                  Felice Antonio Merra and
                  Tommaso {Di Noia} and
                  Eugenio {Di Sciascio}},
  title        = {Formalizing Multimedia Recommendation through Multimodal Deep Learning},
  journal      = {CoRR},
  volume       = {abs/2309.05273},
  year         = {2023}
}

Review

Paper	Year	Title
Ferracani et al.	2015	A System for Video Recommendation using Visual Saliency, Crowdsourced and Automatic Annotations
Jia et al.		Multi-modal learning for video recommendation based on mobile application usage
Li et al.		Video recommendation based on multi-modal information and multiple kernel
Nie et al.	2016	Quality models for venue recommendation in location-based social network
Chen et al.		Context-aware Image Tweet Modelling and Recommendation
Han et al.	2017	Learning Fashion Compatibility with Bidirectional LSTMs
Oramas et al.		A Deep Multimodal Approach for Cold-start Music Recommendation
Zhang et al.		Hashtag Recommendation for Multimodal Microblog Using Co-Attention Network
Ying et al.	2018	Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Wang et al.		LRMM: Learning to Recommend with Missing Modalities
Liu et al.	2019	User Diverse Preference Modeling by Multimodal Attentive Metric Learning
Chen et al.		Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation
Wei et al.		MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video
Cheng et al.		MMALFM: Explainable Recommendation by Leveraging Reviews and Images
Dong et al.		Personalized Capsule Wardrobe Creation with Garment and User Modeling
Chen et al.		POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion
Yu et al.	2020	Vision-Language Recommendation via Attribute Augmented Multimodal Reinforcement Learning
Cui et al.		MV-RNN: A Multi-View Recurrent Neural Network for Sequential Recommendation
Wei et al.		Graph-Refined Convolutional Network for Multimedia Recommendation with Implicit Feedback
Sun et al.		Multi-modal Knowledge Graphs for Recommender Systems
Chen et al.		Neural Tensor Model for Learning Multi-Aspect Factors in Recommender Systems
Min et al.		Food Recommendation: Framework, Existing Solutions, and Challenges
Shen et al.		Enhancing Music Recommendation with Social Media Content: an Attentive Multimodal Autoencoder Approach
Yang et al.		Learning to Match on Graph for Fashion Compatibility Modeling
Tao et al.		MGAT: Multimodal Graph Attention Network for Recommendation
Yang et al.		AMNN: Attention-Based Multimodal Neural Network Model for Hashtag Recommendation
Sang et al.	2021	Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks
Liu et al.		Pre-training Graph Transformer with Multimodal Side Information for Recommendation
Zhang et al.		Mining Latent Structures for Multimedia Recommendation
Vaswani et al.		Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation
Lei et al.		Is the suggested food your desired?: Multi-modal recipe recommendation with demand-based knowledge graph
Wang et al.		Market2Dish: Health-aware Food Recommendation
Zhan et al.	2022	A3-FKG: Attentive Attribute-Aware Fashion Knowledge Graph for Outfit Preference Prediction
Wu et al.		MM-Rec: Visiolinguistic Model Empowered Multimodal News Recommendation
Yi et al.		Multi-Modal Variational Graph Auto-Encoder for Recommendation Systems
Yi et al.		Multi-modal Graph Contrastive Learning for Micro-video Recommendation
Liu et al.		Multi-Modal Contrastive Pre-training for Recommendation
Mu et al.		Learning Hybrid Behavior Patterns for Multimedia Recommendation
Chen et al.		Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation
Yi et al.		A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation
Wang et al.	2023	DualGNN: Dual Graph Neural Network for Multimedia Recommendation
Wei et al.		Multi-Modal Self-Supervised Learning for Recommendation
Zhou et al.		Bootstrap Latent Representations for Multi-modal Recommendation

Benchmarking

First, install the required dependencies with:

pip install -r requirements.txt
pip install -r requirements_torch_geometric.txt
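After installing, you can quickly confirm that the core packages resolve with the sketch below. It assumes that the two requirement files provide the `torch` and `torch_geometric` modules. This is inferred from the file names and the project topics, not from the files' contents, so adjust the hypothetical `REQUIRED_MODULES` list to match the actual requirements.

```python
import importlib.util

# Assumed module names: "torch" (PyTorch) and "torch_geometric" (PyTorch Geometric),
# inferred from the requirement file names; edit to match the real requirements.
REQUIRED_MODULES = ["torch", "torch_geometric"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```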

To train all models from scratch, run:

python -u start_experiments.py --config <dataset_name>

where dataset_name is one of the datasets in our benchmarks.

If you only want to generate the results, run:

python -u start_experiments.py --config <dataset_name>_results

where dataset_name is one of the datasets in our benchmarks.
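To run several benchmarks in one go, the two commands above can be wrapped in a small Python helper. This is a sketch, not part of the repository: `build_command` and `run_all` are hypothetical names, the helper is assumed to be launched from the repository root (where `start_experiments.py` lives), and it uses `sys.executable` so that the same interpreter as the caller runs the script.

```python
import subprocess
import sys


def build_command(dataset_name, results_only=False):
    """Build the start_experiments.py invocation for one dataset.

    With results_only=True the config name gets the "_results" suffix,
    which only generates the results instead of training all models.
    """
    config = f"{dataset_name}_results" if results_only else dataset_name
    return [sys.executable, "-u", "start_experiments.py", "--config", config]


def run_all(dataset_names, results_only=False):
    """Run the datasets one after another; check=True stops at the first failure."""
    for name in dataset_names:
        subprocess.run(build_command(name, results_only), check=True)
```

For example, `run_all(["office", "toys"], results_only=True)` would generate the results for two datasets, provided those are the actual config names (they are placeholders here).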

Note that the results may differ slightly from those reported here and in the paper, depending on the machine on which you run the experiments.
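Part of the run-to-run variance on a single machine comes from unseeded random number generators. The sketch below seeds Python, NumPy, and PyTorch when they are available. It is an illustration under those assumptions, not the mechanism `start_experiments.py` uses, and it cannot remove differences caused by different hardware, drivers, or library versions.

```python
import random


def set_seed(seed=42):
    """Seed Python's, NumPy's, and PyTorch's RNGs (the latter two only if installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: Python (and NumPy) are still seeded
    torch.manual_seed(seed)  # seeds the generators on all devices, CUDA included
    # Trade cuDNN speed for determinism on the same hardware.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```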

Office (best results)

Models	Recall@10	nDCG@10	EFD@10	Gini@10	APLT@10	iCov@10	Recall@20	nDCG@20	EFD@20	Gini@20	APLT@20	iCov@20
VBPR	0.0652	0.0419	0.1753	0.3634	0.2321	93.83%	0.1025	0.0533	0.1479	0.3960	0.2375	97.51%
MMGCN	0.0455	0.0300	0.1140	0.0128	0.0016	3.07%	0.0798	0.0405	0.1027	0.0231	0.0078	4.64%
GRCN	0.0393	0.0253	0.1215	0.4587	0.3438	99.01%	0.0667	0.0339	0.1051	0.4892	0.3469	99.79%
LATTICE	0.0664	0.0449	0.1827	0.2128	0.1752	87.86%	0.1029	0.0566	0.1513	0.2652	0.2039	95.90%
BM3	0.0701	0.0460	0.1837	0.1407	0.1427	77.13%	0.1081	0.0583	0.1550	0.1900	0.1715	91.55%
FREEDOM	0.0560	0.0365	0.1493	0.1922	0.1875	79.12%	0.0884	0.0469	0.1282	0.2439	0.2080	90.64%
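Recall@K and nDCG@K in the tables of this section are top-K accuracy metrics, computed per user and then averaged over users. The sketch below uses the common binary-relevance definitions with a log2(rank + 1) discount. The implementation behind the reported numbers may differ in details (for instance, in how users without test items are handled).

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of a user's relevant (test) items that appear in the top-k list."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance nDCG@k with a log2(rank + 1) discount (1-based rank)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # enumerate is 0-based, so position 1 -> log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # Ideal ranking: all relevant items (up to k) placed at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

For a ranking `["a", "b", "c", "d"]` and relevant items `{"b", "e"}`, Recall@2 is 0.5 and nDCG@2 is about 0.387.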

Toys (best results)

Models	Recall@10	nDCG@10	EFD@10	Gini@10	APLT@10	iCov@10	Recall@20	nDCG@20	EFD@20	Gini@20	APLT@20	iCov@20
VBPR	0.0710	0.0458	0.1948	0.2645	0.1064	84.90%	0.1006	0.0545	0.1527	0.3011	0.1180	92.82%
MMGCN	0.0256	0.0150	0.0648	0.0989	0.0961	37.87%	0.0426	0.0200	0.0570	0.1450	0.1058	52.51%
GRCN	0.0554	0.0354	0.1604	0.3954	0.2368	92.66%	0.0831	0.0436	0.1298	0.4329	0.2482	97.73%
LATTICE	0.0805	0.0512	0.2090	0.1656	0.0546	73.80%	0.1165	0.0617	0.1665	0.2026	0.0684	86.58%
BM3	0.0613	0.0393	0.1582	0.0776	0.0486	56.23%	0.0901	0.0478	0.1270	0.1154	0.0658	73.50%
FREEDOM	0.0870	0.0548	0.2284	0.1474	0.0756	62.09%	0.1249	0.0660	0.1820	0.2007	0.0951	78.42%
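iCov@K (item coverage, the percentage of catalog items recommended to at least one user) and APLT@K (average percentage of long-tail items in each user's list) describe how far the recommendations reach into the catalog. A minimal sketch follows. The short-head/long-tail split, here the 20% most popular training items, is a hypothetical choice, and the benchmark may define the long tail differently. `train_counts` is assumed to contain every catalog item, including those with few interactions.

```python
def item_coverage_at_k(top_k_lists, num_items, k):
    """Percentage of catalog items that appear in at least one user's top-k list."""
    recommended = set()
    for items in top_k_lists.values():
        recommended.update(items[:k])
    return 100.0 * len(recommended) / num_items


def long_tail_items(train_counts, short_head_share=0.2):
    """Hypothetical split: the most popular `short_head_share` of items form the
    short head; every other item counts as long tail."""
    ranked = sorted(train_counts, key=train_counts.get, reverse=True)
    head_size = int(len(ranked) * short_head_share)
    return set(ranked[head_size:])


def aplt_at_k(top_k_lists, tail_items, k):
    """Average, over users, of the fraction of long-tail items in the top-k list."""
    fractions = []
    for items in top_k_lists.values():
        top = items[:k]
        if top:
            fractions.append(sum(1 for item in top if item in tail_items) / len(top))
    return sum(fractions) / len(fractions) if fractions else 0.0
```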

Beauty (best results)

Models	Recall@10	nDCG@10	EFD@10	Gini@10	APLT@10	iCov@10	Recall@20	nDCG@20	EFD@20	Gini@20	APLT@20	iCov@20
VBPR	0.0760	0.0483	0.2119	0.2076	0.0833	83.06%	0.1102	0.0586	0.1700	0.2376	0.0915	91.41%
MMGCN	0.0496	0.0294	0.1300	0.0252	0.0282	13.75%	0.0772	0.0379	0.1105	0.0423	0.0345	21.37%
GRCN	0.0575	0.0370	0.1817	0.3823	0.2497	94.59%	0.0892	0.0466	0.1498	0.4178	0.2608	98.56%
LATTICE	0.0867	0.0544	0.2272	0.1153	0.0386	65.82%	0.1259	0.0661	0.1830	0.1558	0.0511	81.60%
BM3	0.0713	0.0443	0.1831	0.0245	0.0179	32.31%	0.1051	0.0545	0.1490	0.0414	0.0228	48.75%
FREEDOM	0.0864	0.0539	0.2279	0.0921	0.0486	55.89%	0.1286	0.0666	0.1868	0.1359	0.0653	72.96%
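Gini@K summarizes how evenly recommendations are spread over the catalog. In these tables, higher Gini@K values go together with higher iCov@K (e.g., GRCN versus MMGCN). This suggests that the column is reported so that higher means a more even spread, i.e., as 1 minus the Gini coefficient. That orientation is inferred from the numbers, not stated in the source. The sketch below follows it, using the (n - 1) normalization so that concentrating all recommendations on a single item yields 0.

```python
from collections import Counter


def gini_diversity_at_k(top_k_lists, all_items, k):
    """1 - Gini coefficient of how often each catalog item is recommended.

    Items never recommended count as zero exposure, so a perfectly even
    spread gives 1.0 and recommending a single item to everyone gives 0.0.
    """
    counts = Counter()
    for items in top_k_lists.values():
        counts.update(items[:k])
    exposure = sorted(counts.get(item, 0) for item in all_items)  # ascending order
    n, total = len(exposure), sum(exposure)
    if n < 2 or total == 0:
        return 0.0  # degenerate catalog or no recommendations at all
    gini = sum((2 * j - n - 1) * x for j, x in enumerate(exposure, start=1))
    gini /= (n - 1) * total
    return 1.0 - gini
```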

Sports (best results)

Models	Recall@10	nDCG@10	EFD@10	Gini@10	APLT@10	iCov@10	Recall@20	nDCG@20	EFD@20	Gini@20	APLT@20	iCov@20
VBPR	0.0450	0.0281	0.1167	0.1501	0.0497	75.77%	0.0677	0.0349	0.0949	0.1722	0.0552	86.54%
MMGCN	0.0342	0.0207	0.0791	0.0095	0.0046	5.10%	0.0551	0.0269	0.0678	0.0168	0.0065	8.39%
GRCN	0.0330	0.0202	0.0885	0.3087	0.2190	91.28%	0.0523	0.0259	0.0746	0.3386	0.2273	97.09%
LATTICE	0.0610	0.0372	0.1465	0.0573	0.0129	48.44%	0.0898	0.0456	0.1185	0.0802	0.0185	64.90%
BM3	0.0548	0.0349	0.1372	0.0776	0.0283	59.13%	0.0825	0.0430	0.1118	0.1120	0.0385	76.75%
FREEDOM	0.0603	0.0375	0.1494	0.0621	0.0319	48.37%	0.0911	0.0465	0.1219	0.0926	0.0442	65.81%
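EFD@K (expected free discovery, introduced by Vargas and Castells at RecSys 2011) measures novelty: it rewards relevant recommendations of items that few users have interacted with, discounted by rank. The sketch below makes several assumptions: an item's novelty is -log2 of the fraction of training users who interacted with it, relevance is binary, the rank discount is logarithmic, and the score is normalized by the total discount of the list. Items unseen in training receive the maximum observed novelty. With different choices, the absolute values will not necessarily match those in the tables.

```python
import math
from collections import Counter


def item_novelty(train_interactions, num_users):
    """-log2 of the fraction of users who interacted with each item in training.

    train_interactions: iterable of (user, item) pairs; duplicates are ignored.
    """
    users_per_item = Counter(item for _, item in set(train_interactions))
    return {item: -math.log2(c / num_users) for item, c in users_per_item.items()}


def efd_at_k(ranked_items, relevant_items, novelty, k):
    """Expected free discovery for one user: rank-discounted novelty of the
    relevant items in the top-k list, normalized by the total discount."""
    max_novelty = max(novelty.values(), default=0.0)  # assumed value for unseen items
    gain, norm = 0.0, 0.0
    for rank, item in enumerate(ranked_items[:k]):
        discount = 1.0 / math.log2(rank + 2)
        if item in relevant_items:
            gain += discount * novelty.get(item, max_novelty)
        norm += discount
    return gain / norm if norm > 0 else 0.0
```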

Clothing (best results)

Models	Recall@10	nDCG@10	EFD@10	Gini@10	APLT@10	iCov@10	Recall@20	nDCG@20	EFD@20	Gini@20	APLT@20	iCov@20
VBPR	0.0339	0.0181	0.0502	0.2437	0.0809	83.40%	0.0529	0.0229	0.0413	0.2791	0.0915	92.33%
MMGCN	0.0227	0.0119	0.0292	0.0136	0.0044	7.58%	0.0348	0.0150	0.0240	0.0236	0.0066	12.44%
GRCN	0.0319	0.0164	0.0481	0.3990	0.2358	93.37%	0.0496	0.0209	0.0397	0.4368	0.2459	97.77%
LATTICE	0.0502	0.0275	0.0738	0.1022	0.0134	58.49%	0.0744	0.0336	0.0589	0.1384	0.0207	76.20%
BM3	0.0418	0.0226	0.0596	0.1348	0.0319	72.88%	0.0633	0.0281	0.0486	0.1825	0.0449	88.65%
FREEDOM	0.0547	0.0294	0.0805	0.1509	0.0600	65.54%	0.0822	0.0363	0.0652	0.2078	0.0843	81.91%
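The tables can also be compared programmatically. The sketch below parses the Clothing table (copied from above, with spaces in place of tabs) and picks the best model for each column. It assumes that higher is better for every metric, which matches how the beyond-accuracy columns appear to be oriented. `parse_table` and `best_per_metric` are illustrative names.

```python
# Clothing results copied from the table above (whitespace-separated).
CLOTHING = """
Models Recall@10 nDCG@10 EFD@10 Gini@10 APLT@10 iCov@10 Recall@20 nDCG@20 EFD@20 Gini@20 APLT@20 iCov@20
VBPR 0.0339 0.0181 0.0502 0.2437 0.0809 83.40% 0.0529 0.0229 0.0413 0.2791 0.0915 92.33%
MMGCN 0.0227 0.0119 0.0292 0.0136 0.0044 7.58% 0.0348 0.0150 0.0240 0.0236 0.0066 12.44%
GRCN 0.0319 0.0164 0.0481 0.3990 0.2358 93.37% 0.0496 0.0209 0.0397 0.4368 0.2459 97.77%
LATTICE 0.0502 0.0275 0.0738 0.1022 0.0134 58.49% 0.0744 0.0336 0.0589 0.1384 0.0207 76.20%
BM3 0.0418 0.0226 0.0596 0.1348 0.0319 72.88% 0.0633 0.0281 0.0486 0.1825 0.0449 88.65%
FREEDOM 0.0547 0.0294 0.0805 0.1509 0.0600 65.54% 0.0822 0.0363 0.0652 0.2078 0.0843 81.91%
"""


def parse_table(text):
    """Parse a whitespace-separated results table into {model: {metric: value}}."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0][1:], lines[1:]  # drop the "Models" column name
    table = {}
    for model, *cells in rows:
        # iCov@K is given as a percentage string such as "83.40%".
        table[model] = {m: float(c.rstrip("%")) for m, c in zip(header, cells)}
    return table


def best_per_metric(table):
    """Return {metric: (model, value)}, assuming higher is better for every column."""
    metrics = next(iter(table.values())).keys()
    return {
        m: max(((model, scores[m]) for model, scores in table.items()), key=lambda p: p[1])
        for m in metrics
    }
```

For example, `best_per_metric(parse_table(CLOTHING))["Recall@20"]` is `("FREEDOM", 0.0822)`, while `["iCov@20"]` is `("GRCN", 97.77)`.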
