HKUDS / PromptMM

[WWW'2024] "PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning"

Home Page: https://arxiv.org/abs/2402.17188


PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning

PyTorch implementation for the WWW 2024 paper "PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning".

Wei Wei, Jiabin Tang, Yangqin Jiang, Lianghao Xia and Chao Huang*. (*Correspondence)

Dependencies

The implementation is built with PyTorch; the data-preprocessing snippets below additionally use NumPy and SciPy.

Usage

Start training and inference with:

python ./main.py --dataset {DATASET}

Supported datasets: Amazon-Electronics, Netflix, Tiktok
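
For example, to run on the Tiktok data (a sketch, assuming the lowercase folder name under data/ is the accepted dataset identifier; check main.py for the exact values):

python ./main.py --dataset tiktok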

Datasets

├─ MMSSL/
│  ├── data/
│  │  ├── tiktok/
│  │  ├── ...
Dataset     | Netflix   | Tiktok          | Electronics
------------|-----------|-----------------|------------
Modality    | V / T     | V / A / T       | V / T
Feat. Dim.  | 512 / 768 | 128 / 128 / 768 | 4096 / 1024
User        | 43,739    | 14,343          | 41,691
Item        | 17,239    | 8,690           | 21,479
Interaction | 609,341   | 276,637         | 359,165
Sparsity    | 99.919%   | 99.778%         | 99.960%
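
The Sparsity row follows directly from the counts above (sparsity = 1 - interactions / (users x items)); a minimal sketch of that arithmetic:

# Recompute the Sparsity row of the table from (users, items, interactions)
stats = {
    "Netflix":     (43739, 17239, 609341),
    "Tiktok":      (14343,  8690, 276637),
    "Electronics": (41691, 21479, 359165),
}
for name, (n_user, n_item, n_inter) in stats.items():
    print(f"{name}: {1 - n_inter / (n_user * n_item):.3%}")   # 99.919%, 99.778%, 99.960%
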
  • 2024.2.27 new multi-modal datasets uploaded: 📢📢 🌹🌹 We provide the new multi-modal datasets Netflix and MovieLens (i.e., CF training data and multi-modal data, including item text and posters) from our new multi-modal work LLMRec on Google Drive. 🌹 We hope this contributes to the community and facilitates your research~

  • 2023.2.27 update (all datasets uploaded): We provide the processed data on Google Drive.

🚀🚀 The provided datasets are compatible with multi-modal recommender models such as MMSSL, LATTICE, and MICRO, and require no additional preprocessing. They include (1) basic user-item interactions and (2) multi-modal features.

# part of data preprocessing
# ----json2mat--------------------------------------------------------------------------------------------------
import json
import pickle

import numpy as np
from scipy.sparse import csr_matrix

n_user, n_item = 39387, 23033

# Load the user -> item-list interactions from JSON: {"user_id": [item_id, ...], ...}
with open('/home/weiw/Code/MM/MMSSL/data/clothing/train.json', 'r') as f:
    train = json.load(f)

# Flatten the interaction dict into (row, col) index lists
row, col = [], []
for user, items in train.items():
    for item in items:
        row.append(int(user))
        col.append(int(item))

# Build a sparse user-item interaction matrix and save it with pickle
data = np.ones(len(row))
train_mat = csr_matrix((data, (row, col)), shape=(n_user, n_item))
pickle.dump(train_mat, open('./train_mat', 'wb'))
# ----json2mat--------------------------------------------------------------------------------------------------


# ----mat2json--------------------------------------------------------------------------------------------------
import json
import pickle

# Load the sparse interaction matrix to convert back to JSON.
# train_mat / val_mat can be converted the same way (or summed into one total matrix first).
test_mat = pickle.load(open('./test_mat', 'rb'))
total_mat = test_mat

# Turn each user's row into the list of item indices the user interacted with
total_array = total_mat.toarray()
total_dict = {}
for i in range(total_array.shape[0]):
    total_dict[str(i)] = [index for index, value in enumerate(total_array[i]) if value != 0]

# To build a leave-one-out split instead, keep each user's last item as test and the rest as train.

# Write the user -> item-list mapping as JSON
with open('./test.json', 'w') as test_file:
    test_file.write(json.dumps(total_dict))
# ----mat2json--------------------------------------------------------------------------------------------------
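
A minimal sanity check of the conversion above (a sketch, assuming ./train_mat and ./test.json were written to the current directory by the snippets):

# ----sanity check (sketch)-------------------------------------------------------------------------------------
import json
import pickle

train_mat = pickle.load(open('./train_mat', 'rb'))
print(train_mat.shape, train_mat.nnz)            # (n_user, n_item) and the number of interactions

with open('./test.json', 'r') as f:
    test = json.load(f)
# Each value is the list of item indices for that user id
print(len(test), sum(len(v) for v in test.values()))
# --------------------------------------------------------------------------------------------------------------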

Acknowledgement

The structure of this code is largely based on LATTICE and MICRO. Thanks to the authors for their work.
