Probing the Performance Limits of Text Recommender Models Using LLMs: Discoveries and Perspectives

This repository is provided for anonymous review and currently contains only the additional results. Upon acceptance, we will make all the data available for download and release the full code of the paper, strictly for research purposes.

More results on NDCG@10
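
For readers less familiar with the metric, here is a minimal sketch of how NDCG@10 is commonly computed with binary relevance. It is not taken from the paper's code: the function name `ndcg_at_k` and its arguments are hypothetical, and it assumes the ranked list contains no duplicate items.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG@k for one user (illustrative helper, not the authors' code).

    ranked_items: item ids ordered by predicted score, best first (assumed duplicate-free).
    relevant_items: held-out ground-truth item ids for this user.
    """
    relevant = set(relevant_items)
    # DCG: each hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

With a single held-out item per user, the ideal DCG is 1, so the score reduces to 1 / log2(position + 1) when the item appears in the top 10 (position counted from 1) and 0 otherwise. Reported values would then be the average over users, expressed as a percentage.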

Table: Accuracy (NDCG@10) comparison of IDCF and TCF using DSSM and SASRec. FR denotes a frozen LM, while FT denotes a fine-tuned LM.

Table: Warm item recommendation (NDCG@10). 20 means that items with fewer than 20 interactions are removed. TCF_175B uses the pre-extracted features from the 175B LM. Only the SASRec backbone is reported.
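
The warm-item threshold described in this caption can be illustrated with a small sketch. This is an assumption about how such a filter might look, not the authors' preprocessing code: the function name `filter_warm_items` and the `(user, item)` pair format are hypothetical.

```python
from collections import Counter


def filter_warm_items(interactions, min_interactions=20):
    """Keep only interactions whose item has at least `min_interactions` records.

    interactions: list of (user_id, item_id) pairs (hypothetical input format).
    """
    # Count how many times each item appears in the interaction log.
    counts = Counter(item for _, item in interactions)
    # Drop every interaction that involves an item below the threshold.
    return [(user, item) for user, item in interactions if counts[item] >= min_interactions]
```

With the default threshold, an item with exactly 20 interactions is kept, while an item with 19 is removed along with all of its interactions.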

Table: TCF's results (NDCG@10) with well-known text encoders from the last 10 years. Text encoders are frozen and the SASRec backbone is used. Notable advances in NLP benefit RS.

Figure: TCF's performance (y-axis: NDCG@10 (%)) with 9 text encoders of increasing size (x-axis). SASRec (upper three subfigures) and DSSM (bottom three subfigures).

Figure: TCF with a retrained LM vs. a frozen LM (y-axis: NDCG@10 (%)), where only the top two layers are retrained. The 175B LM is not retrained due to its ultra-high computational cost.