RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library

Home Page: https://recbole.io/

[🐛BUG] Validation with mode "unixxx" is extremely slow compared to "full".

lukas-wegmeth opened this issue

Describe the bug
I was benchmarking many RecBole algorithms on a few data sets with the "uni100" and "uni50" validation modes and noticed that validation took unexpectedly long.
I therefore tested multiple combinations of settings to determine whether this happens only rarely, but I can reproduce it consistently.
I provide the results of my tests below. I suspect this is a bug because, in my understanding, the "unixxx" validation modes should be faster than "full"; I would also expect "full" to take much longer than it does.
Please check whether the data in the tables looks as you would expect, and if so, help me understand why "unixxx" takes so long compared to "full".

To Reproduce
Steps to reproduce the behavior:

import argparse
import json
import time
from logging import getLogger

from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.utils import ModelType, get_model, get_trainer, init_seed, init_logger

import torch

if __name__ == "__main__":
    parser = argparse.ArgumentParser("Fit RecBole")
    parser.add_argument('--data_set_name', dest='data_set_name', type=str, required=True)
    parser.add_argument('--algorithm_name', dest='algorithm_name', type=str, required=True)
    parser.add_argument('--algorithm_config', dest='algorithm_config', type=int, required=True)
    parser.add_argument('--fold', dest='fold', type=int, required=True)

    args = parser.parse_args()

    print(f"CUDA available: {torch.cuda.is_available()}")
    print(f"CUDA version: {torch.version.cuda}")
    print(f"CUDNN version: {torch.backends.cudnn.version()}")
    print(f"PyTorch version: {torch.__version__}")

    config_dict = {
        "seed": 42,  # default: "2020"
        "data_path": "./data_sets/",  # default: "dataset/"
        "checkpoint_dir": f"./data_sets/{args.data_set_name}/checkpoint_{args.algorithm_name}/"
                          f"config_{args.algorithm_config}/fold_{args.fold}/",  # default: "saved/"
        "benchmark_filename": [f"train_split_fold_{args.fold}", f"valid_split_fold_{args.fold}",
                               f"test_split_fold_{args.fold}"],
        # default: None
        "field_separator": ",",  # default: "\t"
        "epochs": 50,  # default: 300
        "eval_step": 3,  # default: 1
        "stopping_step": 3,  # default: 10
        "eval_args":
            {
                "group_by": "user",  # default: "user"
                "order": "RO",  # default: "RO"
                "split":
                    {
                        # "RS": [8, 1, 1] # default: {"RS": [8, 1, 1]}
                        "LS": "valid_and_test"
                    },
                "mode":
                    {
                        "valid": "uni50",  # default: "full"
                        "test": "full",  # default: "full"
                    },
            },
        "metrics": ["NDCG"],
        # default: ["Recall", "MRR", "NDCG", "Hit", "Precision"]
        "topk": [10],  # default: 10
        "valid_metric": "NDCG@10",  # default: "MRR@10"
        "eval_batch_size": 32768,  # default: 4096
        # misc settings
        "model": args.algorithm_name,
        "MODEL_TYPE": ModelType.GENERAL,  # default: ModelType.GENERAL
        "dataset": args.data_set_name,  # default: None
    }
    print(f"Running algorithm {args.algorithm_name} configuration: {configurations[args.algorithm_config]}")

    config = Config(config_dict=config_dict)
    init_seed(config['seed'], config['reproducibility'])
    init_logger(config)
    logger = getLogger()
    logger.info(config)

    config["data_path"] = f"./data_sets/{args.data_set_name}/atomic/"
    dataset = create_dataset(config)
    logger.info(dataset)
    train_data, valid_data, test_data = data_preparation(config, dataset)

    model = get_model(config["model"])(config, train_data.dataset).to(config['device'])
    logger.info(model)
    trainer = get_trainer(config["MODEL_TYPE"], config["model"])(config, model)
    start_fit = time.time()
    best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)
    end_fit = time.time()
    model_file = trainer.saved_model_file
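
The script as posted ends right after fitting; a continuation along these lines (my sketch here, not included in the timings below) would evaluate the saved checkpoint on the test split:

    # Continuation sketch: score the best checkpoint on the held-out test split
    # and report how long fitting took.
    test_result = trainer.evaluate(test_data, load_best_model=True, model_file=model_file)
    logger.info(f"fit took {end_fit - start_fit:.2f} seconds")
    logger.info(f"test result: {test_result}")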

Expected behavior
Validation modes "unixxx" should be faster than "full".
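
As a rough sanity check: with "uni50", each ground-truth item is scored against 50 sampled negatives, i.e. 51 scores per evaluated interaction, while "full" ranks it against the whole catalog (1,682 items in MovieLens-100K and about 3,700 in MovieLens-1M). So, naively, the sampled modes should do one to two orders of magnitude less scoring work per user.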

Screenshots

| Data Set | Model | Eval Batch Size | Validation Mode | Epoch Training Time (s) | Validation Time (s) | Validation Score (nDCG@10) |
| --- | --- | --- | --- | --- | --- | --- |
| MovieLens-100K | DGCF | 32768 | uni100 | 1.7 | 10.37 | 0.3308 |
| MovieLens-100K | DGCF | 4096 | uni100 | 1.64 | 22.29 | 0.3308 |
| MovieLens-100K | DGCF | 32768 | uni50 | 1.68 | 4.4 | 0.398 |
| MovieLens-100K | DGCF | 4096 | uni50 | 1.63 | 19.37 | 0.398 |
| MovieLens-100K | DGCF | 32768 | full | 1.68 | 0.07 | 0.2455 |
| MovieLens-100K | DGCF | 4096 | full | 1.63 | 0.2 | 0.2455 |
| MovieLens-100K | SpectralCF | 32768 | uni100 | 0.27 | 1.83 | 0.2228 |
| MovieLens-100K | SpectralCF | 4096 | uni100 | 0.35 | 3.95 | 0.2228 |
| MovieLens-100K | SpectralCF | 32768 | uni50 | 0.25 | 0.97 | 0.2741 |
| MovieLens-100K | SpectralCF | 4096 | uni50 | 0.28 | 2.97 | 0.2741 |
| MovieLens-100K | SpectralCF | 32768 | full | 0.27 | 0.05 | 0.1743 |
| MovieLens-100K | SpectralCF | 4096 | full | 0.25 | 0.2 | 0.1743 |
| MovieLens-1M | DGCF | 32768 | uni100 | 82.09 | 776.27 | 0.3654 |
| MovieLens-1M | DGCF | 4096 | uni100 | 82.1 | 1103.69 | 0.3654 |
| MovieLens-1M | DGCF | 32768 | uni50 | 81.55 | 773.25 | 0.4565 |
| MovieLens-1M | DGCF | 4096 | uni50 | 82.01 | 867.66 | 0.4565 |
| MovieLens-1M | DGCF | 32768 | full | 81.77 | 0.61 | 0.238 |
| MovieLens-1M | DGCF | 4096 | full | 81.8 | 3.42 | 0.238 |
| MovieLens-1M | SpectralCF | 32768 | uni100 | 8.21 | 80.14 | 0.3057 |
| MovieLens-1M | SpectralCF | 4096 | uni100 | 8.1 | 108.65 | 0.3057 |
| MovieLens-1M | SpectralCF | 32768 | uni50 | 8.06 | 76.84 | 0.3897 |
| MovieLens-1M | SpectralCF | 4096 | uni50 | 8.27 | 87.5 | 0.3899 |
| MovieLens-1M | SpectralCF | 32768 | full | 8.21 | 0.5 | 0.1959 |
| MovieLens-1M | SpectralCF | 4096 | full | 8.1 | 3.3 | 0.1958 |

Desktop (please complete the following information):

  • OS: Linux
  • RecBole Version: 1.2.0
  • Python Version: 3.10
  • PyTorch Version: 2.1.1
  • cudatoolkit Version: 12.1

@lukas-wegmeth Hi! The longer valid/test time for sampled evaluation is normal. For full evaluation, we restore all the user and item embeddings to avoid repeated computations, but we do not do this for negative sampling evaluation. You can check the source code for details. Note that predict is used for negative sampling evaluation and full_sort_predict is used for full evaluation.
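
Roughly, the contrast between the two interfaces looks like this toy sketch (made-up shapes, not the actual RecBole code). The point is that for GNN models like DGCF or SpectralCF the expensive step is producing the embedding tables via graph propagation, which the full path does once while the per-pair path may repeat for every batch:

import torch

# Toy embedding tables; in a real GNN model, producing these involves
# graph propagation, which is the expensive part.
n_users, n_items, dim = 6040, 3706, 64
user_emb = torch.randn(n_users, dim)
item_emb = torch.randn(n_items, dim)

def full_sort_scores(user_ids):
    # full_sort_predict-style path: with the tables cached, one matmul
    # scores every item for every user in the batch.
    return user_emb[user_ids] @ item_emb.T  # shape [batch_size, n_items]

def pairwise_scores(user_ids, item_ids):
    # predict-style path: scores only the given (user, item) pairs; if the
    # model reruns its forward pass per batch of pairs, sampled evaluation
    # can end up slower despite scoring far fewer pairs.
    return (user_emb[user_ids] * item_emb[item_ids]).sum(dim=-1)  # shape [batch_size]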

@BishopLiu Thanks for replying. I have looked at the code and profiled the run time of the functions. I can see where negative sampling evaluation spends the extra time, but it is still unintuitive to me why it does: since fewer interactions must be predicted, negative sampling evaluation should be faster. Also, if restoring the embeddings is so much quicker, why is it not done for negative sampling evaluation as well? Please let me know if I have misunderstood anything about this.
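
For reference, the profiling can be done by wrapping a single evaluation call, roughly like this (a sketch of the approach, not my exact commands; trainer and valid_data are the objects from the reproduction script above):

import cProfile
import pstats

# Profile one validation pass to see which functions dominate the time.
profiler = cProfile.Profile()
profiler.enable()
trainer.evaluate(valid_data, load_best_model=False)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)  # top 20 by cumulative time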

@lukas-wegmeth Thank you for your attention to RecBole! The models in RecBole are implemented by different developers. Our first goal is to make sure each model is consistent with its original paper and runs correctly, and different developers have their own considerations. I'm sorry that I cannot answer why restoring the embeddings is not done in negative sampling evaluation.

@BishopLiu I understand. Thanks for your response. Although my problem with the high validation time persists, I can at least verify in the code how it happens now.