ThyrixYang / es_dfm

Implementation and experimental comparison of ES-DFM (Yang et al. 2021), the Delayed Feedback Model (DFM, Chapelle 2014), Feedback Shift Importance Weighting (FSIW, Yasui et al. 2020), Fake Negative Weighted (FNW, Ktena et al. 2019) and Fake Negative Calibration (FNC, Ktena et al. 2019).


Advice for tuning the hyper-parameters of the initial model

hwlza opened this issue

commented

Hi, Jia-Qi

I tried to reproduce your experiments on the Criteo dataset, mainly comparing your proposed ES-DFM approach against the Pretrain baseline. However, I found a gap between my results and the values you reported in Table 2. Do you have any advice for tuning the hyper-parameters of the initial model?

I attach my running scripts & stats here for reference.
(1) First, I pre-trained a standard MLP model for CTR prediction by executing the following command:

python ./src/main.py --method Pretrain \
                    --mode pretrain \
                    --model_ckpt_path ./work_dir/ckpts/pretrain/baseline/baseline \
                    --data_path ./data/criteo/data.txt \
                    --data_cache ./work_dir/data_cache 

(2) Then, I trained the importance weighting classifiers as follows:

python ./src/main.py --method ES-DFM \
                    --mode pretrain \
                    --model_ckpt_path ./work_dir/ckpts/pretrain/esdfm/esdfm \
                    --data_path ./data/criteo/data.txt \
                    --data_cache ./work_dir/data_cache 

(3) Finally, I trained the ES-DFM model on the simulated streaming data as follows:

python ./src/main.py --method ES-DFM \
                    --mode stream \
                    --pretrain_baseline_model_ckpt_path ./work_dir/ckpts/pretrain/baseline/baseline  \
                    --pretrain_esdfm_model_ckpt_path ./work_dir/ckpts/pretrain/esdfm/esdfm \
                    --data_path ./data/criteo/data.txt \
                    --data_cache ./work_dir/data_cache

After step 3, the metrics I obtained were AUC = 0.838215320158801, PR-AUC = 0.637005309537314 and NLL = 0.394365247095086, which differ somewhat from the values you reported in Table 2 of the paper. Since the seed that controls the weight initialization affects the final results a bit, I reran steps 1 & 2 above with another seed and ran all 4 checkpoint combinations in step 3 (roughly as sketched below), but the metric differences still exist.
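For reference, the 4 combinations in step 3 were run roughly like the sketch below; the seed-suffixed checkpoint directory names (baseline_s1, baseline_s2, esdfm_s1, esdfm_s2) are only illustrative stand-ins for the outputs of the two pretraining runs:

# Illustrative sketch: loop over the 2 baseline x 2 ES-DFM pretrained checkpoints.
# The seed-suffixed directory names below are hypothetical; adjust to your own paths.
for b in baseline_s1 baseline_s2; do
  for e in esdfm_s1 esdfm_s2; do
    python ./src/main.py --method ES-DFM \
                        --mode stream \
                        --pretrain_baseline_model_ckpt_path ./work_dir/ckpts/pretrain/$b/$b \
                        --pretrain_esdfm_model_ckpt_path ./work_dir/ckpts/pretrain/$e/$e \
                        --data_path ./data/criteo/data.txt \
                        --data_cache ./work_dir/data_cache
  done
done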


Thank you.

commented

Hi @hwlza,

We did not fix the random seed when training the initial model, so I think this difference is mainly caused by randomness.
Nevertheless, such a difference will not affect the comparison between delayed feedback methods, since their performance shifts higher or lower together; the conclusions are not affected.
If necessary, you can use the pretrained checkpoints to reproduce the performance reported in the paper.
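For example, if you unpack the pretrained checkpoints under a directory such as ./pretrained_models/ (this path is only illustrative; point the flags at wherever you store them), step 3 becomes:

python ./src/main.py --method ES-DFM \
                    --mode stream \
                    --pretrain_baseline_model_ckpt_path ./pretrained_models/baseline/baseline \
                    --pretrain_esdfm_model_ckpt_path ./pretrained_models/esdfm/esdfm \
                    --data_path ./data/criteo/data.txt \
                    --data_cache ./work_dir/data_cache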

commented

My results under different seeds vary by less than 2e-4, but the difference caused by different initial models is about 2e-3. Any chance you could share the hyper-parameter settings of your initial model?

commented


I think you can try tuning the learning rate, since it's the only tunable hyper-parameter; a rough sweep like the one sketched below should be enough. I'm not quite sure about the exact value we used three years ago.
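A minimal sketch of such a sweep for the pretraining step is below; note that --lr is only a guess at the argument name, so please check src/main.py for the actual learning-rate flag and its default:

# Hypothetical learning-rate sweep for the baseline pretraining step.
# "--lr" is a placeholder flag name; check src/main.py for the real argument.
for lr in 1e-4 5e-4 1e-3 5e-3; do
  python ./src/main.py --method Pretrain \
                      --mode pretrain \
                      --lr $lr \
                      --model_ckpt_path ./work_dir/ckpts/pretrain/baseline_lr_$lr/baseline \
                      --data_path ./data/criteo/data.txt \
                      --data_cache ./work_dir/data_cache
done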

commented

Okay, will try it. Thx