Advice for tuning the hyper-parameters of the initial model
hwlza opened this issue
Hi, Jia-Qi
I tried to reproduce your experiments on the Criteo dataset, mainly comparing your proposed ES-DFM approach against the Pre-trained baseline. However, I found a gap between my results and the values you reported in Table 2. Do you have any advice for tuning the hyper-parameters of the initial model?
I attach my running scripts & stats here for reference.
(1) First, I pre-trained a standard MLP model for CTR prediction by executing the following command:
python ./src/main.py --method Pretrain \
--mode pretrain \
--model_ckpt_path ./work_dir/ckpts/pretrain/baseline/baseline \
--data_path ./data/criteo/data.txt \
--data_cache ./work_dir/data_cache
(2) Then, I trained the importance weighting classifiers as follows:
python ./src/main.py --method ES-DFM \
--mode pretrain \
--model_ckpt_path ./work_dir/ckpts/pretrain/esdfm/esdfm \
--data_path ./data/criteo/data.txt \
--data_cache ./work_dir/data_cache
(3) Finally, I trained the ES-DFM model on the simulated stream data:
python ./src/main.py --method ES-DFM \
--mode stream \
--pretrain_baseline_model_ckpt_path ./work_dir/ckpts/pretrain/baseline/baseline \
--pretrain_esdfm_model_ckpt_path ./work_dir/ckpts/pretrain/esdfm/esdfm \
--data_path ./data/criteo/data.txt \
--data_cache ./work_dir/data_cache
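For reference, the three stages can be chained into a single wrapper script. This is only a dry-run sketch of my setup: each command is echoed rather than executed, and the paths simply mirror the commands above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: chain the three training stages from this thread.
# Paths mirror the commands above; swap `echo` for `eval` to launch.
set -u

DATA=./data/criteo/data.txt
CACHE=./work_dir/data_cache
CKPT=./work_dir/ckpts/pretrain

# (1) pre-train the baseline model
STEP1="python ./src/main.py --method Pretrain --mode pretrain \
--model_ckpt_path $CKPT/baseline/baseline --data_path $DATA --data_cache $CACHE"

# (2) pre-train the ES-DFM importance weighting classifiers
STEP2="python ./src/main.py --method ES-DFM --mode pretrain \
--model_ckpt_path $CKPT/esdfm/esdfm --data_path $DATA --data_cache $CACHE"

# (3) stream training from both pre-trained checkpoints
STEP3="python ./src/main.py --method ES-DFM --mode stream \
--pretrain_baseline_model_ckpt_path $CKPT/baseline/baseline \
--pretrain_esdfm_model_ckpt_path $CKPT/esdfm/esdfm \
--data_path $DATA --data_cache $CACHE"

for cmd in "$STEP1" "$STEP2" "$STEP3"; do
  echo "$cmd"   # dry run only
done
```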
After step 3, I got the following metrics: AUC (0.838215320158801) | PR-AUC (0.637005309537314) | NLL (0.394365247095086), which differ somewhat from the values you reported in Table 2 of the paper. Since the seed that controls the weight initialization affects the final results a bit, I reran steps 1 & 2 above with another seed and ran all 4 checkpoint combinations in step 3, but the metric differences remain.
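For concreteness, the 4 combinations in step 3 come from pairing two baseline checkpoints with two ES-DFM checkpoints (steps 1 & 2 repeated under a second seed). A dry-run sketch of that grid — the `_seed0`/`_seed1` checkpoint suffixes are just illustrative names I use here, not paths from the repo:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the 2x2 checkpoint grid for step 3:
# every (baseline seed, esdfm seed) pair gets one stream run.
# NOTE: the _seed$N checkpoint suffixes are illustrative names only.
set -u

combos=0
for bs in 0 1; do
  for es in 0 1; do
    echo "python ./src/main.py --method ES-DFM --mode stream \
--pretrain_baseline_model_ckpt_path ./work_dir/ckpts/pretrain/baseline/baseline_seed$bs \
--pretrain_esdfm_model_ckpt_path ./work_dir/ckpts/pretrain/esdfm/esdfm_seed$es \
--data_path ./data/criteo/data.txt --data_cache ./work_dir/data_cache"
    combos=$((combos + 1))
  done
done
echo "stream runs: $combos"   # 2 x 2 = 4
```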
Thank you.
Hi @hwlza ,
We did not fix the random seed when training the initial model, and I think this difference is mainly caused by randomness.
Nevertheless, such a difference will not affect the comparison between delayed feedback methods, since their performance shifts higher or lower simultaneously. The conclusions will not be affected.
You can just use the pretrained checkpoints to achieve the reported performance in the paper if necessary.
My results under different seeds vary less than 2e-4, but the difference caused by different initial models varies about 2e-3. Any chance to share the hyper-parameters setting of your initial model?
I think you can try tuning the learning rate, since it's the only tunable hyper-parameter. I'm not quite sure about the exact value we used three years ago.
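For example, something like the following small sweep — treating the `--lr` flag name and the grid values as placeholders to adapt to the actual script arguments, and echoing the commands as a dry run:

```shell
#!/usr/bin/env bash
# Dry-run sketch: sweep learning rates for the initial (pre-trained) model.
# ASSUMPTION: the --lr flag name and the grid values are placeholders,
# not necessarily the script's actual interface.
set -u

n=0
for lr in 1e-4 5e-4 1e-3 5e-3; do
  n=$((n + 1))
  ckpt="./work_dir/ckpts/pretrain/baseline/baseline_lr$lr"
  echo "python ./src/main.py --method Pretrain --mode pretrain --lr $lr \
--model_ckpt_path $ckpt --data_path ./data/criteo/data.txt \
--data_cache ./work_dir/data_cache"
done
echo "configs: $n"   # one pretrain run per learning rate
```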
Okay, will try it. Thx