ThyrixYang / es_dfm

Implementation and experimental comparison of ES-DFM (Yang et al. 2021), Delayed Feedback Model (DFM) (Chapelle 2014), Feedback Shift Importance Weighting (FSIW) (Yasui et al. 2020), Fake Negative Weighted (FNW) (Ktena et al. 2019) and Fake Negative Calibration (FNC) (Ktena et al. 2019)

A tiny bug in data processing

hwlza opened this issue · comments

commented

Hi, thanks for your work and for contributing it to the community. :) I found a typo while preparing the Criteo dataset:

"sample_ts": train_data.sample_ts,

I think "sample_ts": train_data.sample_ts should be "sample_ts": test_data.sample_ts. The bug does not affect the pre-training results; it only matters when one wants to run evaluations during the pre-training stage.
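A minimal sketch of the intended fix, under stated assumptions: the `Split` class, its `x` and `labels` fields, and the `to_dict` helper are hypothetical and are not taken from the repository. Only the "sample_ts" key and the train_data / test_data names come from the line quoted above. Building both dictionaries through one helper is one way to avoid this kind of copy-paste slip:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    # Hypothetical stand-in for the repository's data container.
    x: List[float]
    labels: List[int]
    sample_ts: List[int]


def to_dict(split: Split) -> Dict[str, list]:
    # Every field is read from the same split, so train and test
    # values cannot be mixed up inside one dictionary.
    return {"x": split.x, "labels": split.labels, "sample_ts": split.sample_ts}


# Toy values, for illustration only.
train_data = Split(x=[0.1, 0.2], labels=[0, 1], sample_ts=[10, 20])
test_data = Split(x=[0.3], labels=[1], sample_ts=[30])

train_dict = to_dict(train_data)
test_dict = to_dict(test_data)  # "sample_ts" now comes from test_data
```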

commented

Hi @hwlza ,

We appreciate your interest in our work, and we are grateful for your observation regarding this minor typo. We have rectified this typo based on your suggestion.
We hope our code can be useful for your work. :)

commented

Also, I think the following code snippets, which serialize the processed data, are missing one level of indentation. See

es_dfm/src/data.py

Lines 413 to 415 in 8487ed4

if params["data_cache_path"] != "None":
    with open(cache_path, "wb") as f:
        pickle.dump({"train": train_data, "test": test_data}, f)

and

es_dfm/src/data.py

Lines 340 to 343 in 8487ed4

if params["data_cache_path"] != "None":
    with open(cache_path, "wb") as f:
        pickle.dump({"train": train_stream, "test": test_stream}, f)
return train_stream, test_stream

In the current version, the cache file is rewritten even when the processed data has already been cached. I think this causes unnecessary overhead, especially when the raw dataset is large, though it has no other impact on the final result.
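A minimal sketch of the intended behavior, assuming the cache is a single pickle file holding a dictionary with "train" and "test" keys, as in the snippets above. The `load_or_build` helper, its `build_fn` argument, and the `use_cache` flag are hypothetical names for illustration and do not reflect the repository's actual function signatures. The dump sits only on the cache-miss path, so a cache hit never rewrites the file:

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn, use_cache=True):
    # Hypothetical helper: build_fn stands in for the expensive preprocessing.
    if use_cache and os.path.isfile(cache_path):
        # Cache hit: read the stored data and return without writing anything.
        with open(cache_path, "rb") as f:
            data = pickle.load(f)
        return data["train"], data["test"]

    # Cache miss: run the preprocessing once.
    train_data, test_data = build_fn()
    if use_cache:
        # Written only here, on the miss path.
        with open(cache_path, "wb") as f:
            pickle.dump({"train": train_data, "test": test_data}, f)
    return train_data, test_data


calls = []


def build():
    # Toy preprocessing that records how often it runs.
    calls.append(1)
    return [1, 2, 3], [4]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.pkl")
    first = load_or_build(path, build)   # builds and writes the cache
    mtime = os.path.getmtime(path)
    second = load_or_build(path, build)  # reads the cache only
    unchanged = os.path.getmtime(path) == mtime
```

In this sketch the second call neither reruns the preprocessing nor touches the cache file.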

commented

Hi, @hwlza

Fixed. Thank you! I think this bug was introduced in the open-source version.

commented

Have a nice day. Thanks for your excellent work. 😄