A tiny bug in data processing
hwlza opened this issue · comments
Hi, thanks for your work and for contributing it to the community. :) I found a typo bug while preparing the Criteo dataset, as follows:
Line 428 in 9183c11
I think the "sample_ts": train_data.sample_ts
should be "sample_ts": test_data.sample_ts
This bug does not affect the pre-training results; it only matters when one wants to run evaluations during the pre-training stage.
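To make the point concrete, here is a minimal sketch of the intended behavior. The `Split` container, the `features` field, and the `build_split_dicts` helper are hypothetical names used only for illustration; only the `sample_ts` key and the `train_data` / `test_data` names come from the line quoted above, and the real code in the repository may be organized differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    # Hypothetical container for one data split; the real object may differ.
    features: List[List[float]]
    sample_ts: List[int]


def build_split_dicts(train_data: Split, test_data: Split) -> Dict[str, dict]:
    """Build one dict per split, taking every field from its own split."""
    train = {
        "features": train_data.features,
        "sample_ts": train_data.sample_ts,
    }
    test = {
        "features": test_data.features,
        # The typo reported above used train_data.sample_ts on this line.
        "sample_ts": test_data.sample_ts,
    }
    return {"train": train, "test": test}


def is_aligned(split: dict) -> bool:
    """Check that a split has exactly one timestamp per sample."""
    return len(split["features"]) == len(split["sample_ts"])
```

With the typo in place, the test dict would carry the training timestamps, so a check like `is_aligned` would fail whenever the two splits differ in size.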
Hi @hwlza ,
We appreciate your interest in our work, and we are grateful for your observation regarding this minor typo. We have rectified this typo based on your suggestion.
We hope our code can be useful for your work. :)
Also, I think the following code snippets, which serialize the processed data, are missing one level of indentation; see
Lines 413 to 415 in 8487ed4
and
Lines 340 to 343 in 8487ed4
In the current version, the file is rewritten even if the processed data has already been cached. I think this adds unnecessary overhead, especially when the raw dataset is large, though it has no other impact on the final result.
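As a rough sketch of what I mean, the write should sit inside the branch that builds the data, so a cache hit never touches the file. The `load_or_build` helper, its parameters, and the use of `pickle` are my own assumptions for illustration; the repository may use a different serialization format and structure.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(cache_path: str, build_fn: Callable[[], Any]) -> Any:
    """Return cached data if present; otherwise build it and cache it once."""
    if os.path.exists(cache_path):
        # Cache hit: only read, never write.
        with open(cache_path, "rb") as f:
            data = pickle.load(f)
    else:
        data = build_fn()
        # The dump is indented under the else branch. With one level less
        # indentation it would run on every call, cache hit or not.
        with open(cache_path, "wb") as f:
            pickle.dump(data, f)
    return data
```

Calling `load_or_build` twice with the same path should then run `build_fn` only once and leave the cached file untouched on the second call.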
Hi, @hwlza
Fixed. Thank you! I think this bug was introduced in the open-source version.
Have a nice day. Thanks for your excellent work. 😄