Running on Reddit dataset is extremely slow

Question

cai-lw opened this issue 5 years ago

I downloaded the processed Reddit dataset from #8 (comment) and then ran train_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py with the default parameters. A single epoch takes about 10 minutes. However, the paper reports 638.6 seconds for the WHOLE training process, so I am ~200x slower than your reported speed.

I am running on an AWS m5.2xlarge instance with the same CPU spec as your machine (8 vCPUs = 4 cores, 8 threads, 2.5GHz). All dependencies were installed with pip.

Tengfei Ma · Answer 1 · Thu Apr 04 2019 08:32:05 GMT+0800 (China Standard Time)

The default call, main(None), does not do any sampling.
Change the "None" to 100 or 200.
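
To illustrate why passing None is so much slower than passing 100 or 200, here is a minimal sketch of column sampling with importance weights. It is not the repository's code: sample_columns, adj_rows, rank and seed are hypothetical names, and the sketch assumes the argument to main acts as a per-layer sample size, where None means "use every node".

```python
import random


def sample_columns(adj_rows, rank, seed=0):
    """Pick which columns (nodes) of a batch adjacency block to keep.

    adj_rows: list of rows (lists of floats), one row per batch node.
    rank: number of columns to sample; None disables sampling.
    Hypothetical helper for illustration only, not the script's API.
    """
    n = len(adj_rows[0])
    if rank is None:
        # No sampling: all n columns are kept, so cost grows with graph size.
        return list(range(n)), [row[:] for row in adj_rows]

    # Importance distribution q(j), proportional to the squared column norm.
    # Assumes at least one nonzero entry, so the total is positive.
    col_norms = [sum(row[j] ** 2 for row in adj_rows) for j in range(n)]
    total = sum(col_norms)
    q = [c / total for c in col_norms]

    # Draw `rank` column indices with replacement according to q.
    rng = random.Random(seed)
    cols = rng.choices(range(n), weights=q, k=rank)

    # Rescale by 1 / (rank * q[j]) so the sampled sum stays unbiased.
    sampled = [[row[j] / (rank * q[j]) for j in cols] for row in adj_rows]
    return cols, sampled


adj = [[0.5, 0.0, 0.5, 0.0],
       [0.0, 0.25, 0.25, 0.5]]
cols_full, block_full = sample_columns(adj, None)  # keeps all 4 columns
cols_small, block_small = sample_columns(adj, 2)   # keeps only 2 columns
print(len(cols_full), len(cols_small))
```

With rank set, each batch touches only rank columns instead of every node in the graph, which is presumably where the speedup on a graph as large as Reddit comes from.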

Liwei Cai · Answer 2 · Thu Apr 04 2019 08:39:26 GMT+0800 (China Standard Time)

@matenure It works. Thank you.
Could you change the default behavior of this code, or explain in the README how to change it? The README says this script is "the final model", but with the default setting it isn't, because it doesn't do any sampling.

Tengfei Ma · Answer 3 · Mon Apr 08 2019 13:07:20 GMT+0800 (China Standard Time)

Thanks. Your update has been merged.