mit-han-lab / hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Home Page: https://hat.mit.edu

Question about the SubTransformers sampling process.

Kevinpsk opened this issue · comments

Hi,

Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step with the call below. In the multi-GPU scenario, does each GPU get the same random SubTransformer, or does each GPU sample a different random sub-network? Does reset_rand_seed force all GPUs to sample the same random SubTransformer from the SuperNet? And is trainer.get_num_updates() the same on every GPU at each train step?

configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]

Thanks a lot for your help.
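To make the question concrete, here is a minimal sketch (not taken from the HAT codebase) of what I understand the intent to be, assuming utils.sample_configs reseeds the RNG with rand_seed whenever reset_rand_seed=True: if every process passes the same seed value, such as a shared update counter, then every process should draw the same configuration. The choice dictionary and the sample_config helper below are hypothetical stand-ins for utils.get_all_choices(args) and utils.sample_configs.

```python
# Minimal sketch, not HAT code: shows why reseeding with a value that is
# identical on every GPU (e.g. the number of training updates) would make
# each process draw the same random sub-network configuration.
import random

# Hypothetical design space; the real choices come from utils.get_all_choices(args).
CHOICES = {
    "encoder_embed_dim": [512, 640],
    "encoder_ffn_embed_dim": [1024, 2048, 3072],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
}

def sample_config(choices, reset_rand_seed=False, rand_seed=0):
    """Pick one value per dimension; optionally reseed first
    (mirrors the reset_rand_seed / rand_seed arguments of utils.sample_configs)."""
    if reset_rand_seed:
        random.seed(rand_seed)
    return {name: random.choice(values) for name, values in choices.items()}

# Simulate two GPUs at the same train step. Assuming trainer.get_num_updates()
# returns the same value on every worker at a given step, both processes
# reseed identically and therefore sample the same SubTransformer.
step = 100  # stand-in for trainer.get_num_updates()
cfg_gpu0 = sample_config(CHOICES, reset_rand_seed=True, rand_seed=step)
cfg_gpu1 = sample_config(CHOICES, reset_rand_seed=True, rand_seed=step)
assert cfg_gpu0 == cfg_gpu1  # identical configuration on every GPU
```

If that assumption about trainer.get_num_updates() being synchronized across workers does not hold, the sampled SubTransformers could diverge between GPUs, which is exactly what I would like to confirm.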