Discussion of RuntimeError: element 0 or 1 of tensors does not require grad and does not have a grad_fn

Question

karots123 opened this issue 9 months ago

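This issue concerns the pipeline-parallel mode in v0.1.
While using the pipeline model, I ran into two variants of the error in the title: one reporting element 1 and one reporting element 0.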
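The element 1 case is fairly easy to fix: as the comment below says, do not create new leaf tensors during forward. However, the embedding layer in this code sets up variables such as the mask, which are then passed on to other parts of the model. Why does that setup not raise the error?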
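For reference, the element 1 variant can be reproduced with plain PyTorch, outside any pipeline. This is only a minimal sketch: stage_forward and backward_error are hypothetical helpers, not functions from this repository, and the float mask stands in for whatever extra tensor a stage returns. The index in the message is the position of the offending tensor in the tuple handed to torch.autograd.backward.

```python
import torch

def stage_forward(weight, x):
    # Hypothetical pipeline stage: returns a differentiable output plus a mask.
    hidden = x * weight        # has a grad_fn because weight requires grad
    mask = torch.ones_like(x)  # new leaf tensor created in forward, requires_grad=False
    return hidden, mask

def backward_error(outputs):
    # Backpropagate through every tensor in `outputs`; return the error text, if any.
    grads = tuple(torch.ones_like(t) for t in outputs)
    try:
        torch.autograd.backward(tensors=outputs, grad_tensors=grads)
    except RuntimeError as err:
        return str(err)
    return None

weight = torch.ones(3, requires_grad=True)
x = torch.arange(3, dtype=torch.float32)

# Passing both outputs fails on the mask, which sits at index 1 of the tuple.
msg_all = backward_error(stage_forward(weight, x))

# Keeping only the outputs that are part of the autograd graph succeeds.
outputs = stage_forward(weight, x)
msg_filtered = backward_error(tuple(t for t in outputs if t.requires_grad))
```

So a mask created in forward is harmless as long as it is never passed to backward. Whether the pipeline engine used here filters such tensors out before calling backward (for example by dtype) is an assumption on my part and would explain why the embedding layer's mask does not fail, but it is not confirmed in this thread.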
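The element 0 case mainly shows up when activation checkpointing is enabled. In model_pipe = PipelineModule(layers=get_model(model), num_stages=args.num_stages, partition_method='parameters', activation_checkpoint_interval=1), setting activation_checkpoint_interval to any value greater than 0 raises the error. My guess is that some switch in the environment is not turned on, or that the way get_model builds the layers is problematic?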
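One common way to get the element 0 variant together with checkpointing is a checkpointed segment whose inputs do not require grad, such as an embedding layer fed integer token ids. The sketch below uses torch.utils.checkpoint as a stand-in, assuming a reasonably recent PyTorch that accepts use_reentrant. That the pipeline's own checkpoint function behaves like the reentrant variant is an assumption, not something verified against this repository, and embed, token_ids, and run are illustrative names only.

```python
import warnings

import torch
from torch.utils.checkpoint import checkpoint

embed = torch.nn.Embedding(10, 4)
token_ids = torch.tensor([1, 2, 3])  # integer input, cannot require grad

def run(use_reentrant):
    # Checkpoint the embedding, then try to backpropagate a scalar loss.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the "inputs have no grad" warning
        out = checkpoint(embed, token_ids, use_reentrant=use_reentrant)
    try:
        out.sum().backward()
    except RuntimeError as err:
        return out.requires_grad, str(err)
    return out.requires_grad, None

# Reentrant checkpointing: the output is detached from the graph because
# no input requires grad, so backward fails on element 0.
reentrant_result = run(use_reentrant=True)

# Non-reentrant checkpointing keeps the graph and backward succeeds.
non_reentrant_result = run(use_reentrant=False)
```

If this is what happens here, the usual workarounds are to leave the first layer (the one that receives non-differentiable inputs) out of the checkpointed range, or to use a non-reentrant checkpoint function if the pipeline allows swapping it.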