microsoft / ai-edu

AI education materials for Chinese students, teachers and IT professionals.

Home Page:https://microsoft.github.io/ai-edu/


2022 Practice Space Project: Question Summary 3

zzzkey23 opened this issue · comments


Corresponding GitHub experiment link: https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/wiki/Forward-and-Backward-Design

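Given that grad_output.data is the gradient and input.data is the input saved during the forward pass: for example, with batch size = 4, InputDim = 10, and 5 neurons, grad_output is a tensor of shape [4, 5], input is a tensor of shape [4, 10], and grad_weights is the computed result. What is grad_weights used for?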
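For reference, here is a minimal NumPy sketch of what the backward pass of a plain linear layer usually computes. It assumes y = x @ W.T with W of shape [5, 10] (the convention used by PyTorch's nn.Linear); the custom layer in the linked repository may differ. The names `linear_backward` and `grad_input` are illustrative and are not taken from the experiment code.

```python
import numpy as np

def linear_backward(grad_output, x, weight):
    """Backward pass of y = x @ weight.T (bias omitted for brevity)."""
    # Gradient of the loss w.r.t. the weights: this is what the optimizer
    # uses to update this layer's weights.
    grad_weights = grad_output.T @ x      # [5, 4] @ [4, 10] -> [5, 10]
    # Gradient of the loss w.r.t. the layer input: this is what gets
    # passed on to the previous layer.
    grad_input = grad_output @ weight     # [4, 5] @ [5, 10] -> [4, 10]
    return grad_input, grad_weights

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))            # batch size 4, InputDim 10
weight = rng.standard_normal((5, 10))       # 5 neurons
grad_output = rng.standard_normal((4, 5))   # gradient arriving from the next layer

grad_input, grad_weights = linear_backward(grad_output, x, weight)
print(grad_weights.shape)  # (5, 10)
```

Under these assumptions grad_weights has the same shape as the weight matrix, which is what lets an optimizer subtract a scaled copy of it from the weights.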

I ran an experiment: even when grad_weights is returned as an all-zero tensor, the model still converges.
Experiment code:
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/blob/main/Demo_myLinear.py
https://github.com/UEFI-code/MSRA_thePracticeSpaceProject_PyTorchCUDA/blob/main/myKakuritsu_Linear_backend/myKakuritsuCPU.cpp

When run with the --no-cuda flag, grad_weights is output as all zeros.
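One possible reading of this observation, offered as a hypothesis and not as a diagnosis of the linked code: under a plain SGD update, an all-zero grad_weights leaves this layer's weights at their initial values. Any convergence would then have to come from other trainable parameters (other layers, or a bias), as long as the gradient with respect to the input is still propagated. The sketch below shows only the update rule; `sgd_step` and the learning rate are illustrative assumptions.

```python
import numpy as np

def sgd_step(param, grad, lr=0.1):
    """Plain SGD update: no momentum, no weight decay."""
    return param - lr * grad

weight = np.random.default_rng(0).standard_normal((5, 10))
zero_grad = np.zeros_like(weight)   # what an all-zero grad_weights looks like

updated = sgd_step(weight, zero_grad)
print(np.array_equal(updated, weight))  # True: the weights never change
```

Optimizers that use weight decay would still change the weights even with a zero gradient, so this sketch applies only to the plain update shown.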