microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Home Page: https://arxiv.org/abs/2106.09685

Repository from Github: https://github.com/microsoft/LoRA

After adding LoRA, the first few layers show a gradient of 0

hluckye opened this issue · comments

I am a beginner in deep learning, and I would like to know whether the gradients are 0 because of vanishing gradients or because my batch size is too small (batch_size=32).

I tried to add LoRA to a three-layer neural network, but only the gradients of the `lora_A` and `lora_B` matrices in the last layer were nonzero (below 1e-2), while the gradients of all the other layers were exactly 0.
My `lora.Linear` layers are defined as follows:

self.prednet_full1_lora = lora.Linear(self.prednet_input_len, self.prednet_len1, r=4)
self.prednet_full2_lora = lora.Linear(self.prednet_len1, self.prednet_len2, r=4)
self.prednet_full3_lora = lora.Linear(self.prednet_len2, 1, r=4)

The forward part of the model is shown below (assuming input_x is the input):

input_x = torch.sigmoid(self.prednet_full1_lora(input_x))
input_x = torch.sigmoid(self.prednet_full2_lora(input_x))
output = torch.sigmoid(self.prednet_full3_lora(input_x))
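One detail worth knowing here: LoRA conventionally initializes `lora_B` to zero, so at the very first training step the gradient of every `lora_A` is exactly zero (the `lora_A` path only reaches the loss through `lora_B`). Below is a minimal stand-in sketch in plain PyTorch, not loralib's actual implementation, that reproduces a three-layer sigmoid stack like the one above and prints per-layer gradient norms so you can see which gradients are genuinely zero versus merely small:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style linear: frozen base weight plus trainable low-rank update.
    This is a simplified stand-in for loralib's lora.Linear, not the real thing."""
    def __init__(self, in_features, out_features, r=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # B starts at zero, as in LoRA

    def forward(self, x):
        # y = W x + B (A x); the low-rank term is identically zero until B moves off zero
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

layers = nn.ModuleList([LoRALinear(16, 16), LoRALinear(16, 16), LoRALinear(16, 1)])
x = torch.randn(32, 16)  # hypothetical batch, batch_size=32 as in the question
h = x
for layer in layers:
    h = torch.sigmoid(layer(h))
loss = h.mean()
loss.backward()

for i, layer in enumerate(layers):
    print(f"layer {i}: |grad lora_A| = {layer.lora_A.grad.norm():.3e}, "
          f"|grad lora_B| = {layer.lora_B.grad.norm():.3e}")
```

At step one, every `|grad lora_A|` prints as exactly 0 while the `lora_B` gradients are nonzero; that part of the observation is expected behavior, not a bug. After one optimizer step moves `lora_B` off zero, the `lora_A` gradients become nonzero too. If the `lora_B` gradients in the earlier layers are also zero in your run, the likely culprit is the stacked sigmoids saturating (their derivative is at most 0.25), which shrinks gradients at each layer going backward.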

and I did not forget to call:

loss.backward()
optimizer.step()
net.apply_clipper()
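For comparison, here is a sketch of a complete training step. The snippet above never calls `optimizer.zero_grad()`, so if it is also missing from the real training loop, gradients from previous steps accumulate and the printed values can be misleading. The `net`, `x`, and `target` below are hypothetical stand-ins, and this uses plain PyTorch rather than loralib (whose README suggests `lora.mark_only_lora_as_trainable(model)` before building the optimizer); only the step structure matters:

```python
import torch

# Hypothetical stand-ins for the poster's model and data; only the loop shape matters.
net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 1))
# Pass only trainable parameters to the optimizer (with loralib, freeze the rest first).
optimizer = torch.optim.Adam(p for p in net.parameters() if p.requires_grad)
x, target = torch.randn(32, 8), torch.rand(32, 1)

for step in range(3):
    optimizer.zero_grad()  # clear stale gradients before each backward pass
    loss = torch.nn.functional.binary_cross_entropy(torch.sigmoid(net(x)), target)
    loss.backward()        # populate .grad on trainable parameters
    optimizer.step()       # apply the update
```

When inspecting gradients for debugging, do it between `loss.backward()` and `optimizer.zero_grad()` of the next step, otherwise the `.grad` fields may already have been cleared or overwritten.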

I would greatly appreciate any ideas or suggestions.