microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Home Page: https://arxiv.org/abs/2106.09685

Repository from Github: https://github.com/microsoft/LoRA

After adding LoRA, the first few layers show a gradient of 0

hluckye opened this issue · comments

I am a beginner in deep learning, and I would like to know whether the gradients are 0 because of vanishing gradients or because my batch size is too small (batch_size=32).

I tried to add LoRA to a three-layer neural network, but only the gradients of the `lora_A` and `lora_B` matrices in the last layer were nonzero (below 1e-2), while the gradients of all the other layers were exactly 0.
My `lora.Linear` layers are defined as follows:

self.prednet_full1_lora = lora.Linear(self.prednet_input_len, self.prednet_len1, r=4)
self.prednet_full2_lora = lora.Linear(self.prednet_len1, self.prednet_len2, r=4)
self.prednet_full3_lora = lora.Linear(self.prednet_len2, 1, r=4)

The forward part of the model is shown below (assuming input_x is the input):

input_x = torch.sigmoid(self.prednet_full1_lora(input_x))
input_x = torch.sigmoid(self.prednet_full2_lora(input_x))
output = torch.sigmoid(self.prednet_full3_lora(input_x))
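One detail worth knowing here: LoRA conventionally initializes `lora_B` to zero, so at the very first training step the gradient of every `lora_A` is exactly zero (the `lora_A` path only reaches the loss through `lora_B`). Below is a minimal stand-in sketch in plain PyTorch, not loralib's actual implementation, that reproduces a three-layer sigmoid stack like the one above and prints per-layer gradient norms so you can see which gradients are genuinely zero versus merely small:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style linear: frozen base weight plus trainable low-rank update.
    This is a simplified stand-in for loralib's lora.Linear, not the real thing."""
    def __init__(self, in_features, out_features, r=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # B starts at zero, as in LoRA

    def forward(self, x):
        # y = W x + B (A x); the low-rank term is identically zero until B moves off zero
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

layers = nn.ModuleList([LoRALinear(16, 16), LoRALinear(16, 16), LoRALinear(16, 1)])
x = torch.randn(32, 16)  # hypothetical batch, batch_size=32 as in the question
h = x
for layer in layers:
    h = torch.sigmoid(layer(h))
loss = h.mean()
loss.backward()

for i, layer in enumerate(layers):
    print(f"layer {i}: |grad lora_A| = {layer.lora_A.grad.norm():.3e}, "
          f"|grad lora_B| = {layer.lora_B.grad.norm():.3e}")
```

At step one, every `|grad lora_A|` prints as exactly 0 while the `lora_B` gradients are nonzero; that part of the observation is expected behavior, not a bug. After one optimizer step moves `lora_B` off zero, the `lora_A` gradients become nonzero too. If the `lora_B` gradients in the earlier layers are also zero in your run, the likely culprit is the stacked sigmoids saturating (their derivative is at most 0.25), which shrinks gradients at each layer going backward.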

and I did not forget to call:

loss.backward()
optimizer.step()
net.apply_clipper()
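For comparison, here is a sketch of a complete training step. The snippet above never calls `optimizer.zero_grad()`, so if it is also missing from the real training loop, gradients from previous steps accumulate and the printed values can be misleading. The `net`, `x`, and `target` below are hypothetical stand-ins, and this uses plain PyTorch rather than loralib (whose README suggests `lora.mark_only_lora_as_trainable(model)` before building the optimizer); only the step structure matters:

```python
import torch

# Hypothetical stand-ins for the poster's model and data; only the loop shape matters.
net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 1))
# Pass only trainable parameters to the optimizer (with loralib, freeze the rest first).
optimizer = torch.optim.Adam(p for p in net.parameters() if p.requires_grad)
x, target = torch.randn(32, 8), torch.rand(32, 1)

for step in range(3):
    optimizer.zero_grad()  # clear stale gradients before each backward pass
    loss = torch.nn.functional.binary_cross_entropy(torch.sigmoid(net(x)), target)
    loss.backward()        # populate .grad on trainable parameters
    optimizer.step()       # apply the update
```

When inspecting gradients for debugging, do it between `loss.backward()` and `optimizer.zero_grad()` of the next step, otherwise the `.grad` fields may already have been cleared or overwritten.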

I would greatly appreciate any ideas or suggestions.