Identical [CLS] context vectors during evaluation
Sami-Sh99 opened this issue · comments
Sami Shames El Deen commented
While training my model on Arabic, I logged some of the values that the model processes and generates. The following is a sample of the log output during training:
top_vec: tensor([[[-0.2439, 0.2242, 1.3744, ..., 1.2180, -1.4410, -1.3635],
[-0.2523, 0.1137, 1.3378, ..., 1.2184, -0.1754, -1.2815],
[-0.4105, 0.0702, 1.4091, ..., 1.2221, -1.5671, -1.3778],
...,
[ 0.0288, -0.6760, 1.5258, ..., 1.3763, -1.4011, -1.3328],
[-0.0218, -0.3249, 1.1765, ..., 1.4232, -1.2773, -1.1683],
[ 0.0678, 0.2823, 1.2759, ..., 1.2741, 0.0080, -1.0290]]],
device='cuda:0', grad_fn=<NativeLayerNormBackward>) torch.Size([1, 432, 768])
clss: tensor([[ 0, 31, 73, 90, 104, 142, 169, 187, 199, 213, 236, 273, 297, 315,
337, 351, 364, 382, 415]], device='cuda:0') torch.Size([1, 19])
sents_vec: tensor([[[-0.2439, 0.2242, 1.3744, ..., 1.2180, -1.4410, -1.3635],
[-0.2009, -0.0098, 0.3056, ..., 1.2681, -1.3180, -1.2614],
[-0.2254, -0.0302, 0.2825, ..., 1.3459, -0.9250, -1.1691],
...,
[-0.2042, -0.1110, 1.3395, ..., 1.2766, -1.2633, -1.1890],
[-0.1571, -0.6477, 1.2429, ..., 0.6955, -0.8612, -1.1577],
[-0.2982, -0.9736, 1.2249, ..., 1.3346, -1.3179, -1.0534]]],
device='cuda:0', grad_fn=<MulBackward0>) torch.Size([1, 19, 768])
sent_scores: tensor([[0.2587, 0.1031, 0.2036, 0.0026, 0.2685, 0.0003, 0.0006, 0.0015, 0.0039,
0.0027, 0.0164, 0.0015, 0.0077, 0.0006, 0.0005, 0.0009, 0.0770, 0.0069,
0.0009]], device='cuda:0', grad_fn=<SqueezeBackward1>) torch.Size([1, 19])
[2022-02-25 00:37:03,025 INFO] Step 2155/50000; xent: 0.39; lr: 0.0000500; 12 docs/s; 280 sec
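For reference, the three logged tensors appear to be related as sketched below. This is a minimal sketch, assuming sents_vec is gathered from top_vec at the clss positions and then multiplied by a sentence mask (which would match the MulBackward0 grad_fn and the identical first rows of top_vec and sents_vec in the log above). The function name gather_sentence_vectors and the mask_cls argument are hypothetical and do not come from the log.

```python
import torch


def gather_sentence_vectors(top_vec, clss, mask_cls):
    """Pick the encoder output at each [CLS] position and zero out padded sentences.

    top_vec:  (batch, seq_len, hidden) encoder output
    clss:     (batch, n_sents) positions of the [CLS] tokens
    mask_cls: (batch, n_sents) True for real sentences (assumed input)
    """
    batch_idx = torch.arange(top_vec.size(0), device=top_vec.device).unsqueeze(1)
    sents_vec = top_vec[batch_idx, clss]  # (batch, n_sents, hidden)
    return sents_vec * mask_cls[:, :, None].float()


# Toy tensors with the same layout as the training log (values are random).
top_vec = torch.randn(1, 432, 768)
clss = torch.tensor([[0, 31, 73, 90]])
mask_cls = torch.ones(1, 4, dtype=torch.bool)

sents_vec = gather_sentence_vectors(top_vec, clss, mask_cls)
print(sents_vec.shape)
```

Under this assumption, each row of sents_vec is a copy of a row of top_vec, so identical rows in top_vec would necessarily produce identical rows in sents_vec and identical sent_scores.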
Everything seemed to be going well until I ran train.py in test mode, where all the [CLS] tokens produced exactly the same vector:
top_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
...,
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]],
device='cuda:0') torch.Size([1, 512, 768])
clss: tensor([[ 0, 38, 51, 79, 130, 150, 171, 213, 258, 271, 304, 326, 345, 362,
378, 395, 413, 449, 471, 492]], device='cuda:0') torch.Size([1, 20])
sents_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
...,
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]],
device='cuda:0') torch.Size([1, 20, 768])
sent_scores: tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
top_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
...,
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]],
device='cuda:0') torch.Size([1, 512, 768])
clss: tensor([[ 0, 43, 92, 127, 151, 172, 191, 226, 242, 256, 269, 290, 312, 330,
365, 410, 433, 461, 482, 508]], device='cuda:0') torch.Size([1, 20])
sents_vec: tensor([[[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
...,
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151],
[ 0.0841, 0.3211, -0.1155, ..., 0.5341, -0.0099, -0.0151]]],
device='cuda:0') torch.Size([1, 20, 768])
sent_scores: tensor([[0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567, 0.0567,
0.0567, 0.0567]], device='cuda:0') torch.Size([1, 20])
Can anyone please help me understand why this is happening?
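To make the symptom easy to check programmatically, here is a small diagnostic sketch. It only tests whether every row of a (batch, rows, hidden) tensor equals the first row, which is what the test-mode logs above show. The helper name rows_collapsed is hypothetical, and the toy tensors stand in for the real top_vec or sents_vec.

```python
import torch


def rows_collapsed(vectors, atol=1e-6):
    """Return True if every row along dim 1 equals the first row (within tolerance)."""
    first = vectors[:, :1, :]
    return bool(torch.allclose(vectors, first.expand_as(vectors), atol=atol))


# Stand-ins for the logged tensors: random rows vs. one repeated row.
healthy = torch.randn(1, 20, 768)
collapsed = torch.full((1, 20, 768), 0.0841)

print(rows_collapsed(healthy))
print(rows_collapsed(collapsed))
```

Running the same check on top_vec as well as sents_vec would show whether the collapse already happens in the encoder output or only after the [CLS] positions are gathered.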