Question about loss computation in seq2seq

Question

fengxin619 opened this issue 3 years ago · comments

fengxin619 commented 3 years ago

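seq2seq_model.py, line 108
A special output mask has to be built to mask out the influence of sentence a
The prediction at the final sep token is not needed, so the slice stops at -1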
predictions = predictions[:, :-1].contiguous()
target_mask = token_type_id[:, 1:].contiguous()

Why does target_mask drop the [CLS] position while predictions drops the [SEP] position? Doesn't that misalign the two when the loss is computed?

zhaohu xing · Answer 1 · Fri Jul 02 2021 18:42:05 GMT+0800 (China Standard Time)

There is no misalignment. Think it over again: the last position of predictions is sep, and the output at that sep is meaningless.

fengxin619 · Answer 2 · Fri Jul 02 2021 18:42:59 GMT+0800 (China Standard Time)

Wow, that was a fast reply. But target_mask drops the first position? Is that the position corresponding to [CLS]?

zhaohu xing · Answer 3 · Fri Jul 02 2021 18:46:29 GMT+0800 (China Standard Time)

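Yes. If the sentence is [cls, 1, 2, sep, 3, 4, sep], the predictions that matter are the outputs at the tokens [sep, 3, 4], so the outputs at [cls, 1, 2] have to be masked out, and that is what target_mask is for. The token_type_id of this sentence is [0, 0, 0, 0, 1, 1, 1]; taking it from the second position onward gives [0, 0, 0, 1, 1, 1]. The prediction output is the result for [cls, 1, 2, sep, 3, 4]. Line the two up and you will see that the outputs of the first three tokens are exactly the ones that get masked out.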
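The alignment described in this answer can be checked with a small sketch. It uses plain Python lists in place of tensors, and the names `tokens`, `inputs_kept`, `targets`, and `rows` are illustrative assumptions, not code from the repository.

```python
# Illustrative sketch: plain lists stand in for the tensors in seq2seq_model.py.
tokens = ["cls", "1", "2", "sep", "3", "4", "sep"]
token_type_id = [0, 0, 0, 0, 1, 1, 1]

# predictions[:, :-1]: keep the output at every position except the final sep.
inputs_kept = tokens[:-1]
# token_type_id[:, 1:]: drop the first entry so the mask lines up with the targets.
target_mask = token_type_id[1:]
# The output at each kept position is trained to predict the next token.
targets = tokens[1:]

rows = list(zip(inputs_kept, targets, target_mask))
for input_token, target_token, mask in rows:
    print(f"output at {input_token:>3} -> target {target_token:>3}  mask={mask}")

# Input positions whose outputs contribute to the loss.
scored = [input_token for input_token, _, mask in rows if mask == 1]
print(scored)
```

Read row by row, the mask value belongs to the target token, not to the input token: the output at the first sep is kept because the token it should predict, 3, is part of sentence b.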

fengxin619 · Answer 4 · Fri Jul 02 2021 18:57:42 GMT+0800 (China Standard Time)

0, 0, 0, 0, 1, 1

Impressive... I have figured out the brain teaser now!

fengxin619 · Answer 5 · Fri Jul 02 2021 19:01:14 GMT+0800 (China Standard Time)

But in 写诗_train.py, line 127
target_ids_padded = token_ids_padded[:, 1:].contiguous()
Why is target_id taken starting from index 1?

zhaohu xing · Answer 6 · Fri Jul 02 2021 19:03:31 GMT+0800 (China Standard Time)

How do you think it should be taken?

fengxin619 · Answer 7 · Fri Jul 02 2021 19:05:59 GMT+0800 (China Standard Time)

target_ids_padded = token_ids_padded[:, :-1].contiguous()
Like this? ... Please set me straight.

zhaohu xing · Answer 8 · Fri Jul 02 2021 19:13:53 GMT+0800 (China Standard Time)

No. This is the target, so how could it contain the first token? It should run from the second token through the last token.
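A minimal sketch of why the target starts at the second token: the output at position i is scored against token i+1, so the targets are the inputs shifted left by one. The token ids, the uniform stand-in for the model (`fake_probs`), and the masked-mean reduction are all assumptions made for illustration; the thread does not show how the repository reduces the loss.

```python
import math

# Hypothetical ids for [cls, 1, 2, sep, 3, 4, sep]; sep is assumed to have id 3.
token_ids = [0, 1, 2, 3, 4, 5, 3]
token_type_id = [0, 0, 0, 0, 1, 1, 1]
vocab_size = 6


def fake_probs(position):
    """Stand-in for the model: a uniform distribution at every position."""
    return [1.0 / vocab_size] * vocab_size


# One distribution per input position, then drop the one at the final sep.
predictions = [fake_probs(i) for i in range(len(token_ids))][:-1]
# Targets run from the second token through the last token.
target_ids = token_ids[1:]
target_mask = token_type_id[1:]

# Negative log-likelihood of each target under the prediction one step earlier.
losses = [-math.log(p[t]) for p, t in zip(predictions, target_ids)]
# Assumed reduction: average only over the positions where the mask is 1.
masked_loss = sum(l * m for l, m in zip(losses, target_mask)) / sum(target_mask)
print(masked_loss)
```

Slicing the target with `[:-1]` instead would pair each output with its own input token, so the model would be scored on copying the input instead of predicting the next token.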

fengxin619 · Answer 9 · Fri Jul 02 2021 19:20:37 GMT+0800 (China Standard Time)

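In this example the target should be [cls, 1, 2, sep, 3, 4, sep], and the prediction output is the result for [cls, 1, 2, sep, 3, 4]. So shouldn't the target be taken from the first position, dropping the last one?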

zhaohu xing · Answer 10 · Fri Jul 02 2021 19:22:23 GMT+0800 (China Standard Time)

Think about it some more.

fengxin619 · Answer 11 · Fri Jul 02 2021 19:30:45 GMT+0800 (China Standard Time)

Then I'll think about it some more...
