Possible bug in the encode method of the dataset module

Question


weiaicunzai opened this issue 6 years ago

Hi,
I have a question about your encode code:

        for i in range(cxcy.size()[0]):
            cxcy_sample = cxcy[i]
            # (col, row) index of the grid cell that contains the box center
            ij = (cxcy_sample / cell_size).ceil() - 1
            target[int(ij[1]), int(ij[0]), 4] = 1  # confidence of box 1
            target[int(ij[1]), int(ij[0]), 9] = 1  # confidence of box 2
            target[int(ij[1]), int(ij[0]), int(labels[i]) + 9] = 1  # one-hot class
            xy = ij * cell_size  # relative coordinates of the matched cell's top-left corner
            delta_xy = (cxcy_sample - xy) / cell_size  # center offset inside the cell
            target[int(ij[1]), int(ij[0]), 2:4] = wh[i]
            target[int(ij[1]), int(ij[0]), :2] = delta_xy
            target[int(ij[1]), int(ij[0]), 7:9] = wh[i]
            target[int(ij[1]), int(ij[0]), 5:7] = delta_xy
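For anyone who wants to run this logic without PyTorch, here is a minimal pure-Python sketch that mirrors the loop above for a single image. It assumes `cxcy` and `wh` are normalized to [0, 1], S = 7, 20 classes, and (as `int(labels[i]) + 9` suggests) that class labels are 1-based so they land in channels 10 to 29. The name `encode_sketch` is hypothetical, not a function from the repository.

```python
import math


def encode_sketch(cxcy, wh, labels, S=7, num_classes=20):
    """Hypothetical pure-Python mirror of the encode loop.

    cxcy, wh: lists of (x, y) / (w, h) pairs, normalized to [0, 1].
    labels: assumed 1-based class ids, so label + 9 falls in channels 10..29.
    Returns an S x S x (10 + num_classes) nested list.
    """
    depth = 10 + num_classes
    target = [[[0.0] * depth for _ in range(S)] for _ in range(S)]
    cell_size = 1.0 / S
    for (cx, cy), (w, h), label in zip(cxcy, wh, labels):
        # ceil(...) - 1 gives the 0-based column/row of the cell holding the center
        col = math.ceil(cx / cell_size) - 1
        row = math.ceil(cy / cell_size) - 1
        cell = target[row][col]
        # offset of the center from the cell's top-left corner, in cell units
        dx = (cx - col * cell_size) / cell_size
        dy = (cy - row * cell_size) / cell_size
        # both box slots receive identical (dx, dy, w, h, conf=1)
        cell[0:5] = [dx, dy, w, h, 1.0]
        cell[5:10] = [dx, dy, w, h, 1.0]
        cell[label + 9] = 1.0  # one-hot class channel
    return target
```

For example, `encode_sketch([(0.5, 0.5)], [(0.2, 0.3)], [1])` writes the box into cell `[3][3]` with an in-cell offset of about (0.5, 0.5).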

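I have no question about how the class probabilities are assigned, but I do have some questions about how the bboxes are assigned, and I hope you can explain.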

Consider the following bbox label layout,
which is also a 7 * 7 * 30 target:

x1, y1, w1, h1, c1, x2, y2, w2, h2, c2 are the values of target[row, col, :10],
and target[row, col, 10:] holds the class probabilities.

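x1, y1 are the bbox center coordinates (in the code, the center's offset from the cell's top-left corner, in cell units), w1, h1 are the bbox width and height, and c1 is the confidence score from the paper; x2 and the rest are the label of the second bbox, and so on.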
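The layout described above can be made concrete with a small helper that splits one 30-value cell vector into its two box slots and the class probabilities. This is only a sketch of the layout as stated in the question; `unpack_cell` is a hypothetical name and 20 classes is an assumption.

```python
def unpack_cell(vec, num_classes=20):
    """Split one cell vector of the 7x7x30 target into named parts (hypothetical helper).

    Layout: [x1, y1, w1, h1, c1, x2, y2, w2, h2, c2, class probabilities...]
    """
    if len(vec) != 10 + num_classes:
        raise ValueError(f"expected {10 + num_classes} values, got {len(vec)}")
    keys = ("x", "y", "w", "h", "conf")
    box1 = dict(zip(keys, vec[0:5]))   # channels 0..4
    box2 = dict(zip(keys, vec[5:10]))  # channels 5..9
    class_probs = list(vec[10:])       # channels 10..29
    return box1, box2, class_probs
```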

I see that you assign exactly the same values to the first bbox's label and the second bbox's label, and the two confidence scores are the same as well:

            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

My questions are:

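Suppose there are two objects whose bbox centers both fall into the same cell. Why, in that case, are the two bboxes still given the same labels and confidence scores? Shouldn't one object's bbox go into x1, y1, w1, h1 and the other object's bbox into x2, y2, w2, h2?
Also, when a cell contains only one bbox center, why set both c1 and c2 to 1, with x1 = x2, y1 = y2, w1 = w2, h1 = h2?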

Thanks a lot!

bear · Answer 1 · Thu Jul 26 2018 10:03:16 GMT+0800 (China Standard Time)

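My understanding:
First, look at how v2 and later versions solve this: in v2 the anchors have their own widths and heights, and each box is matched independently according to the IoU between the target and its anchor.
In v1, however, there is no way to decide in advance which box is responsible for an object; that can only be decided during training by computing the IoU between each pred_box and the target. So both box targets are assigned the same values, and during training one of them is effective (responsible) while the other is not (not responsible).
In my implementation I assume that a cell contains at most one object; cases with more than one object are not handled well.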
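The "responsible" selection described above can be sketched as follows: for a cell that contains an object, compute the IoU between each of the B = 2 predicted boxes and the single ground-truth box, and treat the box with the higher IoU as responsible. This is a simplified illustration, assuming every box is already given as `(cx, cy, w, h)` in the same normalized image coordinates (converting the cell-relative offsets stored in the target back to image coordinates is omitted). `iou_cxcywh` and `responsible_box` are hypothetical names.

```python
def iou_cxcywh(a, b):
    """IoU of two boxes given as (cx, cy, w, h) in the same coordinate system."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_box(pred_boxes, gt_box):
    """Index of the predicted box with the highest IoU against the ground truth.

    The remaining box(es) in the same cell are treated as 'not responsible'.
    """
    ious = [iou_cxcywh(p, gt_box) for p in pred_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious
```

With `gt = (0.5, 0.5, 0.2, 0.2)` and predictions `[(0.9, 0.9, 0.1, 0.1), (0.52, 0.5, 0.2, 0.2)]`, the second box (index 1) is responsible. In the YOLOv1 paper's loss, only the responsible box receives the coordinate and object-confidence terms; the other box contributes only through the no-object confidence term.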

ronghuaiyang · Answer 2 · Thu Jul 26 2018 10:33:24 GMT+0800 (China Standard Time)

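@xiongzihua If a training image's ground truth has one cell that corresponds to several boxes, then during encoding only one of the annotated boxes ends up being written into the target, is that right?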

bear · Answer 3 · Thu Jul 26 2018 10:42:57 GMT+0800 (China Standard Time)

@ronghuaiyang Yes, in the end only one remains.
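To check how often this happens in a dataset, one could group the ground-truth centers by grid cell using the same `ceil(...) - 1` rule as the encode loop and report the cells that receive more than one object. In the loop quoted in the question, the box coordinates written last overwrite the earlier ones, while the class channels are only ever set to 1, so a collision between objects of different classes can also leave two class channels set in one cell. A minimal sketch, assuming normalized center coordinates and S = 7; `find_cell_collisions` is a hypothetical name.

```python
import math
from collections import defaultdict


def find_cell_collisions(cxcy, S=7):
    """Group object centers by grid cell; return the cells holding more than one center.

    cxcy: list of (cx, cy) normalized to [0, 1]. Uses the same
    ceil(x / cell_size) - 1 rule as the encode loop.
    """
    cell_size = 1.0 / S
    cells = defaultdict(list)
    for idx, (cx, cy) in enumerate(cxcy):
        col = math.ceil(cx / cell_size) - 1
        row = math.ceil(cy / cell_size) - 1
        cells[(row, col)].append(idx)
    # after encoding, only the last index in each list keeps its box coordinates
    return {cell: ids for cell, ids in cells.items() if len(ids) > 1}
```

The number of objects whose coordinates are lost is then `sum(len(ids) - 1 for ids in collisions.values())`.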

weiaicunzai · Answer 4 · Thu Jul 26 2018 12:21:15 GMT+0800 (China Standard Time)

OK, thanks for the replies.

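Last night I read some blog posts and found that in YOLOv1 each cell predicts two boxes at the same time, but both boxes predict the same object. In other words, YOLOv1 has limited ability to detect densely packed objects: it can detect at most 49 distinct objects in one image.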
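The 49 figure follows from the grid size. With the paper's settings S = 7, B = 2 boxes per cell and C = 20 classes, each cell carries B × 5 + C = 30 values (hence the 7 × 7 × 30 target). Because each cell has only one set of class probabilities, it can represent at most one object, so an image holds at most S × S = 49 objects even though S × S × B = 98 boxes are predicted. The small function below just does this arithmetic; `yolo_v1_capacity` is a hypothetical name.

```python
def yolo_v1_capacity(S=7, B=2, C=20):
    """Return (values per cell, max distinct objects, total predicted boxes) for a YOLOv1 grid."""
    per_cell = B * 5 + C     # B boxes x (x, y, w, h, conf) + C class probabilities
    max_objects = S * S      # one object per cell under this encoding
    total_boxes = S * S * B  # boxes predicted, but the boxes in a cell share one object
    return per_cell, max_objects, total_boxes


print(yolo_v1_capacity())  # (30, 49, 98)
```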