Bobo-y / flexible-yolov5

More readable and flexible YOLOv5 with more backbones (gcn, resnet, shufflenet, mobilenet, efficientnet, hrnet, swin-transformer, etc.), additional modules (cbam, dcn, and so on), and TensorRT support

Why is the Swin-Transformer model only outputting 2nd features onwards?

sarmientoj24 opened this issue

I saw this and I was wondering why it is not outputting all features.

Is there a way to use all four?

commented

@sarmientoj24 outs contains [/4, /8, /16, /32]; [1:] means return the 2nd through the last.

so it only outputs two feature pyramids, is that what it is?

when i tried it it outputs this

swin = swin_transformer(version='tiny')
_ = swin.to('cuda')
swin_x = swin(sample)
swin_x[0].shape, swin_x[1].shape, swin_x[2].shape

(torch.Size([1, 192, 80, 80]),
 torch.Size([1, 384, 40, 40]),
 torch.Size([1, 768, 20, 20]))

commented

What! That is a big bug, I will check it.

also, is it possible to use all of the feature pyramids that it is outputting for the neck and YOLO heads?

commented

of course

commented

#123 (comment)

commented

outs of swin:

len: 4
torch.Size([24, 96, 160, 160])
torch.Size([24, 192, 80, 80])
torch.Size([24, 384, 40, 40])
torch.Size([24, 768, 20, 20])

There is no bug. The original outs is a list with four elements, [/4, /8, /16, /32]; I only use the 2nd through the last.
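The shapes in this thread can be reproduced with plain arithmetic. This is a minimal sketch, not the repo's code: it assumes a 640x640 input (inferred from 160 = 640 / 4) and a base width of 96 channels that doubles at each stage, which matches the numbers posted above. `swin_stage_shapes` is a hypothetical helper name.

```python
def swin_stage_shapes(batch, image_size, embed_dim=96, strides=(4, 8, 16, 32)):
    """Return the (N, C, H, W) shape of each stage output.

    Assumption: channels double at every stage and the spatial size is
    image_size // stride, as in the shapes posted in this thread.
    """
    return [
        (batch, embed_dim * 2 ** i, image_size // s, image_size // s)
        for i, s in enumerate(strides)
    ]


outs = swin_stage_shapes(batch=24, image_size=640)  # four stages: /4, /8, /16, /32
returned = outs[1:]  # drops only the first (/4) stage, so three remain

for shape in returned:
    print(shape)
print(len(outs), len(returned))
```

So `outs[1:]` yields three maps (/8, /16, /32), not two, which is consistent with what was observed on `swin_x`.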

I think that is true: there are 4 elements there. But this

 return outs[1:] 

just returns three which are these

torch.Size([24, 192, 80, 80])

torch.Size([24, 384, 40, 40])

torch.Size([24, 768, 20, 20])

My follow-up question is: why not output all of them on a forward pass, given that torch.Size([24, 96, 160, 160]) is a large feature map? From what I know, YOLOv5 has a P6 version that takes four feature maps. Is the fourth output meant for that?

commented

In this repo, only three detection layers are used, so I return three. Yes, you can think of 160×160 as P2. P6 needs four, and P7 needs five. You need to implement it yourself; you can refer to #126 (comment)
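For anyone who wants all four maps, one possible approach is to make the returned stages configurable instead of hard-coding `outs[1:]`. The following is a minimal sketch under that assumption, not the repo's implementation: `select_features` and `out_indices` are hypothetical names, and strings stand in for the tensors. The labels follow the usual convention that level P_l has stride 2^l.

```python
def pyramid_level(stride):
    """Map a power-of-two stride to its pyramid level, e.g. 8 -> 3."""
    return stride.bit_length() - 1


def select_features(outs, out_indices):
    """Hypothetical helper: keep only the stage outputs listed in out_indices."""
    return [outs[i] for i in out_indices]


strides = [4, 8, 16, 32]  # the four backbone stages discussed above
levels = [f"P{pyramid_level(s)}" for s in strides]

three = select_features(levels, out_indices=(1, 2, 3))    # same as outs[1:]
four = select_features(levels, out_indices=(0, 1, 2, 3))  # all stages

print(three)
print(four)
```

Note that the four backbone outputs correspond to P2 through P5, whereas YOLOv5's own P6 configs detect on P3 through P6 (stride 64), so a stride-64 map would need an extra downsampling stage. In either case, a neck and head built for three inputs would also need a fourth branch, which is not shown here.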