aayushbansal / PixelNet

This repository contains source code and models for the PixelNet architecture, used for various pixel-level tasks. More details can be accessed at <http://www.cs.cmu.edu/~aayushb/pixelNet/>.


Rearranging hypercolumns into a matrix

SavanGowda opened this issue

Hi @aayushbansal,

I am very interested in PixelNet and am trying to implement it in MatConvNet. I have a question about what you mention in your paper regarding the sparse predictions.

According to my understanding, if we have, say, 1000 sampled pixels, we extract hypercolumns from conv1_2 (64 feature maps), conv2_2 (128 feature maps), conv3_3 (256 feature maps), conv4_3 (512 feature maps), and conv5_3 (512 feature maps) (and, if included, also from fc7_conv).

Question 1. Maybe a dumb question, but are we supposed to extract the Ci(p) from all (say conv1_2) 64 feature maps, or can we select some random feature maps (say a random 32)?

In the snapshot below about sparse prediction, you mention that we have to rearrange the hypercolumn features into a matrix.

Question 2. Are we supposed to rearrange the hypercolumns back to the spatial layout of the image (if the size of the image is 224 x 224), or into a single vector with all the hypercolumns (512x1000 + 512x1000 + 256x1000 + 128x1000 + 64x1000) concatenated, before giving it to the downstream processing?

[Screenshot from the paper: the sparse-prediction passage in question]

Thanks in advance. Your reply would be really helpful.

Best Regards
Savan

Rearrange the hypercolumns into a design matrix: number of pixels sampled × number of features.

As for which features to take: typically all of them in the selected layers. I expect that you could subsample, but I wouldn't do so randomly.
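To make that rearrangement concrete, here is a minimal NumPy sketch (the function name and nearest-neighbour indexing are my own simplifications; the paper reads off coarser layers with interpolation):

```python
import numpy as np

def build_design_matrix(feature_maps, pixels, image_hw):
    """Stack hypercolumns for the sampled pixels into an (N, K) design matrix.

    feature_maps : list of (C_l, H_l, W_l) arrays, e.g. conv1_2 ... conv5_3
    pixels       : (N, 2) integer (row, col) coordinates in the input image
    image_hw     : (H, W) of the input image
    """
    H, W = image_hw
    per_layer = []
    for fmap in feature_maps:
        C, h, w = fmap.shape
        # Map image coordinates onto this layer's (coarser) grid.
        # Nearest-neighbour is used here only for brevity; the paper
        # interpolates when sampling from the coarser layers.
        r = np.clip((pixels[:, 0] * h) // H, 0, h - 1)
        c = np.clip((pixels[:, 1] * w) // W, 0, w - 1)
        per_layer.append(fmap[:, r, c].T)     # (N, C_l)
    return np.concatenate(per_layer, axis=1)  # (N, 64+128+256+512+512)
```

Each row of the result is one pixel's hypercolumn; the matrix is just all sampled pixels stacked together.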

If I may also ask, does the hypercolumn take the features directly from the convolution layers, or after the ReLU that follows each convolution layer? That isn't really stated clearly in the paper.

Hi, @OranjeeGeneral

I guess you have to take them from the convolutional layers, as you have also included ReLU in the MLP.

Hi, @reynoldscem

Correct me if I am wrong, but the input to an MLP is supposed to be a vector, isn't it?

@SavanGowda -

  1. Are we supposed to extract the Ci(p) from all (say conv1_2) 64 feature maps, or can we select some random feature maps? -- In our work, we extracted all 64 values (say, from conv1_2).

  2. Hypercolumns are defined per pixel, so each pixel is represented as a vector. We represent all the subsampled pixels as a matrix. E.g., if there are 1000 pixels and the dimensionality of a vector is k, we represent them using a 1000 x k matrix.
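A hedged sketch of how that 1000 x k matrix then feeds the MLP (the layer widths and class count below are illustrative assumptions, not the paper's exact head):

```python
import numpy as np

rng = np.random.default_rng(0)

N, k = 1000, 64 + 128 + 256 + 512 + 512   # sampled pixels x hypercolumn dimensionality
n_hidden, n_classes = 4096, 21            # illustrative MLP sizes, not from the paper

X = rng.standard_normal((N, k))           # design matrix: one hypercolumn per row

W1 = 0.01 * rng.standard_normal((k, n_hidden))
W2 = 0.01 * rng.standard_normal((n_hidden, n_classes))

hidden = np.maximum(X @ W1, 0)            # ReLU MLP applied to all pixels at once
scores = hidden @ W2                      # (N, n_classes): one prediction per sampled pixel
```

Each row of X is the per-pixel vector the MLP expects; stacking 1000 of them into a matrix simply lets the same MLP run on all sampled pixels in one matrix multiply.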

@OranjeeGeneral - You may extract either before or after ReLU. In our experiments, we extract it after ReLU (except while training from scratch). You may look at our train.prototxt file to see where we extracted before or after ReLU.
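For completeness, the pre-/post-ReLU choice being discussed amounts to a one-line toggle (a sketch with my own naming, not code from the repo):

```python
import numpy as np

def features_for_hypercolumn(conv_output, after_relu=True):
    """Pick which tensor the hypercolumn reads from.

    after_relu=True  -> rectified features (the fine-tuning setting above)
    after_relu=False -> raw convolution outputs (the from-scratch setting)
    """
    return np.maximum(conv_output, 0) if after_relu else conv_output
```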

@aayushbansal

Thanks for your answer. Since I will train from scratch, I will go with before. :)

Your suggestion of looking at train.prototxt doesn't really help, because Caffe lets you specify in-place layers, where a layer (e.g., ReLU) writes back to the same top blob it takes as its bottom, so it isn't clear from the textual description alone where the extraction lies in the hierarchy. That's why I asked.