bernard24 / RIS

Implementation of the approach described in the paper "Recurrent Instance Segmentation" https://arxiv.org/abs/1511.08250.

puzzles about the code and training process

brisker opened this issue

brisker commented

How do I perform leaf segmentation on the CVPPP dataset after I have finished training? (Part of the training log is shown below.)
[screenshot: training log]

Besides, if I directly use the (weird-looking) output of the FCN, I get an error at this line:
lst = protos.rnn:forward{x, unpack(rnn_state[t-1])}
the error reads like this:
...~torch/install/share/lua/5.1/nn/CAddTable.lua:16: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cutorch-scm-1-4319/cutorch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:198)
stack traceback:
[C]: in function 'add'
........

The inference code I use is attached here:
leaf_seg_infer.txt

Could you please give some advice, or provide a demo for running leaf-segmentation inference?

Thanks a lot!

bernard24 commented

Yes, plants_pre_lstm.model contains the learned weights of the FCN. Having had a quick look at your code, there is no need for the permutation.
I have uploaded an example for inference in the plants_learning directory; I hope it helps.
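For reference, running that first stage amounts to something like the sketch below (plant.png is a hypothetical file name, and it assumes the model was saved in CPU mode; a CUDA-saved model would additionally need require 'cunn' and img:cuda()):

```lua
require 'nn'
require 'image'

-- plants_pre_lstm.model holds the learned weights of the FCN.
local fcn = torch.load('plants_pre_lstm.model')
fcn:evaluate()

-- image.load already returns a channels-first (3 x H x W) tensor,
-- so no permutation is needed before the forward pass.
local img = image.load('plant.png', 3, 'float')
local features = fcn:forward(img)
```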

Regarding the issue title, let me know if you run into problems with RIS_infer.ipynb.

brisker commented

@bernard24
Help! I used the code you provided for leaf segmentation, but the result seems unreasonable:
In the function createInstance, why do you resize the input image from 530 x 500 to 106 x 100 before it is fed into the FCN? I also found that if I skip the resize, the code breaks. I cannot understand why.
[screenshot]

[screenshots]

Besides, I have the following puzzles:
Firstly, for multi-person segmentation, why are there two FCN models, fcn_8_1 and fcn_8_2, rather than just one as in the leaf segmentation?
[screenshot]

Secondly, why don't you use fcn8s for leaf segmentation?

Thirdly, why do the leaves in the segmentation result look bigger than the output of post_lstm? Is that due to the CRF post-processing?
[screenshot]

Fourthly, in the paper you mention that the leaf segmentation FCN has 5 convolutional layers, the first with a 9 x 9 kernel. Is that 9 x 9 convolutional layer defined here?
[screenshot: code in experiment.lua]

If so, why do you define this layer in experiment.lua rather than in create_cnn.lua? (I can only see 4 convolutional layers in create_cnn.lua for the plant training.)

Fifthly, in the inference code the output of protos.post_lstm seems to have two elements (a map and a score):
[screenshot]

but I cannot find the corresponding definitions of these two elements here. Could you please point them out?
[screenshot]

Thanks a lot!
@bernard24

bernard24 commented

In this version of the experiment, the FCN is created with create_cnn instead of create_big_cnn, so the input is downsampled to that size. Using create_big_cnn assumes that the input is the original 530 x 500. In practice, the results are really similar.
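Concretely, the resize in createInstance amounts to something like this sketch (plant.png is a hypothetical file name; the sizes are the ones discussed in this thread):

```lua
require 'image'

-- The original plant images are 530 x 500; create_cnn expects the
-- 5x-downsampled 106 x 100 version (create_big_cnn takes the full size).
local img   = image.load('plant.png', 3, 'float')  -- 3 x 500 x 530
local small = image.scale(img, 106, 100)           -- args are (width, height): 3 x 100 x 106
```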

Regarding your puzzles:
1.- For people segmentation we have fcn_8_1 --> fcn_8_2 --> convlstm. The reason I divided fcn_8 into two parts is that we fine-tune fcn_8_1, whereas fcn_8_2 (a layer that interfaces between fcn_8_1 and the convlstm) is learned from scratch. It basically made experiments easier to carry out.
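As a minimal sketch of that arrangement (the checkpoint name and channel sizes below are assumptions, not the repo's actual values):

```lua
require 'nn'

-- fcn_8_1: pretrained FCN-8s layers, loaded and then fine-tuned.
local fcn_8_1 = torch.load('fcn_8_1.model')          -- hypothetical checkpoint

-- fcn_8_2: a fresh interface layer between fcn_8_1 and the convlstm,
-- learned from scratch.
local fcn_8_2 = nn.SpatialConvolution(21, 64, 1, 1)  -- assumed channel sizes

-- Chaining them gives the feature extractor that feeds the convlstm.
local features = nn.Sequential():add(fcn_8_1):add(fcn_8_2)
```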

2.- There were no previously trained CNNs for detecting leaves, so I created a CNN and trained it from scratch. Fine-tuning fcn8 to segment leaves might be a bit of overkill.

3.- That is just for illustration purposes (I cannot make the output of the lstm bigger in the figure). There is no difference in size before and after the CRF.

4.- Yes.

5.- outputs is a table that is created empty in line 18. Then, in either line 33 or line 37 (depending on the condition), the mask is added to this table. Finally, the score is added in line 50. The actual output of this gModule includes the current state h_t plus these two outputs.
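Reduced to a minimal nngraph sketch, the pattern looks like this (the heads here are stand-ins, not the actual modules in the repo):

```lua
require 'nngraph'

local x = nn.Identity()()                          -- input node
local h_t = nn.Identity()(x)                       -- stand-in for the next hidden state

local outputs = {}                                 -- created empty, like line 18
table.insert(outputs, h_t)                         -- current state h_t
local mask  = nn.Sigmoid()(nn.Linear(10, 10)(x))   -- stand-in mask head (cf. lines 33/37)
table.insert(outputs, mask)
local score = nn.Sigmoid()(nn.Linear(10, 1)(x))    -- stand-in confidence head (cf. line 50)
table.insert(outputs, score)

-- forward now returns the table {h_t, mask, score}.
local post_lstm = nn.gModule({x}, outputs)
```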

brisker commented

@bernard24
Thanks a lot for your reply!
But why does the leaf segmentation result produced with the code you provided (see the screenshot in my previous reply) seem unreasonable? It appears to keep repeating the first two outputs. Is there a bug in the code? I cannot find any obvious bugs, but something must be wrong with the results. Could you please give some advice? Thanks a lot!

Besides,
1-- I found that if I set 'learn_pre' to 1, the code generates a new model called plants_pre_cnn.model. What is the difference between this and plants_pre_lstm.model?
[screenshot]

2-- I still cannot understand why you wrote these lines of code:
[screenshot]
This is also where you define the pre_cnn model called 'premodel'.
What is the purpose of the lines of code above? Why don't you do everything CNN-related in create_cnn.lua?
Also, what is the purpose of the variable called "is_big"?
Thanks a lot!
@bernard24

bernard24 commented

About your first question, I guess you are running ./launcher.sh, which is
th experiment.lua -learning_rate 0.0001 -seq_length 2
in which the model learns to segment any 2 leaves (-seq_length) from the images. Looking at the image you uploaded, it apparently does segment 2 leaves quite decently, although it does not generalize beyond that number, which is usually the case.
The reason is that curriculum learning helps a lot in training this kind of model: have it first learn to segment 2 instances, then 4, then 8, and so on. At each stage you load the previous model (using the flags -cnn_model and -lstm_model), as in the schedule sketched below.
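A curriculum schedule could then look roughly like this (the checkpoint names passed to -cnn_model and -lstm_model are placeholders for whatever your previous stage saved):

th experiment.lua -learning_rate 0.0001 -seq_length 2
th experiment.lua -learning_rate 0.0001 -seq_length 4 -cnn_model <stage1_cnn.model> -lstm_model <stage1_lstm.model>
th experiment.lua -learning_rate 0.0001 -seq_length 8 -cnn_model <stage2_cnn.model> -lstm_model <stage2_lstm.model>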

About learn_pre: it refers to the first convolution, which downsamples the input image to 106 x 100. It is kept separate from plants_pre_lstm.model because I tried both options (with and without it; the variable is_big selects one mode or the other), and that separation was experimentally convenient. But you are right, conceptually it belongs to the same thing.
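As a size check: a single 9 x 9 convolution with stride 5 and padding 2 maps 530 x 500 exactly to 106 x 100, since floor((530 - 9 + 2*2) / 5) + 1 = 106. A sketch (the stride, padding, and channel count are assumptions consistent with those sizes):

```lua
require 'nn'

local nfeat = 16  -- assumed number of feature maps
local pre = nn.SpatialConvolution(3, nfeat, 9, 9, 5, 5, 2, 2)

local out = pre:forward(torch.rand(3, 500, 530))  -- 3 x H x W input
print(out:size())                                 -- nfeat x 100 x 106
```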

commented

So why not set seq_length directly to 16, rather than increasing it step by step (2 --> 4 --> 8 --> 16)? Why did you use curriculum learning here?

Besides,

[screenshots]

1--- Does the blue number (shown above) approaching -1 indicate "good" segmentation?
If not, what value indicates "good" segmentation? Is any sufficiently small value fine, even a negative one?

2--- Must the lstm's seq_length be larger than the maximum number of instances in any image at training time? What about at inference time?

3--- With seq_length=16, the blue number converges at 0.2, and the segmentation result is bad.

4--- In your leaf segmentation experiments, what were the final parameter settings and the training loss when training stopped? What seq_length, and what value of the "blue number" (training loss)?

5--- What information does the hist.t7 file save, and what does it mean?
[screenshot]

Thanks a lot!
@bernard24

bernard24 commented

In general I have found that curriculum learning gets you to good models in a shorter time, particularly when starting with seq_length=2; the later steps matter less. Learning everything without curriculum learning is also possible, but it takes longer.

1- The blue number is the current training loss, calculated according to eq. (5) in the paper. In that equation, the first term is less than or equal to 0, and the second is greater than or equal to 0. The closer the resulting loss is to -1, the better the model fits the training data.
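For intuition, the first term behaves like the negative of a soft intersection-over-union between a predicted and a ground-truth mask, so it lives in [-1, 0]. A minimal sketch of such a term (a simplification, not the full matching-based loss of eq. (5)):

```lua
-- Soft IoU between a predicted mask yhat (values in [0, 1]) and a
-- binary ground-truth mask y, both flattened to 1D tensors.
local function softIoU(yhat, y)
  local inter = torch.dot(yhat, y)
  return inter / (yhat:sum() + y:sum() - inter)
end

-- The loss term -softIoU(yhat, y) lies in [-1, 0];
-- -1 means the predicted mask matches the ground truth perfectly.
```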

2- Yes, that definitely helps. Note also that by default non_object_iterations=1; this is the number of extra iterations, beyond the number of ground-truth instances in the image, that the lstm runs. It is useful for letting the network learn when to stop segmenting (indicated by a low confidence score in that iteration).
At test time, you should run the lstm until the confidence score drops below 0.5.
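Put together with the forward call quoted earlier in this thread, a test-time loop could look roughly like this sketch (protos, x, and rnn_state are the variables from your inference script; the output order {h_t, mask, score} is an assumption, so check it against your create_lstm_protos.lua):

```lua
-- Run the recurrent part until the confidence score drops below 0.5.
-- Assumes rnn_state[0] holds the initial hidden state.
local masks = {}
local t = 0
while true do
  t = t + 1
  local lst = protos.rnn:forward{x, unpack(rnn_state[t-1])}
  rnn_state[t] = {lst[1]}             -- new hidden state (assumed index)
  local mask, score = lst[2], lst[3]  -- assumed output order
  if score[1] < 0.5 then break end    -- stopping criterion from this thread
  table.insert(masks, mask:clone())
end
```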

3- I suspect this is due to a high value of the learning rate. Try dividing it by 10.

4- At the moment I am away and do not have access to this. I will update this when I am back.

5- It contains just the training loss (the blue number) across time.
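So you can inspect it directly with the gnuplot package, e.g. (a sketch; it assumes hist is stored as a plain Lua table of numbers):

```lua
require 'gnuplot'

local hist = torch.load('hist.t7')  -- training loss across iterations
gnuplot.plot(torch.Tensor(hist))    -- convert to a tensor and plot the curve
```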

brisker commented

@bernard24

1--- What if there are 2, 3, or even more instances at a particular time step of the lstm's sequence? How do you handle that circumstance?

2--- I tried to use the Upsample layer you defined to upsample the output to the original input image size. I modified the code in create_lstm_protos.lua like this:
[screenshot: modified code]

but I got the following error:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
~code/torch/install/bin/luajit: ~code/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #2 to 'v' (3D or 4D (batch mode) tensor is expected at /tmp/luarocks_cunn-scm-1-143/cunn/lib/THCUNN/SpatialConvolutionMM.cu:12)
stack traceback:
[C]: in function 'v'
~/code/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...de/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:96: in function 'updateOutput'
./Upsample.lua:71: in function 'updateGradInput'
.../code/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval'
.../code/torch/install/share/lua/5.1/nngraph/gmodule.lua:454: in function <...jtu/code/torch/install/share/lua/5.1/nngraph/gmodule.lua:390>
[C]: in function 'xpcall'
.../torch/install/share/lua/5.1/nngraph/graphinspecting.lua:35: in function 'updateGradInput'
~code/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
experiment.lua:324: in function 'opfunc'
~/code/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'optimMethod'
experiment.lua:384: in main chunk
[C]: in function 'dofile'
...code/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
stack traceback:
[C]: in function 'error'
.../torch/install/share/lua/5.1/nngraph/graphinspecting.lua:43: in function 'updateGradInput'
~/code/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
experiment.lua:324: in function 'opfunc'
~/code/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'optimMethod'
experiment.lua:384: in main chunk
[C]: in function 'dofile'
...code/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Why don't you use the SpatialUpSamplingBilinear layer? What is wrong with the Upsample layer? It seems that the input to this layer is not a 3D tensor, but how could that happen? The input is defined here in create_lstm_protos.lua:
[screenshot]

bernard24 commented

1.- The model learns to segment one and only one instance per iteration. It is a model failure whenever it produces a mask covering more than one instance.

2.- That is due to the loss function reshaping the gradients, sorry about that. A quick solution would be to change the third line of your code to
table.insert(outputs, nn.Reshape(height_width_5^2)(upsample))

The only reason we are not using SpatialUpSamplingBilinear here is that it was not available when we started working on this. I would imagine that this layer should work here as well.

If you do not mind, I am going to close this issue, as it is becoming a bit unfocused. It may be better to open new issues with one specific question each, so that they become more useful for everyone else.