XifengGuo / CapsNet-Keras

A Keras implementation of CapsNet from the NIPS 2017 paper "Dynamic Routing Between Capsules". Now test error = 0.34%.

dynamic routing

InnerPeace-Wu opened this issue · comments

Thanks for the amazing work you've done. I adapted dynamic routing from your code, and I want to share some of my ideas about it. Here is my repo with TensorFlow.
bias updating
You mentioned that you fix the bias to 0, but during dynamic routing you are updating it. Is that so? Code: here and here.
In my opinion, the bias should not be updated, since it is just the initial value for dynamic routing. With your implementation, the bias gets updated every time you feed in data, even with the Variable set to trainable=False, and of course the same thing happens during testing. I think the easiest fix is to make a temporary variable with temp_bias = bias and use it for dynamic routing.
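For example, here is a minimal NumPy sketch of what I mean (the function and variable names are mine, not from the repo): the stored prior is only read at the start of routing, and every update goes into a local copy.

```python
import numpy as np

def dynamic_routing(u_hat, prior_logits, num_iters=3):
    """Routing that never modifies the stored prior.

    u_hat:        predictions from the layer below, shape [num_caps_in, num_caps_out, dim_out]
    prior_logits: the fixed initial log prior, shape [num_caps_in, num_caps_out]
    """
    b = prior_logits.copy()  # temp_bias = bias: routing works on a local copy
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)                   # weighted sum -> [num_caps_out, dim_out]
        norm = np.linalg.norm(s, axis=-1, keepdims=True)
        v = (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + 1e-8)  # squash
        b = b + (u_hat * v[None]).sum(axis=-1)                   # agreement updates the copy only
    return v
```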
bias summing
Code here: it seems that you are trying to keep the shape of the bias as [num_caps, 10], and you sum over all the training examples. I think that's problematic. The paper mentions that the bias is independent of the image, but during routing the capsule predictions from the layer below vary from image to image, so the updated bias should differ too. After the bias is updated, its shape should be [batch_size, num_caps, 10].
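Something like the following sketch is what I have in mind (sizes and names are my own, not taken from your code): the stored prior stays image-independent, and each image in the batch gets its own working copy of the logits.

```python
import numpy as np

num_caps_in, num_caps_out = 1152, 10                  # hypothetical sizes
prior_logits = np.zeros((num_caps_in, num_caps_out))  # shared, image-independent prior

batch_size = 100
# One copy of the logits per image: updates made during routing then have
# shape [batch_size, num_caps_in, 10] and never touch prior_logits itself.
b = np.broadcast_to(prior_logits, (batch_size, num_caps_in, num_caps_out)).copy()
```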

I tried 3 iterations of dynamic routing; after fewer than 4 epochs (2k iterations) the validation accuracy is 99.16%, so it seems to work, though still not as efficient as the paper reports.
But I have a big problem: training is slow, almost 2 s per iteration with batch_size 100 on an Nvidia GTX 1060, which is way more than yours.

Just some of my ideas; glad to discuss them with you.
Best.

I agree with what you say about bias updates, though I am not so sure about bias summation.

As you said, the bias is independent of the image. If we keep it independent of the image, then where an image sits within a batch shouldn't affect the route it takes.

If the bias has the shape [batch_size, num_caps, 10], then where an image is routed depends on its batch index.

The reason I believe [num_caps, 10] is correct is that the image after convolution could simply be global average/max-pooled, so no image-specific features relate to the bias during the dynamic routing process.

The paper itself states that, in MNIST's case, 10 is the number of capsules in DigitCaps, and 32 * 6 * 6 represents 6 * 6 capsules in 32 channels. Hence, the architecture itself treats the image space as a set of capsules.
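To make those numbers concrete, here is my own back-of-the-envelope arithmetic (dimensions taken from the paper's MNIST architecture):

```python
# Per-image tensor shapes in the paper's MNIST setup.
num_caps_in = 32 * 6 * 6          # 1152 primary capsules, 8-dimensional each
num_caps_out, dim_out = 10, 16    # DigitCaps: 10 capsules of dimension 16

u_hat_shape = (num_caps_in, num_caps_out, dim_out)   # predictions u_hat_{j|i}
logits_shape = (num_caps_in, num_caps_out)           # routing logits b_{ij}
print(u_hat_shape, logits_shape)                     # (1152, 10, 16) (1152, 10)
```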

Discussion clears things up. Thanks for sharing your ideas, @iwasaki-kenta.
My point is that I make the log prior the same for every image before dynamic routing. Referring to the algorithm:
[image: the routing algorithm from the paper]
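For reference, the per-iteration update in the paper's routing procedure is:

$$
c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}, \qquad
s_j = \sum_i c_{ij}\,\hat{u}_{j|i}, \qquad
v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\,\frac{s_j}{\lVert s_j \rVert}, \qquad
b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j
$$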
Before routing, $b_{ij}$ is initially set to 0, but the predictions $\hat{u}_{j|i}$ vary for every image. So, in my opinion, during routing the bias differs between images because their capsule predictions differ. But the initial $b_{ij}$ is still the same for every image; this is why we should not update THE bias during routing, but instead use the VALUE of the bias to build up agreement with the layer above.

Imagine this: first you feed in an image of a 5 and get the digit caps; then you send in an image of an 8. Shouldn't the bias during routing differ between the two?

In a word, I think the initial log prior and the bias used during dynamic routing are not the same thing. The latter just uses the VALUE of the initial log prior as the starting point for routing.

best,

Alright, I completely understand and agree with what you mean :).

As we go through r iterations, each image in a batch goes through the routing process. The initial log priors representing the biases have no relation to the images being routed, but we should keep a separate copy per image rather than summing over the batch dimension. Otherwise, by summing/averaging their posteriors, we lose information about where each image is being routed.

I reflected the change in a version of the model I'm working on right now. Thanks a lot for the clarification.


Thanks for this discussion; I have fixed this issue. Please check the newest commit. @InnerPeace-Wu @iwasaki-kenta