b_IJ update
erlebach opened this issue
Hi,
in CapsLayer.py, consider the code below. Notice that if cfg.iter_routing == 1, b_IJ never gets updated: the loop body runs exactly once with r_iter == 0 == cfg.iter_routing - 1, so only the first branch executes and the b_IJ update in the elif branch is never reached. Surely that is not the intent? Shouldn't b_IJ be updated at every iteration of the routing? Thanks.
Gordon
```python
if r_iter == cfg.iter_routing - 1:
    # line 5:
    # weighting u_hat with c_IJ, element-wise in the last two dims
    # => [batch_size, 1152, 10, 16, 1]
    s_J = tf.multiply(c_IJ, u_hat)
    # then sum in the second dim, resulting in [batch_size, 1, 10, 16, 1]
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    assert s_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]

    # line 6:
    # squash using Eq. 1
    v_J = squash(s_J)
    assert v_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
elif r_iter < cfg.iter_routing - 1:  # inner iterations, do not apply backpropagation
    s_J = tf.multiply(c_IJ, u_hat_stopped)
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    v_J = squash(s_J)  # <<<<<<<< MISSING UPDATE of b_IJ?

    # line 7:
    # reshape & tile v_J from [batch_size, 1, 10, 16, 1] to [batch_size, 1152, 10, 16, 1],
    # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1],
    # reduce mean in the batch_size dim, resulting in [1, 1152, 10, 1, 1]
    v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
    u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)
    assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]

    # b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
    b_IJ += u_produce_v  # <<<<<< PERHAPS THIS LINE SHOULD BE OUTSIDE THE r_iter LOOP?
```
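For comparison, here is a minimal NumPy sketch of Procedure 1 from the paper, with b_IJ updated on every iteration, including the last (where the update is computed but then unused). Shapes follow the MNIST CapsNet above ([batch_size, 1152, 10, 16, 1]); `softmax`, `squash`, and `route` are illustrative helpers written for this sketch, not the repository's functions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-2, eps=1e-9):
    # Eq. 1 of the paper: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq_norm = (s ** 2).sum(axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, iter_routing):
    # u_hat: [batch, 1152, 10, 16, 1], b_IJ: [1, 1152, 10, 1, 1]
    b_IJ = np.zeros((1, u_hat.shape[1], u_hat.shape[2], 1, 1))
    for r_iter in range(iter_routing):
        c_IJ = softmax(b_IJ, axis=2)                    # couplings over the 10 output capsules
        s_J = (c_IJ * u_hat).sum(axis=1, keepdims=True)  # [batch, 1, 10, 16, 1]
        v_J = squash(s_J)
        # agreement <u_hat, v_J> for every (i, j) pair: [batch, 1152, 10, 1, 1]
        v_tiled = np.tile(v_J, (1, u_hat.shape[1], 1, 1, 1))
        u_produce_v = np.matmul(u_hat.transpose(0, 1, 2, 4, 3), v_tiled)
        # b_IJ is updated on EVERY iteration, including the last
        b_IJ = b_IJ + u_produce_v.mean(axis=0, keepdims=True)
    return v_J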
@erlebach I think this is consistent with the paper: iter_routing=1 should represent not using routing at all. If that were not the intent, they would have included an iter_routing=1 experiment in Figure A.1, and they would have reported a result with iter_routing=1 in Table 1 to validate the effectiveness of the routing algorithm.
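A quick way to see why iter_routing=1 amounts to no routing, assuming b_IJ starts at zero as in this repository: the single softmax then yields uniform coupling coefficients, so v_J is just a uniform average of the prediction vectors. A small NumPy check:

```python
import numpy as np

# Assumes b_IJ is initialized to zeros, as in this repository.
b_IJ = np.zeros((1, 1152, 10, 1, 1))
# Softmax over the 10 output capsules (axis=2).
c_IJ = np.exp(b_IJ) / np.exp(b_IJ).sum(axis=2, keepdims=True)
# With zero logits every coupling coefficient is 1/10: a plain uniform
# average of the prediction vectors, i.e. no routing has taken place.
assert np.allclose(c_IJ, 1.0 / 10)
```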