b_IJ update
erlebach opened this issue
Hi,
in CapsLayer.py, consider the code below. Notice that if cfg.iter_routing == 1, b_IJ never gets updated: the loop body runs exactly once with r_iter == 0 == cfg.iter_routing - 1, so only the first branch executes and the b_IJ update in the elif branch is never reached. Surely that is not the intent? Shouldn't b_IJ be updated at every iteration of the routing? Thanks.
Gordon
```python
if r_iter == cfg.iter_routing - 1:
    # line 5:
    # weighting u_hat with c_IJ, element-wise in the last two dims
    # => [batch_size, 1152, 10, 16, 1]
    s_J = tf.multiply(c_IJ, u_hat)
    # then sum in the second dim, resulting in [batch_size, 1, 10, 16, 1]
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    assert s_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]

    # line 6:
    # squash using Eq. 1
    v_J = squash(s_J)
    assert v_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
elif r_iter < cfg.iter_routing - 1:  # inner iterations, do not apply backpropagation
    s_J = tf.multiply(c_IJ, u_hat_stopped)
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    v_J = squash(s_J)  # <<<<<<<< MISSING UPDATE of b_IJ?

    # line 7:
    # reshape & tile v_J from [batch_size, 1, 10, 16, 1] to [batch_size, 1152, 10, 16, 1],
    # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1],
    # reduce mean in the batch_size dim, resulting in [1, 1152, 10, 1, 1]
    v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
    u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)
    assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]

    # b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
    b_IJ += u_produce_v  # <<<<<< PERHAPS THIS LINE SHOULD BE OUTSIDE THE r_iter LOOP?
```
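For comparison, here is a minimal NumPy sketch of Procedure 1 from the paper, with b_IJ updated on every iteration, including the last (where the update is computed but then unused). Shapes follow the MNIST CapsNet above ([batch_size, 1152, 10, 16, 1]); `softmax`, `squash`, and `route` are illustrative helpers written for this sketch, not the repository's functions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-2, eps=1e-9):
    # Eq. 1 of the paper: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq_norm = (s ** 2).sum(axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, iter_routing):
    # u_hat: [batch, 1152, 10, 16, 1], b_IJ: [1, 1152, 10, 1, 1]
    b_IJ = np.zeros((1, u_hat.shape[1], u_hat.shape[2], 1, 1))
    for r_iter in range(iter_routing):
        c_IJ = softmax(b_IJ, axis=2)                    # couplings over the 10 output capsules
        s_J = (c_IJ * u_hat).sum(axis=1, keepdims=True)  # [batch, 1, 10, 16, 1]
        v_J = squash(s_J)
        # agreement <u_hat, v_J> for every (i, j) pair: [batch, 1152, 10, 1, 1]
        v_tiled = np.tile(v_J, (1, u_hat.shape[1], 1, 1, 1))
        u_produce_v = np.matmul(u_hat.transpose(0, 1, 2, 4, 3), v_tiled)
        # b_IJ is updated on EVERY iteration, including the last
        b_IJ = b_IJ + u_produce_v.mean(axis=0, keepdims=True)
    return v_J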
@erlebach I think this is consistent with the paper: iter_routing=1 should represent not using routing at all. If that were not the intent, they would have included an iter_routing=1 experiment in Figure A.1, and they would have reported a result with iter_routing=1 in Table 1 to validate the effectiveness of the routing algorithm.
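A quick way to see why iter_routing=1 amounts to no routing, assuming b_IJ starts at zero as in this repository: the single softmax then yields uniform coupling coefficients, so v_J is just a uniform average of the prediction vectors. A small NumPy check:

```python
import numpy as np

# Assumes b_IJ is initialized to zeros, as in this repository.
b_IJ = np.zeros((1, 1152, 10, 1, 1))
# Softmax over the 10 output capsules (axis=2).
c_IJ = np.exp(b_IJ) / np.exp(b_IJ).sum(axis=2, keepdims=True)
# With zero logits every coupling coefficient is 1/10: a plain uniform
# average of the prediction vectors, i.e. no routing has taken place.
assert np.allclose(c_IJ, 1.0 / 10)
```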