why average b_ij a cross example?
jingjing-gong opened this issue · comments
https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py#L151
# then matmul in the last tow dim: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
# batch_size dim, resulting in [1, 1152, 10, 1, 1]
v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
Why would you need to average b across batch dimension? I don't see why would that be good, since that would make the model batch-size dependent. If there is any mention on this in the paper or other source, can you point out where and send a link, appreciated.
I was asking the same question in #21, but failed to formulate it properly at first.
@JerrikEph It's the same question in #21, please follow the result in that Issue