Why average b_ij across examples?

jingjing-gong opened this issue 7 years ago

https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py#L151

            # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
            # batch_size dim, resulting in [1, 1152, 10, 1, 1]
            v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
            u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
            assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
            b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)

Why do you need to average b across the batch dimension? I don't see why that would be good, since it makes the model batch-size dependent. If this is mentioned in the paper or in another source, could you point out where and send a link? It would be appreciated.
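To make the concern concrete, here is a minimal NumPy sketch, not the repository's code. It assumes simplified shapes (the trailing singleton dimensions are dropped), and the names `agreement_update` and `share_across_batch` are hypothetical. It compares a routing-logit update that is reduced over the batch axis, as in the quoted `reduce_sum(..., axis=0)`, with one that is kept per example.

```python
import numpy as np

def agreement_update(u_hat, v_j, share_across_batch):
    """Return the routing-logit increment for one routing iteration.

    u_hat: [batch, n_in, n_out, dim] prediction vectors (assumed layout)
    v_j:   [batch, n_out, dim] output capsule vectors (assumed layout)
    """
    # Agreement u_hat_ij . v_j for every example: [batch, n_in, n_out]
    agreement = np.einsum("bijd,bjd->bij", u_hat, v_j)
    if share_across_batch:
        # Mirrors the quoted reduction over axis 0: one update shared by
        # all examples, shape [1, n_in, n_out]
        return agreement.sum(axis=0, keepdims=True)
    # Alternative: each example keeps its own update
    return agreement

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 6, 3, 8))  # toy sizes, not 1152 x 10 x 16
v_j = rng.normal(size=(4, 3, 8))

# Same first two examples, evaluated in a batch of 4 and in a batch of 2
shared_full = agreement_update(u_hat, v_j, share_across_batch=True)
shared_half = agreement_update(u_hat[:2], v_j[:2], share_across_batch=True)
per_ex_full = agreement_update(u_hat, v_j, share_across_batch=False)
per_ex_half = agreement_update(u_hat[:2], v_j[:2], share_across_batch=False)

print(np.allclose(shared_full, shared_half))      # shared update vs. batch contents
print(np.allclose(per_ex_full[:2], per_ex_half))  # per-example update vs. batch contents
```

In this sketch the shared update changes when the other examples in the batch change, while the per-example update for a given input does not. That is the batch dependence I am asking about.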

Paweł Kubik · Answer 1 · Fri Nov 10 2017 17:00:52 GMT+0800 (China Standard Time)

I was asking the same question in #21, but failed to formulate it properly at first.

Huadong Liao · Answer 2 · Fri Nov 10 2017 21:26:25 GMT+0800 (China Standard Time)

@JerrikEph It's the same question as in #21; please follow the discussion in that issue.