octiapp / KerasPersonLab

Grateful for your implementation! I have some questions about the bilinear sampler.

Line 49 in 32d44dd

base = base + bilinear_sampler(offsets, base)

Why is it base = base + bilinear_sampler(offsets, base) rather than base = base + bilinear_sampler(base, offsets)? Does this mean that the offsets are interpolated at the positions given by the base?

KerasPersonLab/bilinear.py

Line 43 in 32d44dd

iy0 = vy0 + h

With iy0 = vy0 + h, iy0 represents the destination position of the offsets. Why are these positions used in iy0 = tf.where(mask, tf.zeros_like(iy0), iy0)?

In this case,

x00 = tf.gather_nd(x, i00)
x01 = tf.gather_nd(x, i01)
x10 = tf.gather_nd(x, i10)
x11 = tf.gather_nd(x, i11)

will gather values from the destination positions of the predicted offsets. Shouldn't we gather values from the start positions of the offsets instead?

Hello @yangsenius and thanks for your interest.

Regarding (1), the offsets are the small_offsets around each keypoint, and the base is either the mid-range offsets or the long-range offsets, which need to be refined. We sample the offsets at the locations specified by the base and add them back into the base. I hope this is clear.
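To make the refinement step concrete, here is a minimal NumPy sketch of the idea. It is not the code in bilinear.py: the function names bilinear_sample and refine, the (y, x) channel order, and the clamp-to-border policy are all assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(field, coords):
    """Sample `field` (H, W, C) at fractional (y, x) positions `coords` (H, W, 2)."""
    h, w = field.shape[:2]
    # Assumption: clamp positions to the map so every sample stays in bounds.
    y = np.clip(coords[..., 0], 0, h - 1)
    x = np.clip(coords[..., 1], 0, w - 1)
    # Integer corners surrounding each fractional position.
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional parts, with a trailing axis so they broadcast over channels.
    dy = (y - y0)[..., None]
    dx = (x - x0)[..., None]
    return ((1 - dy) * (1 - dx) * field[y0, x0]
            + (1 - dy) * dx * field[y0, x1]
            + dy * (1 - dx) * field[y1, x0]
            + dy * dx * field[y1, x1])

def refine(base, offsets):
    """One refinement step: add to `base` the offsets found where `base` points."""
    h, w = base.shape[:2]
    # Pixel grid in (y, x) order, shape (H, W, 2).
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    dest = grid + base  # destination of each base vector
    return base + bilinear_sample(offsets, dest)
```

For example, on a 5x5 map with a keypoint at (2, 2) and short-range offsets that point exactly at it, a base vector of (1.5, 1.5) at pixel (0, 0) lands at (1.5, 1.5), picks up the interpolated correction (0.5, 0.5), and becomes (2, 2), which now points at the keypoint.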

I'm not sure what your question is in (2). We are indeed trying to gather values from the destinations of the offsets. If your question is about the sampling implementation, note that it is borrowed, as I mentioned in the README, and I only lightly modified it to deal with border issues. Now, with TensorFlow 1.14, you can use tf.contrib.resampler.resampler instead.

Hello @jricheimer, thanks for your helpful answer! I think I have understood it:

iy0 = vy0 + h represents the destination positions of the base (mid- or long-range offsets). We resample the values of offsets (the short offsets) at these positions, with the interpolation weights coming from the base (e.g. w00 = (1.-dx) * (1.-dy)). As a result, bilinear_sampler(offsets, base) yields more accurate offset values, and adding them back into the base moves the base's destination closer to the target. The paper did not spell this out.
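As a small worked example of those weights, here is how I picture them. Only w00 = (1.-dx) * (1.-dy) is taken from the code; the other three weights and the function name bilinear_weights are my assumption, following the x00, x01, x10, x11 naming with the second index taken to be x.

```python
def bilinear_weights(dx, dy):
    """Return (w00, w01, w10, w11) for fractional parts dx, dy in [0, 1)."""
    w00 = (1. - dx) * (1. - dy)  # corner (y0, x0), as in the source
    w01 = dx * (1. - dy)         # assumed: corner (y0, x1)
    w10 = (1. - dx) * dy         # assumed: corner (y1, x0)
    w11 = dx * dy                # assumed: corner (y1, x1)
    return w00, w01, w10, w11
```

With dx = 0.25 and dy = 0.5 this gives 0.375, 0.125, 0.375 and 0.125. The weights always sum to 1, so the sampled value is a convex combination of the four neighbouring short offsets.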