Setup Relaxed Group Convolutional Network
dgcnz opened this issue
Implementation change: replace `rot_img` with `torchvision.transforms.functional.rotate`.
Relevant code:
```python
import numpy as np
import torch.nn.functional as F
from torch import FloatTensor, Tensor


def rot_img(x: Tensor, theta: float) -> Tensor:
    """Rotate a batch of images by `theta` radians.

    :param x: batch of images with shape [N, C, H, W]
    :param theta: rotation angle in radians
    :returns: rotated images
    """
    # 2x3 affine matrix encoding a rotation by theta (no translation part).
    rot_mat = FloatTensor(
        [
            [np.cos(theta), -np.sin(theta), 0],
            [np.sin(theta), np.cos(theta), 0],
        ]
    )
    # One copy of the matrix per image in the batch.
    rot_mat = rot_mat.repeat(x.shape[0], 1, 1)
    grid = F.affine_grid(rot_mat.to(x.device), x.size(), align_corners=False)
    x = F.grid_sample(x, grid, align_corners=False)
    return x.float()
```
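For instance (illustrative usage only):

```python
import math

import torch

x = torch.randn(8, 3, 28, 28)        # batch of 8 RGB 28x28 images
x90 = rot_img(x, theta=math.pi / 2)  # rotate the whole batch by 90 degrees
assert x90.shape == x.shape          # rotation preserves the shape
```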
Conclusion:
- I will use `TTF.rotate` because the qualitative differences are negligible and the performance gain is consistent.
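For reference, a minimal sketch of what the replacement could look like. `rot_img_tv` is a hypothetical name, and note that `TTF.rotate` expects degrees (counter-clockwise) rather than radians, so the sign convention may need flipping to match `rot_img` exactly:

```python
import math

import torch
import torchvision.transforms.functional as TTF
from torchvision.transforms import InterpolationMode


def rot_img_tv(x: torch.Tensor, theta: float) -> torch.Tensor:
    """Rotate a batch of images [N, C, H, W] by `theta` radians via torchvision."""
    # Bilinear interpolation approximates grid_sample's default behavior.
    return TTF.rotate(
        x, angle=math.degrees(theta), interpolation=InterpolationMode.BILINEAR
    )
```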
Second optimization (for the lifting layer): replacing

```python
torch.einsum("na, noa... -> oa...", relaxed_weights, filter_bank)
```

with

```python
(relaxed_weights.view(num_filter_banks, 1, group_order, 1, 1, 1) * filter_bank).sum(0)
```

makes it considerably faster. Benchmark code is in tests/models/components/gcnn/lifting/test_relaxed_rotation.py.
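As a sanity check, the two forms can be compared directly; the sizes and the 6-D `filter_bank` layout `(num_filter_banks, out_channels, group_order, in_channels, k, k)` below are assumptions for illustration:

```python
import torch

# Illustrative sizes, deliberately distinct so shape mismatches would surface.
num_filter_banks, out_channels, group_order, in_channels, k = 3, 8, 4, 2, 5

relaxed_weights = torch.randn(num_filter_banks, group_order)
filter_bank = torch.randn(
    num_filter_banks, out_channels, group_order, in_channels, k, k
)

slow = torch.einsum("na, noa... -> oa...", relaxed_weights, filter_bank)
fast = (
    relaxed_weights.view(num_filter_banks, 1, group_order, 1, 1, 1) * filter_bank
).sum(0)

assert torch.allclose(slow, fast, atol=1e-6)
```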
A couple of weird things about the implementation of the weighted combination for the relaxed group convolution (not the lifting layer). The `relaxed_weights` now have shape `(group_order, num_filter_banks)`, which is the transpose of the lifting layer's weights. I'm not sure if this is an actual modeling choice or just an arbitrary one, but it requires an extra transpose (see `fast()` below). However, this doesn't seem to affect performance.
```python
def fast():
    # Weights arrive as (group_order, num_filter_banks), so transpose first.
    return torch.sum(
        relaxed_weights.transpose(0, 1).view(
            num_filter_banks, 1, group_order, 1, 1, 1, 1
        )
        * filter_bank,
        dim=0,
    )


def fast_group_last():
    # Assumes weights are already stored as (num_filter_banks, group_order).
    return torch.sum(
        relaxed_weights.view(num_filter_banks, 1, group_order, 1, 1, 1, 1)
        * filter_bank,
        dim=0,
    )


def einsum():
    return torch.einsum("na, aon... -> on...", relaxed_weights, filter_bank)
```
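For context, a sketch of how these variants might be timed with `torch.utils.benchmark`, assuming the functions above are defined in the same module. The shapes are illustrative assumptions, not the repo's actual test configuration, and `fast_group_last` is omitted since it expects the transposed weight layout:

```python
import torch
import torch.utils.benchmark as benchmark

# Illustrative sizes (assumptions, not the actual test configuration).
num_filter_banks, out_channels, group_order, in_channels, k = 3, 8, 4, 2, 5

# Group-conv weights stored as (group_order, num_filter_banks), as noted above.
relaxed_weights = torch.randn(group_order, num_filter_banks)
# 7-D filter bank: the second group_order axis is the input group dimension.
filter_bank = torch.randn(
    num_filter_banks, out_channels, group_order, in_channels, group_order, k, k
)

for fn in (fast, einsum):
    timer = benchmark.Timer(stmt="fn()", globals={"fn": fn})
    print(fn.__name__, timer.timeit(100))
```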
The results for the group convolution are not as striking on CPU, but the speedup is still noticeable (~3x).
Observations:
- Now `mps` is faster than `cpu`, which was the opposite for the lifting layer. Maybe this is because the group convolution has an extra dimension and thus benefits more from parallelization? This is somewhat supported by the fact that the means of the two tests differ considerably (7 vs 20).
- `(group_order, num_filter_banks)` and `(num_filter_banks, group_order)` have similar performance.
- `einsum` sucks: it is by far the slowest.
It would be nice if the numbers in these tests were more attuned to real architectures; maybe that's a task for the future.