QUVA-Lab / e2cnn

E(2)-Equivariant CNNs Library for Pytorch

Home Page: https://quva-lab.github.io/e2cnn/

Rotation equivariance on a specific angle

csuhan opened this issue · comments

Hi @Gabri95, thank you for your nice work! I have drawn a lot of inspiration from it!
But a question (maybe a very simple one) confuses me:
How can I generate rotation-equivariant features at a specific angle (e.g., 45° or 33.3°), rather than the features for all 360 degrees?
In other words, I just want the features at one particular orientation.
I guess a FieldType defined on SO(2) could solve this problem, but I don't know how to use the related functions.

Hi @csuhan,

I am very happy to read this! :)

Let me check whether I understand your question correctly.
Let's assume a C_N equivariant network with a regular field type as output.
This means the output has N output channels, one for each of the N rotations of the input.
What you are asking now is how to get the response (i.e. the channel) associated with a specific rotation.

The problem is that this is not really well defined: you should think about these channels in relative terms.
Imagine your network consists of a single convolutional layer mapping a 1-channel (black-and-white) image to a single regular field as above. This means the model learns a single filter and applies it N times, in N different orientations.
You could choose to identify the first output as the original orientation of the filter and the i-th output as the response in the i-th orientation. This is, however, arbitrary, as you could choose any of these filters as the "original" one.
Imagine your layer learns 4 filters of this form (in this order):
< v > ^
You can consider the output of < as the one associated with the angle 0.0, v with the angle 90.0, etc.
But your network could as well learn the filters in this order:
v > ^ <
The result is identical, but now you associate v with the angle 0.0.

The point here is very similar to the case of convolution over an image.
You can take a 3x3 filter and convolve it over your pixel grid.
What you are asking now is "which of the outputs is the one associated with the filter translated by a specific DX?"
To answer this question one first needs to define which one is the origin of the pixel grid, i.e. the point/pixel identified by the translation DX = (0, 0).

This does not mean what you ask is impossible, just that more context is required.
For instance, your learning task may define such origin or you could simply choose the origin (e.g. take the first channel as suggested initially).

The example above used C_N, i.e. only rotations which are multiple of 2pi/N.
If you want arbitrary continuous angles, then you can use the SO(2) group.
Here, unfortunately, you cannot use regular fields in practice (they would contain infinitely many elements), but you can use its irreps. Irreps do not explicitly store the activations at all orientations, but they still contain all the information needed to retrieve these values!

For instance, if you have a frequency-1 irrep of SO(2), it basically stores the outputs at 2 orthogonal directions (e.g. X and Y axis).
If you left-multiply this 2-dimensional output by the 2x2 rotation matrix of theta (i.e. the freq-1 irrep evaluated at theta), you get the outputs at theta and theta + 90°.

We can see this more generally.
Let's call y the output of your G-equivariant neural network (e.g. G=SO(2)).
Assume your output is associated with a field type rho (here we actually use the representation attribute).
You could just fix an origin (or learn it?) as a vector v of the same size and then compute the response at the element g of your group (e.g. g is the rotation by 33.3° in SO(2)) as
response_at_g = v.T @ rho(g).T @ y

Let's see a practical example with some code:

from e2cnn import gspaces                                          
from e2cnn import nn                                               
import torch            
import numpy as np                                        

# build the group SO(2) (we use up to frequency 2 irreps)                                                                   
r2_act = gspaces.Rot2dOnR2(N=-1, maximum_frequency=2)

# input image has 1 color channel
feat_type_in  = nn.FieldType(r2_act,  1*[r2_act.trivial_repr])     

# the output is a vector field of frequency 1 (associated with the freq-1 irrep as in the example above)
feat_type_out = nn.FieldType(r2_act, 1*[r2_act.irrep(1)])

# build the neural network (here, only one convolutional layer)
conv = nn.R2Conv(feat_type_in, feat_type_out, kernel_size=5)       
                                                                   
# generate a random input
x = torch.randn(1, 1, 5, 5)                                     
x = nn.GeometricTensor(x, feat_type_in)                            

# compute output
y = conv(x)

# the output has 2 channels (and 1x1 spatial size: 5x5 input, kernel_size=5, no padding)
print(y.shape)
# >> torch.Size([1, 2, 1, 1])

# this vector implicitly defines the origin. 
v = torch.tensor([[1., 0.]]).T

# Recall that the output of the freq-1 field stores the responses associated with 2 orthogonal directions
# We assume that the first is associated with the X axis and the second with the Y axis
# We interpret the first one (X axis) as the non-rotated filter and the second (Y axis) as a rotation by 90 deg
# If we now multiply by `v.T`, this just selects the response on the X axis

assert torch.allclose(y.tensor[0, 0, 0, 0], v.T @ y.tensor[0, :, 0, 0])

# we can add a rotation by 90deg to obtain the response on the Y axis
rho = feat_type_out.representations[0]      # = r2_act.irrep(1)
g = np.pi / 2    # 90 deg
rho_g = torch.tensor(rho(g), dtype=torch.float)

# use allclose: rho_g is only numerically a 90 deg rotation (cos(pi/2) is not exactly 0 in float)
assert torch.allclose(y.tensor[0, 1, 0, 0], v.T @ rho_g.T @ y.tensor[0, :, 0, 0])

# for an arbitrary angle (e.g. 33.3 deg)
g = np.deg2rad(33.3)    # 33.3 deg in radians
rho_g = torch.tensor(rho(g), dtype=torch.float)

print(v.T @ rho_g.T @ y.tensor[0, :, 0, 0])

This should also work for other groups and representations.
For instance, if you used C_N and a regular field (as in the very first example), you could have chosen v = [1, 0, 0, ...].
That would have identified the first channel with the non-rotated response and the i-th channel with the rotation by i·2π/N.
A different choice of v defines a different origin.

I hope this was useful and that it somehow answered your question.
Please let me know if you have any other doubts or questions.

Best,
Gabriele

Thank you for your detailed reply! I think I now know how to do it, and I will try it soon!