O(2) group, irreps, and PyTorch DDP.

ahyunSeo opened this issue 2 years ago

Hello,

Thank you for your nice work. I'm a heavy user of this library :)
Recently I have been working with O(2) groups and I have some questions about irreps.
I'm following your approach here to use every possible irrep in the fibergroup.

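# gc is the gspace built earlier (with some maximum_frequency)
irreps = []
for n, irr in gc.fibergroup.irreps.items():
    # skip the trivial representation
    if n != gc.trivial_repr.name:
        # repeat each irrep size // sum_of_squares_constituents times
        irreps += [irr] * int(irr.size // irr.sum_of_squares_constituents)
irreps = list(irreps)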

What I can see is that the number of irreps is independent of the maximum_frequency assigned when building a gspace.
My questions are:

1. What are the physical meanings of frequency / maximum_frequency?
2. What is the difference between the different irreps items?
3. Is there any reason for following the practice above, rather than using specific types of irreps?

In addition, I have a minor bug to report related to the code snippet above.
In short: PyTorch DDP does not support them.
Here are the reasons and my solution.

DDP requires that each distributed process has exactly the same model and thus exactly the same parameter registration order.
Layers like GNorm register their parameters by iterating over unique_representations.
The unique representations in FieldType are stored using set(), so their iteration order is non-deterministic.
When I run DDP, the parameter order differs between processes, because the unique_representations of a FieldType built from multiple irreps are not ordered consistently across processes, and the run fails.
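
To make the ordering problem concrete, here is a small standard-library sketch that is not part of the library. It assumes the set elements are hashed like strings, whose hashes Python randomizes per process unless PYTHONHASHSEED is fixed. The names in NAMES and the helper set_order_with_seed are hypothetical placeholders, not real irrep objects.

```python
import os
import subprocess
import sys

# Hypothetical stand-ins for representation names.
NAMES = ["irrep_0,0", "irrep_1,0", "irrep_1,1", "irrep_1,2", "irrep_1,3"]

# Code run in a child interpreter: print the iteration order of a set.
SNIPPET = f"print(' '.join(set({NAMES!r})))"


def set_order_with_seed(seed):
    """Return the set iteration order seen by a fresh process with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Each seed mimics a separate DDP worker; the printed orders may differ between seeds.
for seed in range(4):
    print(seed, set_order_with_seed(seed))
```

The same seed always reproduces the same order, but separate processes normally get different seeds, which is how workers can end up registering parameters in different orders.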

I fixed it by changing 'field_type.py' as follows and then installing from source (pip install .):

# self._unique_representations = set(self.representations)
_unique_representations = list(dict.fromkeys(self.representations).keys())
self._unique_representations = _unique_representations
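
As a standalone illustration of why this fix gives a stable order, here is a minimal sketch with plain strings in place of representation objects (the names and the helper unique_in_order are hypothetical). It relies only on the fact that dicts preserve insertion order in Python 3.7+.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the order of first occurrence."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the result does not depend on hash values.
    return list(dict.fromkeys(items))


# Hypothetical stand-ins for the representations of a field type.
representations = ["irrep_1,1", "irrep_0,0", "irrep_1,1", "irrep_1,2", "irrep_0,0"]
unique = unique_in_order(representations)
print(unique)  # ['irrep_1,1', 'irrep_0,0', 'irrep_1,2']
```

Every process that builds the field type from the same list then sees the same order. The explicit .keys() call in the snippet above is harmless but not needed, since list() over a dict already iterates its keys.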

I'm not opening a pull request because there might be unexpected side effects.

Best,

Ahyun Seo

Gabriele Cesa · Answer 1 · Fri Sep 30 2022 23:47:44 GMT+0800 (China Standard Time)

Hi @ahyunSeo

I am happy to read that!

> What I can see is that the number of irreps is independent of the maximum_frequency assigned when building a gspace.

> Is there any reason for doing the above practice, rather than using specific types of irreps?

The maximum_frequency you specify in the gspace is used to pre-instantiate all irreps up to that frequency, i.e. it populates the list gspace.irreps with these irreps. This is not necessary at all: you can always instantiate the irreps you need later (they will all be cached in the irreps attribute anyway).
In other words, setting maximum_frequency guarantees that the list irreps contains at least all irreps of frequency less than or equal to maximum_frequency.

> What are the physical meanings of frequency / maximum_frequency?

You can think of them as rotational frequencies: a rotation r_\theta acts on a vector in the 2-dimensional space associated with an irrep of frequency k by rotating it by an angle k * \theta around the origin.
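
A minimal pure-Python sketch of this statement, covering only the rotation part of the group. The helpers so2_irrep and apply are hypothetical names for illustration, not library functions.

```python
import math


def so2_irrep(k, theta):
    """2x2 matrix with which a rotation by theta acts on a frequency-k feature."""
    c, s = math.cos(k * theta), math.sin(k * theta)
    return [[c, -s], [s, c]]


def apply(matrix, vec):
    """Multiply a 2x2 matrix with a 2-vector."""
    return [
        matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
        matrix[1][0] * vec[0] + matrix[1][1] * vec[1],
    ]


theta = math.pi / 4  # rotate the input by 45 degrees
for k in (1, 2, 3):
    v = apply(so2_irrep(k, theta), [1.0, 0.0])
    angle = math.degrees(math.atan2(v[1], v[0]))
    print(f"frequency {k}: feature vector rotated by {angle:.0f} degrees")
```

With theta fixed at 45 degrees, the frequency-1 feature turns by 45 degrees, the frequency-2 feature by 90 degrees, and the frequency-3 feature by 135 degrees.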

> What is the difference between the different irreps items?

In terms of neural network features, there is an analogy between these frequencies and the frequencies in a Fourier transform. When taking a Fourier transform, the larger the set of frequencies considered, the more complex the functions we can parameterize. Similarly, the more frequencies we include in a neural network, the more expressive it is.
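
The Fourier side of this analogy can be sketched in a few lines of plain Python. This is only an illustration of band-limited approximation, not of the library: the square-wave target, the sampling grid, and the helper names are assumptions chosen for the example.

```python
import math


def square_wave(x):
    """Target function: +1 on the first half period, -1 on the second."""
    return 1.0 if math.sin(x) >= 0 else -1.0


def truncated_series(x, max_frequency):
    """Fourier series of the square wave, keeping only frequencies <= max_frequency."""
    return sum(
        4.0 / (math.pi * k) * math.sin(k * x)
        for k in range(1, max_frequency + 1, 2)  # only odd frequencies contribute
    )


def mean_squared_error(max_frequency, num_samples=1000):
    """Average squared error of the truncated series on a grid over one period."""
    xs = [2 * math.pi * (i + 0.5) / num_samples for i in range(num_samples)]
    return sum(
        (square_wave(x) - truncated_series(x, max_frequency)) ** 2 for x in xs
    ) / num_samples


errors = {k: mean_squared_error(k) for k in (1, 3, 9)}
for k, err in errors.items():
    print(f"max frequency {k}: mean squared error {err:.3f}")
```

The error shrinks as the maximum frequency grows, which is the sense in which a larger set of frequencies can represent more complex functions.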

> In addition, I have a minor bug to report related to the code snippet above.
> In short: PyTorch DDP does not support them.

Oh damn, thanks a lot for spotting this problem! Please, feel free to open a pull-request with your solution; I will then take care of testing it and ensuring everything still works fine before merging it :)

Best,
Gabriele

ahyunSeo · Answer 2 · Sun Oct 09 2022 14:47:34 GMT+0800 (China Standard Time)

Hello, Gabriele

Thank you for the detailed response!
Now I understand more about frequencies.

Ahyun