Precision on ImageNet experiment
Karami-m opened this issue
Hi,
For ImageNet, you mention in the paper that the Hyena code was used for the experiments, replacing the MLP blocks in Hyena ViT-b with block-diagonal matrices, similarly to M2-BERT. The Hyena config file sets `trainer: precision: 16`, so I wonder whether you trained the ImageNet model with bf16 mixed precision (as for M2-BERT) on A100 GPUs, or with plain fp16.
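For context, the quoted `trainer` block follows the PyTorch Lightning convention, where `precision: 16` selects fp16 mixed precision and bf16 must be requested explicitly; a hypothetical version of the change I am asking about would look like:

```yaml
trainer:
  # precision: 16    # fp16 mixed precision (value in the released Hyena config)
  precision: bf16    # bf16 mixed precision, e.g. for A100 GPUs
```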
Also, in the sequence mixer of M2-BERT you replaced attention with bidirectional gated convolutions plus a residual long convolution (Figure 3, left). Did you do the same for ImageNet and include the residual long convolution there? I ask because the Monarch matrices sit inside a residual sequence mixing layer that has an ordinary residual connection, but that connection is not a residual long convolution.
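To make the distinction I am drawing concrete, here is a minimal sketch (my own illustration, not the released M2 code) contrasting a plain skip connection around a sequence mixer with a residual branch that is itself a depthwise long convolution; the module and kernel names are hypothetical:

```python
import torch
import torch.nn as nn


def fft_long_conv(x, k):
    """Depthwise long convolution via FFT; x: (B, D, L), k: (D, L)."""
    L = x.shape[-1]
    x_f = torch.fft.rfft(x, n=2 * L)
    k_f = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(x_f * k_f, n=2 * L)[..., :L]


class PlainResidualMixer(nn.Module):
    """y = x + mixer(x): an ordinary residual connection around the mixer."""

    def __init__(self, mixer):
        super().__init__()
        self.mixer = mixer

    def forward(self, x):
        return x + self.mixer(x)


class LongConvResidualMixer(nn.Module):
    """y = mixer(x) + long_conv(x): the residual branch is a learned long conv."""

    def __init__(self, mixer, d_model, seq_len):
        super().__init__()
        self.mixer = mixer
        # One length-L kernel per channel (hypothetical parameterization)
        self.k = nn.Parameter(0.02 * torch.randn(d_model, seq_len))

    def forward(self, x):
        return self.mixer(x) + fft_long_conv(x, self.k)
```

Both wrappers preserve the `(B, D, L)` shape of the input, so they are drop-in alternatives around the same mixer; my question is which of the two the ImageNet model actually uses.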