HazyResearch / safari

Convolutions for Sequence Modeling

Question about the model size

fransilvionGenomica opened this issue

Hi,

I am trying to build a Hyena model using the hyperparameters from Table A4 (the 4th row). I am using the standalone implementation of the model:

`layer2 = HyenaOperator(d_model=1024, l_max=19072, order=36, filter_order=64, num_inner_mlps=4, emb_dim=17, w=14)`

However, when I check the number of parameters, I get ~42M instead of the 355M stated in the paper. Is it because I am using the standalone implementation? Even then, how come the difference is so big? Or maybe I am missing something?
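As a rough sanity check on the ~42M figure, the arithmetic can be sketched by hand. This is only a back-of-envelope estimate and rests on an assumption that is not confirmed anywhere in this thread: that the operator has an input projection from `d_model` to `(order + 1) * d_model` and an output projection from `d_model` to `d_model`. The helper `linear_params` is hypothetical and exists only for this illustration.

```python
# Back-of-envelope count for the two large projections of ONE operator.
# ASSUMPTION (not taken from this thread): the operator uses
#   in_proj:  dense layer, d_model -> (order + 1) * d_model
#   out_proj: dense layer, d_model -> d_model
# Filter and short-convolution parameters are ignored here.

def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

d_model, order = 1024, 36  # values from the constructor call in the question

in_proj = linear_params(d_model, (order + 1) * d_model)
out_proj = linear_params(d_model, d_model)
total = in_proj + out_proj

print(f"in_proj:  {in_proj:,}")
print(f"out_proj: {out_proj:,}")
print(f"total:    {total:,} (~{total / 1e6:.1f}M)")
```

Under these assumptions the two projections alone come to roughly 40M, which is in the same range as the ~42M reported, and almost all of it scales with `order`. That would suggest the count covers a single operator rather than a full stacked model with embeddings and MLP blocks, but this is a guess and not something the thread confirms.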

`def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

count_parameters(layer2)`
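To see where the parameters sit, a per-submodule breakdown can be more informative than a single total. The following is a minimal sketch, not part of the safari codebase: `count_parameters_by_child` is a hypothetical helper, and it relies only on the standard `torch.nn.Module` methods `named_children()` and `parameters(recurse=...)`.

```python
def count_parameters_by_child(model):
    """Return {child_name: trainable parameter count} for a torch.nn.Module.

    Hypothetical helper for illustration. Parameters shared between
    children are counted once per child that holds them.
    """
    counts = {}
    # Parameters registered directly on the module, not inside any child.
    own = sum(p.numel() for p in model.parameters(recurse=False) if p.requires_grad)
    if own:
        counts["(own)"] = own
    # One entry per immediate child, counting everything nested below it.
    for name, child in model.named_children():
        counts[name] = sum(p.numel() for p in child.parameters() if p.requires_grad)
    return counts
```

Calling `count_parameters_by_child(layer2)` on the operator built above would show which child modules dominate the total.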

commented

Thank you for the clarification!