HazyResearch / hyena-dna

Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena

Home Page: https://arxiv.org/abs/2306.15794


Clarifying the models available on HF

yair-schiff opened this issue · comments

Hi,

On the LongSafari HF space there appear to be 2 copies of each model, one with -hf at the end of the name and one without.

I was wondering what the difference is between these models (other than one being compatible with AutoModel), because despite the names being the same and the variables in the config files looking almost identical (i.e., the same d_model and n_layer), they have very different numbers of parameters. For example,

Which version of these models corresponds to the ones used in the paper experiments? If I am not mistaken, it should be the first one (i.e., the one without -hf in the name)?

@exnx, after digging into the two versions of each model, it appears that the main difference is in how the PositionalEmbedding modules are defined:

That is, in the repo here, the PositionalEmbedding module has no learnable parameters:

        self.register("z", z, lr=lr_pos_emb)

because in the config files (e.g., in configs/experiment/hg38/hg38_hyena.yaml), lr_pos_emb = 0.0, so the code uses register_buffer (i.e., here).
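For anyone else tracing this: a minimal sketch of the register-with-learning-rate pattern (class and attribute names here are illustrative, not the repo's exact code). A learning rate of 0.0 routes the tensor to register_buffer, so it never shows up in the parameter count; any other value makes it a learnable nn.Parameter:

```python
import torch
import torch.nn as nn

class OptimModuleSketch(nn.Module):
    """Illustrative version of the register(name, tensor, lr) pattern."""

    def register(self, name, tensor, lr=None):
        if lr == 0.0:
            # lr == 0.0 -> fixed buffer: saved in the state dict but
            # excluded from .parameters() and from the parameter count
            self.register_buffer(name, tensor)
        else:
            # otherwise a learnable parameter; a per-parameter lr hint
            # can be attached for the optimizer to pick up
            self.register_parameter(name, nn.Parameter(tensor))
            if lr is not None:
                getattr(self, name)._optim = {"lr": lr}

m = OptimModuleSketch()
m.register("z_fixed", torch.zeros(8, 4), lr=0.0)    # buffer, 0 params
m.register("z_learn", torch.zeros(8, 4), lr=1e-3)   # parameter, 32 params
```

With lr_pos_emb = 0.0 in the config, the z tensor takes the buffer branch, which is why the non-hf checkpoints report fewer parameters.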

However, on HF, the version of each model that has -hf in the name uses this modeling code:

    self.z = nn.Parameter(z, requires_grad=True)

This increases the parameter count of the -hf version of each model, especially for long-sequence models.
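The size of the discrepancy can be reproduced in isolation (the shapes below are made up for illustration; the real positional embedding scales with the model's sequence length, which is why the longest-context checkpoints diverged the most):

```python
import torch
import torch.nn as nn

def param_count(module: nn.Module) -> int:
    # Only tensors registered as nn.Parameter are counted; buffers are not.
    return sum(p.numel() for p in module.parameters())

seq_len, emb_dim = 1024, 5  # hypothetical sizes for illustration
z = torch.randn(seq_len, emb_dim)

repo_style = nn.Module()
repo_style.register_buffer("z", z)          # this repo: z is a buffer

hf_style = nn.Module()
hf_style.z = nn.Parameter(z.clone())        # -hf port: z is a parameter

print(param_count(repo_style), param_count(hf_style))  # 0 vs seq_len * emb_dim
```

Both modules hold identical weights; only the bookkeeping differs, so the reported parameter counts disagree while the models compute the same thing.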


So I guess my question is which of these would be the "correct" model to compare to and which was used in the paper's experiments?


Hi @yair-schiff, I think the version in this repo is more authoritative. This was an error in the HF port - I'll submit a fix soon, and hopefully the two versions should be equivalent after that!

@Rocketknight1, thanks for following up. I should have posted here as well after I did some digging. The two models have equivalent weights. As you mention, I think it was just a small discrepancy in the HF port that set the z parameter to "learnable". Thanks!


@yair-schiff No probs! The code for the -hf models should now be updated with z as a buffer instead of a parameter.