google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Home Page: https://ai.google.dev/gemma


Inconsistencies in Reported Dimensions and Configuration Files

fvarno opened this issue · comments

In Table 1 of the Gemma Technical Report, the feedforward hidden dims are listed as 32768 and 49152 for the 2B and 7B models, respectively. However, these figures do not align with the numbers provided in the configuration files for the 7B model and the 2B model. This discrepancy leads me to wonder whether I am comparing the wrong figures, whether there is an error in the report, or whether the experiments were conducted with different configuration files. If the numbers in the technical report require revision, the reported total number of parameters would also need to be updated accordingly.
[Screenshot of Table 1 from the Gemma Technical Report]

The feedforward hidden dims in the table of this tech report are the sum of the hidden dims of the gate projection and the up projection, which is 2x the `intermediate_size` in the code.
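To make the accounting concrete, here is a minimal sketch of a GeGLU-style feedforward block with separate gate, up, and down projections. The class name is hypothetical and the activation details may differ from the repo's exact implementation, but the `intermediate_size` values match the 2B and 7B configuration files, and the report's numbers fall out as 2 * intermediate_size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUMLP(nn.Module):
    """Sketch of a GeGLU feedforward block (illustrative, not the repo's exact code)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Two parallel projections into the intermediate dimension:
        # the report's "feedforward hidden dim" counts both of them.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU: gated activation of one branch multiplied by the other.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# 2B config: intermediate_size = 16384 -> report counts 2 * 16384 = 32768
# 7B config: intermediate_size = 24576 -> report counts 2 * 24576 = 49152
```

So the configuration files and Table 1 describe the same architecture; they just count the feedforward width differently.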

Hope that explains.

Your explanation, combined with a review of the code, has cleared up what I was missing. Thank you for the clarification.