google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Home Page: https://ai.google.dev/gemma


Inconsistencies in Reported Dimensions and Configuration Files

fvarno opened this issue · comments

In Table 1 of the Gemma Technical Report, the feedforward hidden dims are listed as 32768 and 49152 for the 2B and 7B models, respectively. However, these figures do not align with the numbers provided in the configuration files for the 7B model and the 2B model. This discrepancy leads me to wonder whether I am comparing the wrong figures, whether there is an error in the report, or whether the experiments were conducted with different configuration files. If the numbers in the technical report require revision, the reported total number of parameters would also need to be updated accordingly.
[Screenshot of Table 1 from the Gemma Technical Report]

The feedforward hidden dims in the table of this tech report are the sum of the hidden dims of the gate projection and the up projection, which is 2x the `intermediate_size` in the code.
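To make the accounting concrete, here is a minimal sketch of a GeGLU-style feedforward block with separate gate, up, and down projections. The class name is hypothetical and the activation details may differ from the repo's exact implementation, but the `intermediate_size` values match the 2B and 7B configuration files, and the report's numbers fall out as 2 * intermediate_size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUMLP(nn.Module):
    """Sketch of a GeGLU feedforward block (illustrative, not the repo's exact code)."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Two parallel projections into the intermediate dimension:
        # the report's "feedforward hidden dim" counts both of them.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU: gated activation of one branch multiplied by the other.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# 2B config: intermediate_size = 16384 -> report counts 2 * 16384 = 32768
# 7B config: intermediate_size = 24576 -> report counts 2 * 24576 = 49152
```

So the configuration files and Table 1 describe the same architecture; they just count the feedforward width differently.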

Hope that explains.

Your explanation, combined with a review of the code, has cleared up what I was missing. Thank you for the clarification.