HazyResearch / m2

Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"

Code for projecting pre-trained BERT weights into Monarch matrices

sinamps opened this issue

Hello, I would like to know if you have published the code to project the pre-trained weights of the BERT model into Monarch matrices. I cannot locate the code for this (I have also looked in the fly repo).
I can see the projection functions here, but I am interested in knowing how you use them specifically for BERT (or other transformers for NLP) to go from pre-trained weights to Monarch matrices. Thank you very much.

Ah, we don't actually use those in our work - that file was just copy-pasted from the fly repo. In M2 we're training everything from scratch, since the gated convolutional layers are quite different in function from an attention layer. It would be interesting to figure out how to distill an attention layer into a gated convolution!

Thank you for your prompt response @DanFu09. Would you happen to have any pointers on how that was done in the fly work? I am already working with those projection functions from the fly repo, but I want to make sure I correctly reproduce the results.
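For anyone else reading this thread: the projection in question reduces to a set of independent rank-1 SVDs. Writing the Monarch factorization as M = P^T L P R, with L and R block-diagonal and P the stride-m perfect-shuffle permutation, each small slice of M (with the right indexing) is a rank-1 outer product of a column of one L block and a row of one R block. Below is a minimal PyTorch sketch of that idea, assuming the simplest case of a square n x n matrix with n = m^2; it is an illustration of the math, with hypothetical helper names, not the fly repo's `blockdiag_butterfly_project`:

```python
# Minimal sketch of the Monarch projection (an illustration of the math,
# not the fly repo's code). Assumes the simplest case: a square (n x n)
# matrix with n = m * m.
import torch


def monarch_project(M: torch.Tensor, m: int):
    """Project dense M onto M ~ P^T L P R, where L = diag(L_1..L_m) and
    R = diag(R_1..R_m) are block-diagonal with (m x m) blocks and P is
    the stride-m perfect shuffle. Returns the blocks as (m, m, m) tensors."""
    n = m * m
    assert M.shape == (n, n)
    # Index rows as i*m + k and columns as j*m + l. For an exact Monarch
    # matrix, M[i*m+k, j*m+l] = L_k[i, j] * R_j[k, l], so each (i, l) slice
    # with (k, j) fixed is rank 1; the projection takes the best rank-1
    # approximation of every such slice.
    M4 = M.reshape(m, m, m, m).permute(1, 2, 0, 3)  # index order (k, j, i, l)
    L = torch.zeros(m, m, m, dtype=M.dtype)
    R = torch.zeros(m, m, m, dtype=M.dtype)
    for k in range(m):
        for j in range(m):
            U, S, Vh = torch.linalg.svd(M4[k, j])
            s = S[0].sqrt()            # split the top singular value
            L[k, :, j] = s * U[:, 0]   # column j of block L_k
            R[j, k, :] = s * Vh[0, :]  # row k of block R_j
    return L, R


def monarch_dense(L: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    """Rebuild the dense matrix from the blocks, to check the error."""
    m = L.shape[0]
    # Mhat[i*m+k, j*m+l] = L_k[i, j] * R_j[k, l]
    return torch.einsum('kij,jkl->ikjl', L, R).reshape(m * m, m * m)


torch.manual_seed(0)
m = 8
W = torch.randn(m * m, m * m)  # stand-in for a pre-trained dense weight
L, R = monarch_project(W, m)
rel_err = (monarch_dense(L, R) - W).norm() / W.norm()
print(f"relative projection error: {rel_err.item():.3f}")
```

With something like this, projecting pre-trained BERT weights would mean running `monarch_project` over each dense weight matrix (reshaped or padded to a compatible size) and loading the resulting blocks into the corresponding block-diagonal layers.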