Add script for merging expert models via weight averaging
mrcabbage972 opened this issue · comments
We would like to create a script for creating a merged model by averaging expert weights.
The script would take as input:
- List of expert models from the MDEL HF repo.
- Name of the output model
The averaged model would be uploaded to the MDEL HF repo. Its model card should list the names of the experts it was created from.
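The merge itself could be a simple element-wise average over the experts' state dicts. A minimal sketch, assuming each expert's weights are available as a state-dict-like mapping (in a real script these would come from something like `AutoModel.from_pretrained(...).state_dict()`; the function name here is hypothetical):

```python
def average_state_dicts(state_dicts):
    """Element-wise average of N expert state dicts.

    All experts must share the same architecture, i.e. identical
    parameter names and shapes.
    """
    if not state_dicts:
        raise ValueError("need at least one expert state dict")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("experts have mismatched parameter names")
    n = len(state_dicts)
    # Sum each parameter across experts, then divide by N.
    # With real tensors this is the same expression per key.
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```

The merged dict could then be loaded back into a model of the same architecture via `load_state_dict` before uploading.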
@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM EleutherAI/pythia-1b-deduped will be a great baseline.
@kenhktsui Great, please assign the ticket to yourself!
Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. on which tasks we are going to test)?
@mrcabbage972 I have added the evaluation ticket.
For the merge, let's align on terminology, since there are several different implementations, so that we can assign separate tickets to different contributors:
- c-BTM: a weighted combination of each expert's next-token prediction logits
- element-wise averaging/blending of model parameters
- mixture-of-experts
@kenhktsui Let's keep this ticket as element-wise averaging.
I created a separate one for c-BTM.
@mrcabbage972 I think this ticket has been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.
@kenhktsui The version of Concedo's script that I saw only merges two experts, we need a solution to merge N.
To close the ticket, I think what is needed is a PR that:
- adds the script to the repo
- Extends it to support merging of N experts
- Adds a section in the readme with usage instructions
If you prefer to focus on the c-BTM ticket, I can take this one.
We may be able to load the models layer by layer to keep memory usage manageable.
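A rough sketch of that idea: average one parameter at a time, so only N tensors of a single layer are resident at once instead of N full models. The `load_param` callable here is a placeholder; with safetensors checkpoints it could wrap a per-tensor loader such as `safe_open(...).get_tensor(name)`:

```python
def average_layerwise(param_names, load_param, n_experts):
    """Average N experts one parameter at a time to bound peak memory.

    param_names: iterable of parameter names shared by all experts
    load_param:  placeholder loader, load_param(expert_idx, name) -> tensor;
                 in a real script this would read a single tensor from disk
    n_experts:   number of expert checkpoints to merge
    """
    merged = {}
    for name in param_names:
        # Accumulate this one parameter across all experts, then divide.
        acc = load_param(0, name)
        for i in range(1, n_experts):
            acc = acc + load_param(i, name)
        merged[name] = acc / n_experts
    return merged
```

Each merged parameter could also be written to disk immediately instead of kept in `merged`, which would keep peak memory at roughly one layer's worth of tensors.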