Add script for merging expert models via weight averaging
mrcabbage972 opened this issue · comments
We would like to create a script for creating a merged model by averaging expert weights.
The script would take as input:
- List of expert models from the MDEL HF repo.
- Name of the output model
The averaged model would be uploaded to the MDEL HF repo. Its model card should list the names of the experts it was created from.
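The merge itself could be a simple element-wise average over the experts' state dicts. A minimal sketch, assuming each expert's weights are available as a state-dict-like mapping (in a real script these would come from something like `AutoModel.from_pretrained(...).state_dict()`; the function name here is hypothetical):

```python
def average_state_dicts(state_dicts):
    """Element-wise average of N expert state dicts.

    All experts must share the same architecture, i.e. identical
    parameter names and shapes.
    """
    if not state_dicts:
        raise ValueError("need at least one expert state dict")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("experts have mismatched parameter names")
    n = len(state_dicts)
    # Sum each parameter across experts, then divide by N.
    # With real tensors this is the same expression per key.
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```

The merged dict could then be loaded back into a model of the same architecture via `load_state_dict` before uploading.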
@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM EleutherAI/pythia-1b-deduped will be a great baseline.
@kenhktsui Great, please assign the ticket to yourself!
Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. on which tasks we are going to test)?
@mrcabbage972 I have added the evaluation ticket.
For the merge, let's align on terminology, since there are several different implementations, so that we can assign separate tickets to different contributors:
- c-BTM: a weighted combination of each expert's next-token prediction logits
- element-wise averaging/blending of model parameters
- mixture-of-experts
@kenhktsui Let's keep this ticket as element-wise averaging.
I created a separate one for c-BTM.
@mrcabbage972 I think this ticket has been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.
@kenhktsui The version of Concedo's script that I saw only merges two experts, we need a solution to merge N.
To close the ticket, I think what is needed is a PR that:
- adds the script to the repo
- Extends it to support merging of N experts
- Adds a section in the readme with usage instructions
If you prefer to focus on the c-BTM ticket, I can take this one.
We may be able to load the models layer by layer to keep memory usage manageable.
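A rough sketch of that idea: average one parameter at a time, so only N tensors of a single layer are resident at once instead of N full models. The `load_param` callable here is a placeholder; with safetensors checkpoints it could wrap a per-tensor loader such as `safe_open(...).get_tensor(name)`:

```python
def average_layerwise(param_names, load_param, n_experts):
    """Average N experts one parameter at a time to bound peak memory.

    param_names: iterable of parameter names shared by all experts
    load_param:  placeholder loader, load_param(expert_idx, name) -> tensor;
                 in a real script this would read a single tensor from disk
    n_experts:   number of expert checkpoints to merge
    """
    merged = {}
    for name in param_names:
        # Accumulate this one parameter across all experts, then divide.
        acc = load_param(0, name)
        for i in range(1, n_experts):
            acc = acc + load_param(i, name)
        merged[name] = acc / n_experts
    return merged
```

Each merged parameter could also be written to disk immediately instead of kept in `merged`, which would keep peak memory at roughly one layer's worth of tensors.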