huu4ontocord / MDEL

Multi-Domain Expert Learning


Add script for merging expert models via weight averaging

mrcabbage972 opened this issue

We would like to create a script that builds a merged model by averaging the weights of expert models.

The script would take as input:

  1. A list of expert models from the MDEL HF repo.
  2. The name of the output model.

The averaged model would be uploaded to the MDEL HF repo. Its model card should list the names of the experts it was created from.
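For concreteness, here is a minimal sketch of what such a script could look like, assuming all experts share the same architecture and tokenizer and are hosted on the Hugging Face Hub. The CLI flags and function names are illustrative, not a final interface:

```python
import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def average_experts(expert_ids):
    """Element-wise average of the parameters of N expert models."""
    models = [AutoModelForCausalLM.from_pretrained(eid) for eid in expert_ids]
    state_dicts = [m.state_dict() for m in models]
    merged_state = {}
    for name, first in state_dicts[0].items():
        if torch.is_floating_point(first):
            stacked = torch.stack([sd[name].float() for sd in state_dicts])
            merged_state[name] = stacked.mean(dim=0).to(first.dtype)
        else:
            # Integer buffers (e.g. position ids) should be identical
            # across experts, so we just keep the first copy.
            merged_state[name] = first
    merged = models[0]
    merged.load_state_dict(merged_state)
    return merged


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--experts", nargs="+", required=True,
                        help="HF repo ids of the expert models to merge")
    parser.add_argument("--output", required=True,
                        help="HF repo id for the merged model")
    args = parser.parse_args()

    merged = average_experts(args.experts)
    tokenizer = AutoTokenizer.from_pretrained(args.experts[0])
    # Writing the model card that lists the source experts is still needed.
    merged.push_to_hub(args.output)
    tokenizer.push_to_hub(args.output)
```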

@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM, EleutherAI/pythia-1b-deduped, would be a great baseline.
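As a hedged sketch, the baseline run could look something like this with the harness's Python API; the task list here is only a placeholder to be settled in the evaluation issue:

```python
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=EleutherAI/pythia-1b-deduped",
    tasks=["lambada_openai", "piqa"],  # placeholder tasks, to be decided
)
print(results["results"])
```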

@kenhktsui Great, please assign the ticket to yourself!

Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. which tasks we are going to test on)?

@mrcabbage972 I have added the evaluation ticket.

For the merge, let's align on terminology, since there are several different implementations; that way we can assign separate tickets to different contributors:

  • c-BTM - a weighted combination of the experts' next-token prediction logits (sketched below)
  • element-wise averaging/blending of model parameters
  • mixture-of-experts
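To make the distinction between the first two options concrete, here is a rough sketch of c-BTM-style inference, where the experts stay separate and only their outputs are combined at inference time (unlike parameter averaging, which produces a single merged model offline). The fixed weights are a simplification: in c-BTM proper the weights are computed per input from a domain clustering, and the paper combines probabilities rather than raw logits.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def combined_next_token_logits(expert_ids, weights, prompt):
    """Weighted combination of the experts' next-token logits."""
    tokenizer = AutoTokenizer.from_pretrained(expert_ids[0])
    inputs = tokenizer(prompt, return_tensors="pt")
    combined = None
    for eid, w in zip(expert_ids, weights):
        model = AutoModelForCausalLM.from_pretrained(eid)
        with torch.no_grad():
            # Logits for the next token only.
            logits = model(**inputs).logits[:, -1, :]
        combined = w * logits if combined is None else combined + w * logits
    return combined
```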

@kenhktsui Let's keep this ticket scoped to element-wise averaging.
I created a separate one for c-BTM.

@mrcabbage972 I think this ticket has been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.

@kenhktsui The version of Concedo's script that I saw only merges two experts; we need a solution that merges N.

To close the ticket, I think what is needed is a PR that:

  1. Adds the script to the repo
  2. Extends it to support merging N experts
  3. Adds a section to the README with usage instructions

If you prefer to focus on the c-BTM ticket, I can take this one.

We may be able to load the models layer by layer to keep memory usage manageable; a rough sketch follows.
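One way that could look, assuming each expert publishes a single model.safetensors file (sharded checkpoints would need the same loop repeated per shard): streaming one tensor at a time keeps peak memory near one model's worth plus N copies of a single tensor, instead of N full models.

```python
from contextlib import ExitStack

import torch
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from safetensors.torch import save_file


def average_layer_by_layer(expert_ids, out_path="merged.safetensors"):
    """Average N checkpoints one tensor at a time."""
    paths = [hf_hub_download(eid, "model.safetensors") for eid in expert_ids]
    merged = {}
    with ExitStack() as stack:
        readers = [stack.enter_context(safe_open(p, framework="pt", device="cpu"))
                   for p in paths]
        for key in readers[0].keys():
            # Only the N copies of this one tensor are resident at a time.
            tensors = [r.get_tensor(key) for r in readers]
            stacked = torch.stack([t.float() for t in tensors])
            merged[key] = stacked.mean(dim=0).to(tensors[0].dtype)
    save_file(merged, out_path)
```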