sshh12 / multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Require ``ModalityArguments`` for new modalities

super-dainiu opened this issue

Hi, this repo is fantastic! However, while modifying the code, I found that some new modalities might need their own modality-specific arguments, e.g. pooler type or model_name_or_path.

Would something like this be possible?

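import transformers

parser = transformers.HfArgumentParser(
    (TrainingArguments, ModelArguments, DataArguments)
)

# Keep the unrecognized CLI strings so the per-modality parsers can consume them.
training_args, model_args, data_args, remaining_args = parser.parse_args_into_dataclasses(
    return_remaining_strings=True
)

# MODALITY_ARGUMENTS is the proposed mapping from modality name to its arguments dataclass.
built_modalities = []
for item in modalities:
    parser = transformers.HfArgumentParser((MODALITY_ARGUMENTS[item],))

    # Parse only the leftover strings instead of re-reading sys.argv.
    modality_args, remaining_args = parser.parse_args_into_dataclasses(
        args=remaining_args, return_remaining_strings=True
    )
    # A dataclass instance cannot be unpacked with **, so convert it to a dict first.
    modality = MODALITY_BUILDERS[item](**vars(modality_args))
    built_modalities.append(modality)

model_cls = LANGUAGE_MODEL_NAME_TO_CLASS[model_args.model_cls]
train_for_modalities(model_cls, training_args, model_args, data_args, built_modalities)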

Hey! My intent was for any modality arguments to be included within the modality group definition: https://github.com/sshh12/multi_token/blob/main/multi_token/modalities/__init__.py

I chose to hardcode this rather than specify it on the CLI because I felt it simplified the inference flow (which just needs to key in on the group rather than load another set of arguments).

You can see that I also specified the model paths explicitly in that file:

"audio_whisper": lambda: [
    WhisperAudioModality(
        num_tokens_output=10, model_name_or_path="openai/whisper-small"
    )
],

Multiple variants of arguments should then just be different groups, like audio_whisper, audio_whisper_large, audio_whisper_more_layers.
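
A minimal sketch of that idea, using a stand-in dataclass in place of the real WhisperAudioModality so it runs without the library. The audio_whisper_large entry and its openai/whisper-large path are illustrative assumptions, not entries copied from the repo:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WhisperAudioModality:
    """Stand-in for the real class; it only stores the two arguments shown above."""

    num_tokens_output: int
    model_name_or_path: str


# Each group name maps to a zero-argument builder, so the name alone fixes every argument.
MODALITY_BUILDERS: Dict[str, Callable[[], List[WhisperAudioModality]]] = {
    "audio_whisper": lambda: [
        WhisperAudioModality(
            num_tokens_output=10, model_name_or_path="openai/whisper-small"
        )
    ],
    # Hypothetical variant: same modality class, different hardcoded arguments.
    "audio_whisper_large": lambda: [
        WhisperAudioModality(
            num_tokens_output=10, model_name_or_path="openai/whisper-large"
        )
    ],
}

# Training and inference both need only the group name to rebuild the modalities.
modalities = MODALITY_BUILDERS["audio_whisper_large"]()
print(modalities[0].model_name_or_path)
```

With this layout, a new argument combination costs one extra dictionary entry and no new CLI flags.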

It's totally possible to refactor this into CLI arguments and store them within the saved model config, but you should still be able to achieve what you want as-is by editing that file.
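
A rough sketch of what storing the arguments alongside the model could look like, assuming a hypothetical WhisperAudioArguments dataclass and a plain JSON string as the config format. Neither the class nor the helper functions exist in the repo, and the real saved config may be structured differently:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WhisperAudioArguments:
    """Hypothetical per-modality arguments; not part of the repo."""

    num_tokens_output: int = 10
    model_name_or_path: str = "openai/whisper-small"


def save_modality_config(args: WhisperAudioArguments) -> str:
    # Serialize the arguments so they can be written next to the saved model.
    return json.dumps({"modality": "audio_whisper", "modality_args": asdict(args)})


def load_modality_config(raw: str) -> WhisperAudioArguments:
    # Rebuild the arguments at inference time from the stored config.
    return WhisperAudioArguments(**json.loads(raw)["modality_args"])


raw = save_modality_config(WhisperAudioArguments(num_tokens_output=20))
restored = load_modality_config(raw)
print(restored)
```

The trade-off is the one described above: inference now has to load and validate this extra set of arguments instead of keying in on a group name alone.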

I see! Thanks for your kind reply! I might modify the files for my own use cases.