This repository contains the code to set up the experiments for the SemEval 2023 Visual Word Sense Disambiguation (Visual-WSD) task.
To fine-tune the standard pretrained CLIP large model from Hugging Face, you can run:
$ python clip_finetuning.py \
--textual_input full_phrase \
--log_filename clip_training.txt \
--epochs 30 \
--batch_size 16 \
--model_size large \
--textual_augmentation \
--visual_augmentation
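For reference, the objective behind this fine-tuning is CLIP's symmetric contrastive loss, which pulls matching image-text pairs together and pushes mismatched pairs apart within a batch. A minimal NumPy sketch of that loss (the function name is illustrative; the actual implementation lives in clip_finetuning.py):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss used by CLIP: image/text pairs
    at the same batch index are positives, all other pairs are negatives."""
    # L2-normalise so the dot product becomes a cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))                 # positives sit on the diagonal

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

A batch of perfectly aligned embeddings yields a lower loss than the same batch with the pairing shuffled, which is the signal fine-tuning exploits.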
If you want to test the standard pretrained CLIP model from Hugging Face (both large and base), you can run:
$ python single_clip_inference.py \
--log_filename clip_results.txt \
--log_step 200 \
--phase val \
--model_size large
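At inference time, the model ranks each sample's candidate images by their similarity to the encoded textual input and picks the top one. A minimal sketch of that ranking step (the function name and shapes are assumptions for illustration, not the script's actual API):

```python
import numpy as np

def rank_candidates(text_emb, image_embs):
    """Rank candidate image embeddings by cosine similarity to a text
    embedding, highest first (the task pairs each phrase with 10 images)."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = image_embs @ text_emb        # cosine similarity per candidate
    order = np.argsort(-sims)           # indices from best to worst match
    return order, sims[order]
```

The first index in `order` is the predicted image for the phrase.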
Otherwise, if you have already fine-tuned a model, you can load and use your checkpoint, stored in the checkpoints
folder, by running the following:
$ python single_clip_inference.py \
--log_filename clip_results.txt \
--log_step 200 \
--phase val \
--clip_finetuned_model_name clip_finetuned.model
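The checkpoint file is assumed here to be a standard PyTorch state dict saved with torch.save; a minimal sketch of the save/load round trip (using a toy nn.Linear and an in-memory buffer as stand-ins for the CLIP model and the checkpoint file):

```python
import io
import torch
from torch import nn

# Toy stand-in for the fine-tuned CLIP model: the real checkpoint in the
# `checkpoints` folder is assumed to hold the fine-tuned weights the same way.
model = nn.Linear(4, 2)

buf = io.BytesIO()                       # stands in for checkpoints/clip_finetuned.model
torch.save(model.state_dict(), buf)      # save the trained weights
buf.seek(0)

restored = nn.Linear(4, 2)               # fresh model with the same architecture
restored.load_state_dict(torch.load(buf))  # load the checkpoint back in
```

After loading, `restored` carries the exact weights that were saved, so inference continues from the fine-tuned state.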