Fkaneko / kaggle_google_universal_image_embedding

Kaggle GUIE 25th solution

Kaggle 25th solution for "Google Universal Image Embedding (GUIE)"

Overview

Initially I worked on this competition with a text-image contrastive method. I also tried dimension reduction techniques such as UMAP, and tried embedding each category (fashion/package/landmark...) in a different discrete space, but none of these approaches worked well. The approach from motono0223's baseline worked best for me: frozen CLIP + ArcFace head.
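To make the "frozen CLIP + ArcFace head" idea concrete, here is a minimal, dependency-free sketch of how ArcFace-style logits can be computed from an embedding. It is not the code from this repository: the function names, the `scale` and `margin` defaults, and the toy 2D vectors are all assumptions for illustration, and the backbone embedding is simply taken as a given list of floats. Real training would do the same thing on batched tensors and usually adds extra handling for angles close to pi, which is omitted here.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Return scaled cosine logits with an additive angular margin on `target`.

    `scale` and `margin` are illustrative defaults, not values from this project.
    """
    emb = l2_normalize(embedding)
    logits = []
    for class_idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(e * c for e, c in zip(emb, w))
        # Clamp to guard acos against floating point overshoot.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if class_idx == target:
            # Penalize the ground-truth class by widening its angle.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


# Toy example: a 2D "embedding" and two class centers (hypothetical values).
embedding = [0.6, 0.8]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
logits = arcface_logits(embedding, class_weights, target=1)
```

The logits would then go into a standard softmax cross-entropy loss. Because the margin is applied only to the ground-truth class, the head has to pull same-class embeddings closer together in angle than a plain cosine classifier would require.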

Downsampling for the number of classes

  • A 64D embedding space is not large, so I tuned the number of classes used for training.
  • Training on 4000 classes from Products10k/GLR worked best for me.
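The class downsampling described above can be sketched as follows. The README does not say how the classes were chosen, so the uniform random selection, the fixed seed, and the function name `downsample_classes` are assumptions for illustration only.

```python
import random


def downsample_classes(labels, num_classes, seed=0):
    """Keep samples from a random subset of classes and remap labels to 0..N-1.

    Returns (kept_indices, label_map), where label_map maps old label -> new label.
    Random selection is an assumption; the original selection rule is not documented.
    """
    unique = sorted(set(labels))
    if num_classes >= len(unique):
        kept = unique
    else:
        kept = sorted(random.Random(seed).sample(unique, num_classes))
    label_map = {old: new for new, old in enumerate(kept)}
    kept_indices = [i for i, lab in enumerate(labels) if lab in label_map]
    return kept_indices, label_map


# Toy example: 5 classes reduced to 3 (the project used 4000 classes).
labels = [0, 0, 1, 2, 2, 3, 4, 4]
kept_indices, label_map = downsample_classes(labels, num_classes=3)
new_labels = [label_map[labels[i]] for i in kept_indices]
```

Remapping to contiguous labels keeps the classification head at exactly `num_classes` outputs.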

Hard to create good CV-LB correlation

  • I configured 6 different retrieval tasks (Products10k/GLR/Stanford Online Products/DeepFashion/MET/Food-101/ObjectNet) but could not find a clear CV-LB correlation.
  • When training only with Products10k/GLR, the Stanford Online Products (Cabinet/Sofa/Chair) retrieval setting was relatively well correlated with LB, but it was still not perfect.
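A local retrieval score of the kind used for such CV checks can be sketched as below. This is a minimal illustration, not the evaluation code in `eval_multi_domain_with_retrieval.py`: the function names, the use of cosine similarity, and the plain precision@k definition (with `k=5` as default) are assumptions, and the competition's exact metric may differ in details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_precision_at_k(query_embs, query_labels, index_embs, index_labels, k=5):
    """Average, over queries, of the fraction of top-k neighbors sharing the query label."""
    total = 0.0
    for q_emb, q_label in zip(query_embs, query_labels):
        # Rank all index items by similarity to the query, most similar first.
        ranked = sorted(
            range(len(index_embs)),
            key=lambda i: cosine(q_emb, index_embs[i]),
            reverse=True,
        )
        hits = sum(1 for i in ranked[:k] if index_labels[i] == q_label)
        total += hits / k
    return total / len(query_embs)


# Toy index with two classes (hypothetical 2D embeddings).
index_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
index_labels = ["chair", "chair", "sofa", "sofa"]
query_embs = [[1.0, 0.05], [0.05, 1.0]]
query_labels = ["chair", "sofa"]
score = mean_precision_at_k(query_embs, query_labels, index_embs, index_labels, k=2)
```

Computing this score per dataset and comparing it against leaderboard scores across several submissions is one way to check whether a given retrieval setting tracks the LB.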

Freeze CLIP

  • Fine-tuning an unfrozen CLIP did not work for me. Lowering the learning rate made it work, but freezing CLIP and using a high learning rate was better.
  • Fine-tuning other backbones, such as ImageNet-pretrained models, did not work well.
  • Adding more transformer layers on top of the frozen CLIP did not help.

Top scripts description

├ Dockerfile
├ LICENSE
├ pyproject.toml
├ README.md
├ requirements_dev.txt                         # python deps, for development
├ requirements.txt                             # main python deps for this project
├ train_multi_domain.py                        # training
├ eval_multi_domain_with_retrieval.py          # evaluation with retrieval for 6 different datasets
└ setup.cfg

How to run

  • This is competition code, and some parts are not very clean.
  • Due to the nature of this competition, 6~13 different public datasets (Products10k/GLR/Stanford Online Products ...) are needed.

environment

  • Ubuntu 18.04
  • Python with Anaconda
  • NVIDIA GPUx1

Install dependencies:

# clone project
PROJECT=kaggle_google_universal_image_embedding
CONDA_NAME=guie
git clone https://github.com/Fkaneko/$PROJECT

# install project
cd $PROJECT
conda env create -f ./conda_env.yaml
conda activate $CONDA_NAME
pip install -U pip
pip install -r ./requirements_dev.txt
pip install -r ./requirements.txt

The following directory configuration is also needed:

    ├ input/            # dataset directory
    ├ working/           # training result will be stored here
    └ kaggle_kaggle_google_universal_image_embed/  # this github project.

Docker

IMAGE_NAME="kaggle/guie"
TAG="0.0.1"
WORK_DIR_NAME="google_universal_image_embed"

# Build the docker image. It takes a few minutes.
docker build -f Dockerfile . -t ${IMAGE_NAME}:${TAG}

# Start a docker container
wandb docker-run -it --rm \
    --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v "${HOME}/kaggle/input":"${HOME}/kaggle/input" \
    -v "${HOME}/kaggle/working":"${HOME}/kaggle/working" \
    -v "${HOME}/kaggle/${WORK_DIR_NAME}":"${HOME}/kaggle/${WORK_DIR_NAME}" \
    -w "${HOME}/kaggle/${WORK_DIR_NAME}" \
    ${IMAGE_NAME}:${TAG}

training

Run training with Wandb logging:

python ./train_multi_domain.py

evaluation

python ./eval_multi_domain_with_retrieval.py

License

  • code: Apache 2.0
  • datasets used in this project: please check the license of each dataset.
