HuKai97 / ECCV2022-ILR-workshop

2nd place solution to Google Universal Image Embedding Challenge!

Google Universal Image Embedding Challenge 2022

2nd Place Solution

HARDWARE & SOFTWARE

OS: Ubuntu 18.04.3 LTS

CPU: AMD EPYC 7543 32-Core Processor

GPU: 6 * NVIDIA A40 PCIe, Memory: 48G

Python: 3.8

PyTorch: 1.9.0+cu111

Data Preparation

  1. Download all data from the sources below:

    Aliproducts

    Art_MET

    DeepFashion(Consumer-to-shop)

    DeepFashion2(hard-triplets)

    Fashion200K

    ICCV 2021 LargeFineFoodAI

    Food Recognition 2022

    JD_Products_10K

    Landmark2021

    Grocery Store

    rp2k

    Shopee

    Stanford_Cars

    Stanford_Products

  2. Run Get_Data.ipynb to create, for each dataset, a CSV file that lists its images.

  3. Run Data_preprocessing.ipynb to filter out classes with fewer than 3 images and resize all images to 224 (matching --image-size 224 used in training).

  4. Run Data_Merge.ipynb to merge all the CSVs and perform sampling and resampling. This produces final_data_224_sample_balance.csv.

  5. Assign stratified k-fold splits (20 folds):

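import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Load the merged, class-balanced CSV produced by Data_Merge.ipynb.
df = pd.read_csv('autodl-tmp/final_data_224_sample_balance.csv')
df['fold'] = -1

# Split into 20 folds, stratified on the class label column.
skf = StratifiedKFold(n_splits=20, shuffle=True, random_state=999)
# split() yields positional indices; they match .loc labels here because
# read_csv gives the frame a default RangeIndex.
for fold, (train_idx, valid_idx) in enumerate(skf.split(df, df['new_labels'])):
    df.loc[valid_idx, 'fold'] = fold  # each row lands in exactly one validation fold

df.to_csv('autodl-tmp/final_data_224_sample_balance_fold.csv', index=False)
df.head(5)  # notebook preview of the result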
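
The class filter described in step 3 can be sketched without the notebook. This is a minimal, dependency-free illustration and not the code in Data_preprocessing.ipynb: the function name `filter_rare_classes`, the `min_images` parameter, and the sample rows are hypothetical. Only the threshold of 3 comes from the steps above, and the `new_labels` column name is borrowed from the fold-splitting code.

```python
from collections import Counter


def filter_rare_classes(rows, label_key="new_labels", min_images=3):
    """Keep only rows whose class has at least `min_images` images."""
    counts = Counter(row[label_key] for row in rows)
    return [row for row in rows if counts[row[label_key]] >= min_images]


# Hypothetical sample: class 0 has 3 images, class 1 has only 2.
rows = [
    {"image": "a.jpg", "new_labels": 0},
    {"image": "b.jpg", "new_labels": 0},
    {"image": "c.jpg", "new_labels": 0},
    {"image": "d.jpg", "new_labels": 1},
    {"image": "e.jpg", "new_labels": 1},
]

kept = filter_rare_classes(rows)
print(len(kept), sorted({row["new_labels"] for row in kept}))
```

Counting first and filtering second keeps the pass order-independent: a class is judged on its total image count, not on how many of its rows have been seen so far.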

Model Preparation

  1. Use the pre-trained ViT-H-14 from open_clip (laion2b_s32b_b79k weights).

  2. Extract the visual module and save its weights:

import open_clip
import torch

# Load the full CLIP model, using ./pretrained_models as the weight cache.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='laion2b_s32b_b79k', cache_dir='./pretrained_models')
# Keep only the image encoder and save its weights for training.
model_visual = model.visual
torch.save(model_visual.state_dict(), './pretrained_models/ViT_H_14_2B_vision_model.pt')
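
One consequence of saving `model.visual.state_dict()` rather than the full model's state dict is that the stored keys are relative to the visual submodule. The sketch below illustrates this with a tiny stand-in network instead of ViT-H-14; the `TinyClip` class and its layer sizes are hypothetical, and the checkpoint is written to an in-memory buffer rather than to ./pretrained_models.

```python
import io

import torch
from torch import nn


class TinyClip(nn.Module):
    """Hypothetical stand-in for a CLIP model with image and text towers."""

    def __init__(self):
        super().__init__()
        self.visual = nn.Linear(8, 4)
        self.text = nn.Linear(8, 4)


full_model = TinyClip()

# Save only the image tower (to memory here, not to disk).
buffer = io.BytesIO()
torch.save(full_model.visual.state_dict(), buffer)
buffer.seek(0)

# Keys are relative to the submodule: "weight", not "visual.weight".
state = torch.load(buffer)
print(sorted(state.keys()))

# So the checkpoint loads directly into a standalone module of the same shape.
standalone = nn.Linear(8, 4)
result = standalone.load_state_dict(state)
print(result)
```

In practice this means the saved file can be loaded into a bare vision backbone without stripping a `visual.` prefix from the keys.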

Training

  1. All configurations for ViT-H-14-Visual can be found in ./GUIE/config_clip_224.py

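  2. Launch distributed training on 6 GPUs (the leading ! runs the command from a notebook cell; drop it when running in a shell):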

!CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python -m torch.distributed.launch --nproc_per_node=6 \
./GUIE/train.py \
--csv-dir ./final_data_224_sample_balance_fold.csv \
--config-name 'vit_224' \
--image-size 224 \
--batch-size 32 \
--num-workers 10 \
--init-lr 1e-4 \
--n-epochs 10 \
--cpkt_epoch 10 \
--n_batch_log 300 \
--warm_up_epochs 1 \
--fold 1
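
To make the `--init-lr`, `--n-epochs`, and `--warm_up_epochs` values concrete, the sketch below computes a learning rate per epoch. The schedule shape is an assumption: the scheduler in ./GUIE/train.py is not shown here, so this uses a common choice (linear warm-up from 0, then cosine decay to 0) purely for illustration. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, init_lr=1e-4, n_epochs=10, warm_up_epochs=1):
    """Learning rate at a (possibly fractional) epoch.

    Assumed shape: linear warm-up from 0 to init_lr, then cosine decay to 0.
    This is an illustration, not the schedule confirmed by train.py.
    """
    if epoch < warm_up_epochs:
        return init_lr * epoch / warm_up_epochs
    progress = (epoch - warm_up_epochs) / (n_epochs - warm_up_epochs)
    return init_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0.0, 0.5, 1.0, 5.5, 10.0):
    print(f"epoch {epoch:>4}: lr = {lr_at_epoch(epoch):.2e}")
```

Under these assumptions the rate peaks at the configured 1e-4 at the end of the first epoch and falls to half of that midway through the remaining nine.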

Contact

Email: 3579628328@qq.com

License: MIT License

