PyTorch code and models for the paper
Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
Xingyi Yang*, Xuehai He*, Yuxiao Liang, Yue Yang, Shanghang Zhang, Pengtao Xie (* equal contribution)
This repository contains the code and pre-trained models used in the paper, along with two demos demonstrating:
- Code for a comprehensive study of when SSL or TL works better, varying:
  - the domain difference between source and target tasks,
  - the amount of pretraining data,
  - the class imbalance in the source data,
  - the usage of target data for additional pretraining.
- Code to calculate the domain distance between the source and target domains in terms of (1) visual distance and (2) class similarity.
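As an illustration of what a visual domain distance can look like (the repo's own implementation may differ), here is a minimal Fréchet-style sketch: fit a Gaussian to image features from each domain (e.g. penultimate-layer activations of a pretrained CNN) and compute the distance between the two fits. The function name and the use of NumPy/SciPy are assumptions for this sketch, not the paper's code.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussian fits of two feature sets.

    feats_a, feats_b: (n_samples, dim) arrays of image features,
    e.g. penultimate-layer CNN activations for each domain.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b).real  # matrix square root of the product
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a + cov_b - 2.0 * covmean))

# Identical feature distributions give (near-)zero distance;
# shifting one domain's features increases it.
rng = np.random.default_rng(0)
feats = rng.normal(size=(512, 64))
d_same = frechet_distance(feats, feats)
d_shifted = frechet_distance(feats, feats + 3.0)
```

Class similarity, by contrast, is usually computed on label sets (e.g. overlap or embedding similarity of class names) rather than on pixels.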
- Python (3.7)
- PyTorch (1.5.0)
- TensorBoard (1.14.0)
- scikit-learn
- imbalanced-dataset-sampler (https://github.com/ufoym/imbalanced-dataset-sampler)
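The imbalanced-dataset-sampler dependency rebalances batches by drawing samples with probability inversely proportional to class frequency. A minimal sketch of that weighting idea (the helper name is hypothetical; the actual library wraps this in a PyTorch sampler):

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    the core idea behind an imbalanced-dataset sampler."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True,
                                   return_counts=True)
    return 1.0 / counts[inverse]  # rare classes get larger weights

labels = [0, 0, 0, 0, 1]              # 4:1 class imbalance
w = inverse_frequency_weights(labels)
p = w / w.sum()                       # per-sample sampling probabilities
```

With these weights, each class receives equal total sampling probability; in PyTorch they could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`.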
In the paper, we used data from 5 source and 4 target datasets:
- Source:
- Target:
- ssl (Self-supervised pretraining)
- moco (MoCo pretraining)
- tl (Supervised pretraining)
- finetune (Finetuning on target tasks)
- dataset (Data split for Caltech256)
- domain (Visual domain distance & label similarity)