thanhmvu / strong-augment


Notes

Preliminary CIFAR-10 Top-1 Accuracy

#Labels per class   10      40      400
Accuracy (%)        92.37   94.03   95.09

Plan

Realisticity Analysis

  • Hypothesis 1: strongly and weakly augmented images have different realisticity, and we can measure/distinguish this using out-of-distribution (OOD) detection methods
  • Hypothesis 2: there is variation in realisticity even among strongly augmented images, e.g. we expect posterization to be more unrealistic than flip+translate+crop (both augmentation pipelines are sketched below)
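To make the two augmentation regimes in these hypotheses concrete, here is a minimal sketch of the weak and strong pipelines for 32x32 CIFAR images. The RandAugmentMC class and its import path are assumptions about this repo's RandAugment implementation, not confirmed code:

```python
import torchvision.transforms as T
from dataset.randaugment import RandAugmentMC  # assumed import path for this repo's RandAugment

# Weak augmentation (Hypothesis 1): flip + translate/crop only,
# expected to stay close to the natural image distribution.
weak_aug = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4, padding_mode='reflect'),
    T.ToTensor(),
])

# Strong augmentation: the same weak ops followed by RandAugment.
# Per Hypothesis 2, individual ops such as posterize are expected to be
# less realistic than the flip/translate/crop ops above.
strong_aug = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4, padding_mode='reflect'),
    RandAugmentMC(n=2, m=10),
    T.ToTensor(),
])
```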

Improving SSL

  • Hypothesis 3: we can improve SSL by training with appropriate/varying levels of unrealisticity in the strong augmentation

TODO

  • Do a test run of this repo with CIFAR-10

Realisticity Analysis

  • Integrate OOD detection for measuring the "realisticity" of augmented images, e.g. https://arxiv.org/abs/1912.03263 https://arxiv.org/abs/1905.11001 (an energy-score sketch is given after this list)
  • Manually check the OOD scores against the qualitative realisticity of sample augmented images. See the save-aug branch for extracting sample augmented images (before normalization)
  • Generate/save all augmented variations as used in one epoch of FixMatch
  • Analyze the OOD scores of this distribution of augmented images to test Hypotheses 1 and 2
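A minimal sketch of one such OOD score: the energy score from the first reference above (arXiv:1912.03263), computed with a trained classifier. The model and the two augmented batches are placeholders, not names from this repo:

```python
import torch

@torch.no_grad()
def energy_score(model, images):
    """Energy-based OOD score, E(x) = -logsumexp_y f_y(x), following arXiv:1912.03263.

    Higher energy means the classifier assigns the image lower density,
    which we use here as a proxy for lower "realisticity".
    """
    logits = model(images)                  # (B, num_classes)
    return -torch.logsumexp(logits, dim=1)  # (B,)

# Hypothetical comparison for Hypothesis 1: weak vs. strong views of the same unlabeled batch.
# e_weak = energy_score(model, weak_batch)
# e_strong = energy_score(model, strong_batch)
# print(e_weak.mean().item(), e_strong.mean().item())
```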

Improving SSL

  • Train FixMatch with various levels of unrealistic augmentation (one way to parameterize the levels is sketched below)
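One possible way to expose "level of unrealistic augmentation" as a knob is to sweep the number of RandAugment ops and their magnitude, and train one FixMatch run per setting. This is only a sketch; the RandAugmentMC import path and its (n, m) arguments are assumptions about this repo's RandAugment implementation:

```python
from dataset.randaugment import RandAugmentMC  # assumed import path

def make_strong_aug(level):
    """Map a hypothetical 'unrealisticity level' (0 = mildest) to a RandAugment policy."""
    n_ops = {0: 1, 1: 2, 2: 2, 3: 3}[level]        # ops applied per image
    magnitude = {0: 2, 1: 5, 2: 8, 3: 10}[level]   # distortion strength
    return RandAugmentMC(n=n_ops, m=magnitude)

# One FixMatch run per level, e.g. make_strong_aug(0) ... make_strong_aug(3),
# then compare accuracy against the OOD/energy statistics of each policy.
```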

=======

(below is the README from the original repo)

FixMatch

This is an unofficial PyTorch implementation of FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. The official TensorFlow implementation is here.

This code only implements FixMatch with RandAugment. Currently, only experiments on CIFAR-10 and CIFAR-100 are available.
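For reference, the core of the method is a consistency loss on unlabeled data: a pseudo-label is taken from the prediction on a weakly augmented image and, if the confidence exceeds a threshold, the strongly augmented view is trained to match it. A simplified sketch of that loss (not the exact code in train.py):

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_u, strong_u, threshold=0.95):
    """Simplified FixMatch consistency loss (after arXiv:2001.07685)."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_u), dim=1)   # predictions on weak views
        max_probs, pseudo_labels = probs.max(dim=1)   # hard pseudo-labels
        mask = (max_probs >= threshold).float()       # keep only confident pseudo-labels

    logits_strong = model(strong_u)                   # predictions on strong views
    loss = F.cross_entropy(logits_strong, pseudo_labels, reduction='none')
    return (loss * mask).mean()
```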

Requirements

  • Python 3.6+
  • PyTorch 1.4
  • torchvision 0.5
  • tensorboard
  • tqdm
  • numpy
  • apex (optional)

Usage

Train

Train the model with 4000 labeled examples from the CIFAR-10 dataset:

python train.py --dataset cifar10 --num-labeled 4000 --arch wideresnet --batch-size 64 --lr 0.03 --seed 5 --out results/cifar10@4000.5

Train the model with 10000 labeled examples from the CIFAR-100 dataset using DistributedDataParallel:

python -m torch.distributed.launch --nproc_per_node 4 ./train.py --dataset cifar100 --num-labeled 10000 --arch wideresnet --batch-size 16 --lr 0.03 --out results/cifar100@10000

* When using DDP, do not use a seed.

Monitoring training progress

tensorboard --logdir=<your out_dir>

Results (Accuracy)

CIFAR10

#Labels      40             250            4000
Paper (RA)   86.19 ± 3.37   94.93 ± 0.65   95.74 ± 0.05
This code    92.92          94.13          95.33
Acc. curve   link           link           link

CIFAR100

#Labels      400            2500           10000
Paper (RA)   51.15 ± 1.75   71.71 ± 0.11   77.40 ± 0.12
This code    -              -              -
Acc. curve   -              -              -

References

@article{sohn2020fixmatch,
    title={FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence},
    author={Kihyuk Sohn and David Berthelot and Chun-Liang Li and Zizhao Zhang and Nicholas Carlini and Ekin D. Cubuk and Alex Kurakin and Han Zhang and Colin Raffel},
    journal={arXiv preprint arXiv:2001.07685},
    year={2020},
}

About

License: MIT License
