arthurdouillard / dytox

Dynamic Token Expansion with Continual Transformers, accepted at CVPR 2022

Home Page: https://arxiv.org/abs/2111.11326


The avg accuracy on CIFAR100 50 steps

jmin0530 opened this issue

Hello, thank you for your code.
I used the DyTox setting for 50 steps, but I got different results from those reported in your paper.

I ran the CLI command below:

bash train.sh 0,1 \
    --options options/data/cifar100_2-2.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml \
    --name dytox \
    --data-path MY_PATH_TO_DATASET \
    --output-basedir PATH_TO_SAVE_CHECKPOINTS \
    --memory-size 1000

According to your paper, the result on CIFAR-100 with 50 steps is "Avg acc: 64.82, Last acc: 45.61".
Here are my reproduction results for the three CIFAR-100 class orders:

I will also show you the DyTox settings I used:

# DyTox, for CIFAR100

# Model definition
model: convit
embed_dim: 384
depth: 6
num_heads: 12
patch_size: 4
input_size: 32
local_up_to_layer: 5
class_attention: true

# Training setting
no_amp: true
eval_every: 50

# Base hyperparameters
weight_decay: 0.000001
batch_size: 128
incremental_batch_size: 256
incremental_lr: 0.0005
rehearsal: icarl_all

# Knowledge distillation
auto_kd: true

# Finetuning
finetuning: balanced
finetuning_epochs: 20
ft_no_sampling: true

# DyTox model
dytox: true
freeze_task: [old_task_tokens, old_heads]
freeze_ft: [sab]

# Divergence head to get diversity
head_div: 0.1
head_div_mode: tr

# Independent classifiers
ind_clf: 1-1
bce_loss: true

# Advanced augmentations, here disabled

# Erasing
reprob: 0.0
remode: pixel
recount: 1
resplit: false

# MixUp & CutMix
mixup: 0.0
cutmix: 0.0
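
For context, here is a rough sketch of how layered --options files like the ones in the command above are typically combined: each YAML file is loaded in order, later files override keys from earlier ones, and CLI flags such as --memory-size are applied on top. This is only an assumption about the loader behind train.sh, made for illustration; it is not code taken from the dytox repository.

import yaml

def merge_options(paths):
    # Load each YAML file in order; later files win on duplicate keys.
    merged = {}
    for path in paths:
        with open(path) as f:
            merged.update(yaml.safe_load(f) or {})
    return merged

opts = merge_options([
    "options/data/cifar100_2-2.yaml",
    "options/data/cifar100_order1.yaml",
    "options/model/cifar_dytox.yaml",
])
# CLI flags (e.g. --memory-size 1000) would then override the merged values.
print(opts.get("rehearsal"), opts.get("dytox"))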

I can't understand why my reproduction results differ from the results reported in your paper.
Thank you.

See https://github.com/arthurdouillard/dytox/blob/main/erratum_distributed.md

You probably want to use global memory and 2k memory.

If you use distributed memory with 1k, your effective memory size is rather low (much lower than 2k).
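
For intuition on why that matters, here is a minimal back-of-the-envelope sketch (plain arithmetic, not dytox code) of the per-class exemplar budget on CIFAR-100, which has 100 classes:

NUM_CLASSES = 100  # CIFAR-100

for memory_size in (1000, 2000):
    per_class = memory_size // NUM_CLASSES
    print(f"total memory {memory_size}: {per_class} exemplars per class")

# total memory 1000: 10 exemplars per class
# total memory 2000: 20 exemplars per class
# With distributed memory and a 1k budget, the effective memory ends up well
# below the 2k used for the benchmark, as noted above and in the erratum.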