Frankluox / LightningFSL

LightningFSL: Pytorch-Lightning implementations of Few-Shot Learning models.

Problem of reproducing Meta-baseline

LIUZIJING-CHN opened this issue · comments

Excuse me, it seems I can't reproduce the accuracy of 62 that you report for meta-baseline.
I followed the two-stage training instructions and got an accuracy of 76 for the pre-trained model and 60 for the fine-tuned model (both on the val set). I also set the training shot to 5, as you mentioned before, but that didn't work either.

Hi, I will test it again; please wait for about one day.

Hi, I have reproduced the results again and everything goes as expected. Please make sure that (1) you are using the version of miniImageNet containing the original images, not the version containing pickle files; (2) you have fine-tuned the backbone with PN, i.e., set config_dict["is_test"] to False and config_dict["pre_trained_path"] to the pre-trained model. This concern comes from the accuracy of 60 you report, which matches exactly the test accuracy of the pre-trained model. Also, I notice that you report an accuracy of 76 for the pre-trained model, which must be the 5-shot setting, not 1-shot. Please set test_shot in the configuration and check the accuracy on the test set instead of the val set. Hope one of these points solves your problem.
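For reference, a minimal sketch of those settings, using the key names from the config pasted further down in this thread (the checkpoint path is a placeholder, and the load_backbone_only value is an assumption, not something confirmed in the repository):

```python
# Sketch of the fine-tuning / testing settings discussed above, assuming the
# same config_dict / data / model layout as the config pasted below.
config_dict, data, model = {}, {}, {}

config_dict["is_test"] = False            # fine-tune with PN instead of jumping straight to testing
config_dict["load_pretrained"] = True     # start from the CE pre-trained backbone
config_dict["pre_trained_path"] = "path/to/pretrained.ckpt"  # placeholder path
config_dict["load_backbone_only"] = True  # assumption: reuse only the backbone weights for fine-tuning

# For 1-shot evaluation, set the shot explicitly and read the accuracy on the test set:
test_shot = 1
data["test_shot"] = test_shot
model["test_shot"] = test_shot
```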

Thx, I'll try that.

Hello, I think the problem may be the pre-trained backbone. I evaluated it directly (76 on val) on the test set, but got an accuracy of only 55 for 1-shot, not the 60 you mentioned.
I'm certain that I use the standard miniImageNet with an image size of 84 and the three split files.
Is there anything wrong with my settings? I provide my config below; I hope you can help me.
```python
def config():
    config_dict = {}

    #if training, set to False
    config_dict["load_pretrained"] = False
    #if training, set to False
    config_dict["is_test"] = False
    if config_dict["is_test"]:
        #if testing, specify the total rounds of testing. Default: 5
        config_dict["num_test"] = 5
        config_dict["load_pretrained"] = True
        #specify pretrained path for testing.
    if config_dict["load_pretrained"]:
        config_dict["pre_trained_path"] = "../results/CC/version_11/checkpoints/epoch=52-step=26499.ckpt"
        #only load the backbone.
        config_dict["load_backbone_only"] = False

    #Specify the model name, which should match the name of file
    #that contains the LightningModule
    config_dict["model_name"] = "CE_pretrain"

    #whether to use multiple GPUs
    multi_gpu = False
    if config_dict["is_test"]:
        multi_gpu = False
    #The seed
    seed = 10
    config_dict["seed"] = seed

    #The logging dirname: logdir/exp_name/
    log_dir = "./results/"
    exp_name = "meta_baseline_pretrain_new/first_ex"

    #Three components of a Lightning Running System
    trainer = {}
    data = {}
    model = {}

    ################trainer configuration###########################

    ###important###

    #debugging mode
    trainer["fast_dev_run"] = False

    if multi_gpu:
        trainer["accelerator"] = "ddp"
        trainer["sync_batchnorm"] = True
        trainer["gpus"] = [2,3]
        trainer["plugins"] = [{"class_path": "plugins.modified_DDPPlugin"}]
    else:
        trainer["accelerator"] = None
        trainer["gpus"] = [0]
        trainer["sync_batchnorm"] = False

    # whether resume from a given checkpoint file
    trainer["resume_from_checkpoint"] = None  # example: "../results/ProtoNet/version_11/checkpoints/epoch=2-step=1499.ckpt"

    # The maximum epochs to run
    trainer["max_epochs"] = 100

    # potential functionalities added to the trainer.
    trainer["callbacks"] = [{"class_path": "pytorch_lightning.callbacks.LearningRateMonitor",
                             "init_args": {"logging_interval": "step"}
                            },
                            {"class_path": "pytorch_lightning.callbacks.ModelCheckpoint",
                             "init_args": {"verbose": True, "save_last": True, "monitor": "val/acc", "mode": "max"}
                            },
                            {"class_path": "callbacks.SetSeedCallback",
                             "init_args": {"seed": seed, "is_DDP": multi_gpu}
                            }]

    ###less important###
    num_gpus = trainer["gpus"] if isinstance(trainer["gpus"], int) else len(trainer["gpus"])
    trainer["logger"] = {"class_path": "pytorch_lightning.loggers.TensorBoardLogger",
                         "init_args": {"save_dir": log_dir, "name": exp_name}
                        }
    trainer["replace_sampler_ddp"] = False

    ##################shared model and datamodule configuration###########################

    #important
    test_shot = 5

    #less important
    per_gpu_val_batchsize = 8
    per_gpu_test_batchsize = 8
    way = 5
    val_shot = 5
    num_query = 15

    ##################datamodule configuration###########################

    #important

    #The name of dataset, which should match the name of file
    #that contains the datamodule.
    data["train_dataset_name"] = "miniImageNet"
    data["train_data_root"] = "/data_25T/lzj/LightningFSL-main/mini_imageNet"
    data["val_test_dataset_name"] = "miniImageNet"
    data["val_test_data_root"] = "/data_25T/lzj/LightningFSL-main/mini_imageNet"
    #determine whether meta-learning.
    data["train_batchsize"] = 128
    data["train_num_workers"] = 8
    #the number of tasks
    data["val_num_task"] = 1200
    data["test_num_task"] = 2000

    #less important
    data["num_gpus"] = num_gpus
    data["val_batchsize"] = num_gpus*per_gpu_val_batchsize
    data["test_batchsize"] = num_gpus*per_gpu_test_batchsize
    data["test_shot"] = test_shot
    data["val_num_workers"] = 8
    data["is_DDP"] = True if multi_gpu else False
    data["way"] = way
    data["val_shot"] = val_shot
    data["num_query"] = num_query
    data["drop_last"] = False
    data["is_meta"] = False

    ##################model configuration###########################

    #important

    #The name of feature extractor, which should match the name of file
    #that contains the model.
    model["backbone_name"] = "resnet12"
    #the initial learning rate
    model["lr"] = 0.1*data["train_batchsize"]/128

    #less important
    model["task_classifier_name"] = "proto_head"
    model["task_classifier_params"] = {"learn_scale": False}
    model["way"] = way
    model["val_shot"] = val_shot
    model["test_shot"] = test_shot
    model["num_query"] = num_query
    model["val_batch_size_per_gpu"] = per_gpu_val_batchsize
    model["test_batch_size_per_gpu"] = per_gpu_test_batchsize
    model["weight_decay"] = 5e-4
    #The name of optimization scheduler
    model["decay_scheduler"] = "specified_epochs"
    model["decay_epochs"] = [50, 70, 90]
    model["decay_power"] = 0.1
    model["optim_type"] = "sgd"
    model["num_classes"] = 64

    config_dict["trainer"] = trainer
    config_dict["data"] = data
    config_dict["model"] = model
```

I see you've changed the decay epochs. Please set model["decay_epochs"] = [90].
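In the config above, that change corresponds to the following lines (only decay_epochs differs; decay_scheduler and decay_power stay as they are):

```python
# Corrected decay schedule (key names as in the config pasted above).
model = {}
model["decay_scheduler"] = "specified_epochs"
model["decay_epochs"] = [90]   # a single decay step instead of [50, 70, 90]
model["decay_power"] = 0.1
```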

OK, I'll change it back. But I already ran another experiment with that setting, and it had similarly low performance.

Perhaps I found where the problem is. An unnecessary parameter causes some confusion. Please set model["task_classifier_params"] = {"learn_scale":False, "normalize":False} in set_config_meta_baseline_pretrain and test the performance of the pre-trained model again. If the performance improves, then set model["normalize"] = False in set_config_meta_baseline_finetune and fine-tune the pre-trained model again. Please let me know if anything changes.
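A minimal sketch of the two edits, assuming both config files build a model dict the same way as the config pasted above:

```python
# set_config_meta_baseline_pretrain: pass normalize=False to the ProtoNet head
# used to validate/test the pre-trained backbone.
model = {}
model["task_classifier_params"] = {"learn_scale": False, "normalize": False}

# set_config_meta_baseline_finetune: likewise set normalize to False, then
# fine-tune from the pre-trained checkpoint again.
model["normalize"] = False
```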

Thx, I will try that soon.

Sorry, the performance doesn't improve; it's still around 55 for the pre-trained model on the test set. I also tried re-pre-training the backbone, but that also shows no improvement.

[Screenshot: validation accuracy curve during pre-training]
The validation accuracy curve during pre-training looks like this. Is this the same as yours? The curve can be viewed from the log directory with the command "tensorboard --logdir /path_to_logdir". If you do not have tensorboardX, please install it with "pip install tensorboardX".

[Screenshot: my validation accuracy curve during pre-training]
This is my accuracy curve on the val set during pre-training; it seems to have a large gap compared with yours.

I don't know what's going on. Perhaps you could clone the original repository again, change nothing except the dataset path, and run the pre-training again (I tried this yesterday and the result matched perfectly).

Thx, that may help.

Hello, after I re-downloaded the repository, everything works well. Thanks for your patience!

Congratulations. By the way, setting model["task_classifier_params"] = {"learn_scale":False, "normalize":False} in set_config_meta_baseline_pretrain and model["normalize"] = False in set_config_meta_baseline_finetune could further improve performance.

Got it, Thx again.