yaoyao-liu / meta-transfer-learning

TensorFlow and PyTorch implementation of "Meta-Transfer Learning for Few-Shot Learning" (CVPR 2019)

Home Page: https://lyy.mpi-inf.mpg.de/mtl/

Question about the choice of base-learner

Sebastian-X opened this issue

You used ResNet-12 as the base-learner, and it's also a common choice in recent works. Does this mean that ResNet-12 is a particularly efficient model for few-shot learning? Is there any paper that discusses this? I went through your paper's related citations, but didn't really find any information about it.
Also, I see you deployed a ResNet version of MAML in your experiments whose performance overtook the original one's. Did you just change the base-learner of MAML and keep the other parts the same?

P.S. I like your paper; it's really intriguing.
[screenshot of the result attached]

Thanks for your interest in our work.

Answer to Q1: ResNet-12 is an example of a deeper network compared to 4CONV. It is not the most efficient network architecture; I use ResNet-12 in the paper for a fair comparison with related works. If you'd like to read about network architectures for few-shot learning, I suggest this paper: A Closer Look at Few-shot Classification.
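For context, here is a minimal PyTorch sketch of the two backbones being compared. The layer sizes, channel widths, and module names are illustrative assumptions, not the exact code from this repo:

```python
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # One layer of the classic 4CONV few-shot backbone:
    # 3x3 conv -> BatchNorm -> ReLU -> 2x2 max-pool
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class FourConv(nn.Module):
    """4CONV: four identical conv blocks (commonly 32 or 64 channels each)."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(3, channels),
            conv_block(channels, channels),
            conv_block(channels, channels),
            conv_block(channels, channels),
        )

    def forward(self, x):
        return self.encoder(x).flatten(1)

class ResBlock(nn.Module):
    """One of the four residual blocks of ResNet-12: three 3x3 conv layers
    plus a 1x1 shortcut, followed by max-pooling. Stacking four of these
    with growing widths gives the 12-layer network."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = F.leaky_relu(self.body(x) + self.shortcut(x), 0.1)
        return F.max_pool2d(out, 2)
```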

Answer to Q2: We provide ablation results for MAML with ResNet-12 in the paper. However, that is not the result in the image you attached: the result in the image is for the "MAML+HT" setting, where HT (hard task) meta-batch is applied. You may find the details in the paper.
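Roughly, HT meta-batch re-samples extra episodes from the classes the learner just failed on. Here is a hedged pseudocode-level sketch of that idea; `sample_task` and `meta_train_step` are hypothetical helpers for illustration, not this repo's API:

```python
import random

def run_ht_meta_batch(sample_task, meta_train_step,
                      num_tasks=100, num_hard_tasks=10, n_way=5):
    """Sketch of hard task (HT) meta-batch.

    sample_task(classes=None) -> an N-way episode      (hypothetical helper)
    meta_train_step(task) -> {class_id: query accuracy} (hypothetical helper)
    """
    hard_classes = []
    # Phase 1: ordinary meta-batch; record the failure class of each task.
    for _ in range(num_tasks):
        task = sample_task()
        per_class_acc = meta_train_step(task)
        # The lowest-accuracy class of this episode counts as "hard".
        hard_classes.append(min(per_class_acc, key=per_class_acc.get))
    # Phase 2: re-sample extra "hard" episodes from the pooled failure classes.
    for _ in range(num_hard_tasks):
        chosen = random.sample(hard_classes, k=n_way)
        meta_train_step(sample_task(classes=chosen))
```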

If you have any further questions, feel free to add comments.

Thanks for your response!
I saw your paper's ablation experiments, but there's still a point I don't understand. If I didn't get it wrong, during the meta-transfer learning phase the parameters of the feature extractor are fixed, while the parameters of the FC and SS layers are updated. However, the last 2 rows of this table show the results of SS[Θ4;θ] and SS[Θ;θ], whose notation seems to indicate that the feature extractor parameters Θ4/Θ are also fine-tuned. I'm a little confused about this, and if I misunderstood the table, could you please tell me the difference between SS[Θ4;θ] and SS[Θ;θ]?
[screenshot of the ablation table attached]

SS[Θ;θ] means that we update the SS weights for all convolutional layers Θ and the last fully-connected layer θ;
SS[Θ4;θ] means that we update the SS weights for the 4th residual block Θ4 of ResNet-12 and the last fully-connected layer θ.
In both cases the pre-trained weights Θ themselves remain frozen; only the scaling and shifting parameters applied on top of them are meta-learned.
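To make the notation concrete, here is a minimal PyTorch sketch of SS applied to a single convolutional layer. This is an illustrative wrapper of my own, not the repo's exact module, and it assumes the wrapped conv layer has a bias:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSConv2d(nn.Module):
    """Scaling and Shifting (SS) on top of a frozen pre-trained conv layer.
    Only the per-channel scale (init 1) and shift (init 0) are meta-learned;
    the pre-trained weight W and bias b are never updated."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.weight = nn.Parameter(conv.weight.detach().clone(),
                                   requires_grad=False)  # frozen W (part of Θ)
        self.bias = nn.Parameter(conv.bias.detach().clone(),
                                 requires_grad=False)    # frozen b
        self.scale = nn.Parameter(torch.ones(conv.out_channels, 1, 1, 1))  # Φ_S1
        self.shift = nn.Parameter(torch.zeros(conv.out_channels))          # Φ_S2
        self.stride, self.padding = conv.stride, conv.padding

    def forward(self, x):
        # X' = (W ⊙ Φ_S1) * X + (b + Φ_S2)
        return F.conv2d(x, self.weight * self.scale, self.bias + self.shift,
                        stride=self.stride, padding=self.padding)
```

Under this view, SS[Θ;θ] wraps every convolutional layer of the backbone this way, while SS[Θ4;θ] wraps only the layers of the 4th residual block.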

The details are available in the extended version: https://arxiv.org/pdf/1910.03648.pdf

I see. Thank you very much!