khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)

How to draw the flatness curve in Figure 3?

FrankZhangRp opened this issue

Hi,
Thank you so much for providing this repo; the work is awesome!
How can we reproduce the loss gap curve in Figure 3 of the paper? How is gamma added to the model parameters, and what distance metric is used on the x-axis? I flattened the model parameter dict into a single vector, added a noise vector with norm 1.0, and got a loss gap of about 0.2 on the P domain test; I must have made a mistake in the Monte Carlo approximation sampling.
Thanks a lot!

Hi, thanks for your interest in our study.

We first sample a unit direction vector and compute the loss gap after perturbing the model parameters along that direction by the radius gamma; the parameter difference is gamma * unit_direction_vector. The reported value is averaged over 100 sampled direction vectors. The x-axis indicates gamma.

Simple PyTorch-style pseudocode:

import torch

# Sample a random direction in parameter space and normalize it to unit length.
n_params = num_parameters(model)
direction_vector = torch.randn(n_params)
unit_direction_vector = direction_vector / torch.norm(direction_vector)

for gamma in gamma_list:
    # Perturb the parameters by radius gamma along the sampled direction.
    noised_model = get_noised_model(model, unit_direction_vector * gamma)
    loss_gap = evaluate(noised_model) - evaluate(model)
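
For completeness, here is a self-contained sketch of how the full 100-direction Monte Carlo average could be implemented with standard PyTorch utilities. This is an assumption-laden illustration, not the authors' actual code: the evaluate helper (assumed to return the test loss as a float) and the gamma_list grid are placeholders.

import copy
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def loss_gap_curve(model, evaluate, gamma_list, n_directions=100):
    # Estimate E[L(theta + gamma * u) - L(theta)] over random unit directions u.
    # `evaluate` is an assumed helper returning the test loss of a model.
    base_vector = parameters_to_vector(model.parameters()).detach()
    base_loss = evaluate(model)
    noised_model = copy.deepcopy(model)  # scratch copy to hold perturbed weights
    gaps = torch.zeros(len(gamma_list))
    for _ in range(n_directions):
        # Sample a direction uniformly on the unit sphere in parameter space.
        direction = torch.randn_like(base_vector)
        direction /= direction.norm()
        for i, gamma in enumerate(gamma_list):
            # Write the perturbed flat vector back into the scratch model.
            vector_to_parameters(base_vector + gamma * direction, noised_model.parameters())
            gaps[i] += evaluate(noised_model) - base_loss
    return gaps / n_directions  # averaged loss gap for each gamma

Plotting the returned gaps against gamma_list then gives the x-axis (gamma) and y-axis (averaged loss gap) of Figure 3.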

got it! Very clear! Thanks a lot!

The loss gap I get seems to be wrong. Did you solve this problem?

About the Figure 3 plotting mentioned here:

  1. Is the model used to plot Figure 3 the final converged model, or a model taken during training?
  2. Are the parameters of every layer perturbed with the weight noise?

@khanrc

@brisker

  1. Three converged models are used. Specifically, the models converge before 1000 steps (see Fig. 5), and the models from steps 2500, 3500, and 4500 are used.
  2. Yes.