khanrc / swad

Official Implementation of SWAD (NeurIPS 2021)

How to draw the flatness curve in Figure 3?

FrankZhangRp opened this issue

Hi,
Thank you so much for providing this repo; the work is awesome!
How can we reproduce the loss gap curve in Figure 3 of the paper? How is gamma added to the model parameters, and what distance metric is used on the x-axis? I flattened the model parameter dict into a single vector, added a noise vector with norm 1.0, and got a loss gap of about 0.2 on the P domain test; I must have made a mistake in the Monte Carlo approximation sampling.
Thanks a lot!

Hi, thanks for your interest in our study.

We first sample a unit direction vector and compute the loss gap after perturbing the model parameters along that direction by the radius gamma; the parameter difference is gamma * unit_direction_vector. The reported value is averaged over 100 sampled direction vectors. The x-axis indicates gamma.

Simple PyTorch-style pseudocode:

import torch

# Sample a random direction in parameter space and normalize it to unit length.
n_params = num_parameters(model)
direction_vector = torch.randn(n_params)
unit_direction_vector = direction_vector / torch.norm(direction_vector)

for gamma in gamma_list:
    # Perturb the parameters by radius gamma along the sampled direction.
    noised_model = get_noised_model(model, unit_direction_vector * gamma)
    loss_gap = evaluate(noised_model) - evaluate(model)
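
For completeness, here is a self-contained sketch of how the full 100-direction Monte Carlo average could be implemented with standard PyTorch utilities. This is an assumption-laden illustration, not the authors' actual code: the evaluate helper (assumed to return the test loss as a float) and the gamma_list grid are placeholders.

import copy
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def loss_gap_curve(model, evaluate, gamma_list, n_directions=100):
    # Estimate E[L(theta + gamma * u) - L(theta)] over random unit directions u.
    # `evaluate` is an assumed helper returning the test loss of a model.
    base_vector = parameters_to_vector(model.parameters()).detach()
    base_loss = evaluate(model)
    noised_model = copy.deepcopy(model)  # scratch copy to hold perturbed weights
    gaps = torch.zeros(len(gamma_list))
    for _ in range(n_directions):
        # Sample a direction uniformly on the unit sphere in parameter space.
        direction = torch.randn_like(base_vector)
        direction /= direction.norm()
        for i, gamma in enumerate(gamma_list):
            # Write the perturbed flat vector back into the scratch model.
            vector_to_parameters(base_vector + gamma * direction, noised_model.parameters())
            gaps[i] += evaluate(noised_model) - base_loss
    return gaps / n_directions  # averaged loss gap for each gamma

Plotting the returned gaps against gamma_list then gives the x-axis (gamma) and y-axis (averaged loss gap) of Figure 3.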

got it! Very clear! Thanks a lot!

The loss gap I get seems to be wrong. Did you solve this problem?

About the Figure 3 plotting mentioned here:

  1. Is the model used to plot Figure 3 the final converged model, or a model taken during training?
  2. Are the parameters of every layer perturbed with the weight noise?

@khanrc

@brisker

  1. Three converged models are used. Specifically, the models converge before 1000 steps (see Fig. 5), and the models from steps 2500, 3500, and 4500 are used.
  2. Yes.