Tabular data / noisy instances

Question

nazaretl opened this issue 2 years ago · comments

Hi,
thanks for sharing your implementation. I have two questions about it:

1. Does it also work on tabular data?
2. Is it possible to identify the noisy instances (i.e., return the IDs of the noisy examples, or the clean subset)?

Thanks!

Hrayr Harutyunyan · Answer 1 · Tue May 10 2022 07:42:53 GMT+0800 (China Standard Time)

Yes, it is applicable to tabular data, but you would probably need to replace the network architecture with one suited to your features.
Yes, in the paper we demonstrated one way of identifying noisy examples: rank the examples by the norm of the difference between the predicted and actual gradients. Examples with a larger norm are more likely to be mislabeled. Please see the examine_model function in https://github.com/hrayrhar/limit-label-memorization/blob/master/notebooks/visualize-results.ipynb.
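
For illustration, here is a minimal sketch of that ranking step. It is not the examine_model function from the notebook. It assumes you have already collected, for every training example, the predicted gradient and the actual gradient as two arrays of the same shape. The function names rank_by_gradient_mismatch and split_clean_noisy, and the cutoff n_noisy, are hypothetical and chosen only for this example. It also assumes that a larger mismatch marks a more suspicious example.

```python
import numpy as np


def rank_by_gradient_mismatch(predicted_grads, actual_grads):
    """Rank examples by ||predicted gradient - actual gradient||.

    Both inputs are assumed to have shape (n_examples, ...), one gradient
    per example. Returns (order, scores), where order lists example indices
    from the largest mismatch to the smallest.
    """
    predicted_grads = np.asarray(predicted_grads, dtype=float)
    actual_grads = np.asarray(actual_grads, dtype=float)
    if predicted_grads.shape != actual_grads.shape:
        raise ValueError("predicted and actual gradients must have the same shape")

    # Flatten each example's gradient so the L2 norm is taken per example.
    diff = (predicted_grads - actual_grads).reshape(len(predicted_grads), -1)
    scores = np.linalg.norm(diff, axis=1)

    # Negate the scores so argsort returns the largest mismatch first.
    order = np.argsort(-scores, kind="stable")
    return order, scores


def split_clean_noisy(order, n_noisy):
    """Flag the n_noisy highest-ranked examples as noisy (hypothetical cutoff)."""
    if not 0 <= n_noisy <= len(order):
        raise ValueError("n_noisy must be between 0 and the number of examples")
    noisy_ids = np.sort(order[:n_noisy])
    clean_ids = np.sort(order[n_noisy:])
    return clean_ids, noisy_ids


# Toy data: three examples with two-dimensional gradients.
predicted = [[0.1, -0.1], [0.5, -0.5], [0.0, 0.0]]
actual = [[0.1, -0.1], [-0.5, 0.5], [0.1, 0.0]]

order, scores = rank_by_gradient_mismatch(predicted, actual)
clean_ids, noisy_ids = split_clean_noisy(order, n_noisy=1)
print("ranking:", order.tolist())
print("flagged as noisy:", noisy_ids.tolist())
print("kept as clean:", clean_ids.tolist())
```

How many examples to flag is left as an explicit parameter because it depends on the noise rate you expect in your data. For tabular data the row index can serve as the example ID, so clean_ids can be used directly to select the clean subset.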

Hrayr

nazaretl · Answer 2 · Tue May 10 2022 21:30:21 GMT+0800 (China Standard Time)

Many thanks for the explanation!

Lusiné