Questions about uncertainty based implementation

Data-reindeer opened this issue a year ago

Hi, Chengcheng Guo and Bo Zhao:

Thanks for your thorough research and clean code. However, I have some questions about the uncertainty-based implementation.

As mentioned in the DeepCore paper, samples with lower confidence may have a greater impact on model optimization than those with higher confidence, and should therefore be included in the coreset. But the implementation here actually calculates the inverse of the uncertainty scores.

Take entropy as an example: np.log(preds + 1e-6) * preds is the negative of the entropy, so np.argsort(scores)[::-1][:self.coreset_size] selects the samples with low entropy (low uncertainty). This confused me a lot, since the implementation appears inconsistent with the statement in the paper. Is there a bug in the implementation?
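To make the sign question concrete, here is a minimal sketch of the scoring expression quoted above. It assumes `preds` is an array of softmax probabilities of shape (n_samples, n_classes) and that the per-class terms are summed over the class axis; the function name, the summation axis, and the sample values are illustrative assumptions, not copied from the repository.

```python
import numpy as np

def negative_entropy_scores(preds):
    # Hypothetical helper: sums p * log(p + eps) over classes.
    # This equals the NEGATIVE of the Shannon entropy (up to the eps term).
    return (np.log(preds + 1e-6) * preds).sum(axis=1)

# Illustrative predictions: confident, near-uniform, and in between.
preds = np.array([
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
])

scores = negative_entropy_scores(preds)
entropy = -scores

# The most confident row has the highest score (closest to zero),
# and the near-uniform row has the lowest score.
most_confident = int(np.argmax(scores))
least_confident = int(np.argmin(scores))
```

Under these assumptions, a larger score corresponds to a lower entropy, so whichever end of the sorted order is kept determines whether confident or uncertain samples are selected.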

Data-reindeer

Chengcheng-Guo · Answer 1 · Mon Apr 24 2023 23:06:27 GMT+0800 (China Standard Time)

The code selects samples with large entropy. The code np.argsort(scores)[::-1][:self.coreset_size] selects samples with smaller scores, where the scores are the negative of the entropy. A smaller score means a larger entropy.

For example, there are two samples with predicted probabilities [0.1, 0.9] and [0.4, 0.6], respectively. We can calculate their entropies, which are $e_1 = -0.1\ln(0.1) - 0.9\ln(0.9) = 0.325$ and $e_2 = -0.4\ln(0.4) - 0.6\ln(0.6) = 0.673$. The algorithm prefers sample 2 over sample 1 (because of its larger entropy). I think this is consistent with what is stated in the paper.
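The two entropy values in this example can be checked with a few lines of Python. This is only a verification sketch; the helper name `entropy` is an assumption introduced here, and natural logarithms are used to match the formulas above.

```python
import math

def entropy(probs):
    # Shannon entropy with natural log: -sum(p * ln(p)).
    return -sum(p * math.log(p) for p in probs)

e1 = entropy([0.1, 0.9])
e2 = entropy([0.4, 0.6])

print(round(e1, 3), round(e2, 3))
```

Sample 2 does have the larger entropy; whether the selection code actually picks it depends on the sort direction.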

Data-reindeer · Answer 2 · Tue Apr 25 2023 09:36:12 GMT+0800 (China Standard Time)

Thanks for your reply. I think there is a misunderstanding about the function np.argsort().
I tried the following code:
scores = np.array(range(10))
result = np.argsort(scores)[::-1]
result2 = np.argsort(scores)
and printed the results:
result = [9 8 7 6 5 4 3 2 1 0]
result2 = [0 1 2 3 4 5 6 7 8 9]
np.argsort sorts in ascending order by default, and [::-1] reverses the resulting indices into descending order. So it seems that the code np.argsort(scores)[::-1][:self.coreset_size] actually selects the samples with the largest scores.
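Applying the same expression to the two-sample example from the earlier reply shows which sample ends up selected. This is a sketch under assumptions: `coreset_size = 1` and the variable names are hypothetical, and the ascending variant at the end is only one possible way to pick high-entropy samples, not a fix confirmed by the maintainers.

```python
import numpy as np

# Sample 0 is confident, sample 1 is uncertain.
preds = np.array([[0.1, 0.9],
                  [0.4, 0.6]])

# Negative entropy per sample, as in the expression discussed in this thread.
scores = (np.log(preds + 1e-6) * preds).sum(axis=1)

coreset_size = 1  # hypothetical value for illustration

# Expression from the issue: descending order of scores, keep the first k.
selected_desc = np.argsort(scores)[::-1][:coreset_size]

# Ascending order of scores keeps the lowest negative entropy,
# i.e. the highest-entropy samples.
selected_asc = np.argsort(scores)[:coreset_size]

print(selected_desc, selected_asc)
```

With these inputs, the descending expression keeps sample 0 (the confident one), while the ascending variant keeps sample 1 (the uncertain one).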

LIZEKAI · Answer 3 · Thu Apr 18 2024 14:09:03 GMT+0800 (China Standard Time)

Hi, may I ask if this issue has been resolved? I agree with @Data-reindeer. It seems we are selecting "more confident" samples instead of "less confident" ones as mentioned in the report.