PatrickZH / DeepCore

Code for coreset selection methods

Questions about uncertainty based implementation

Data-reindeer opened this issue

Hi, Chengcheng Guo and Bo Zhao:

Thanks for your thorough research and clean code. However, I have some questions about the uncertainty-based implementation.

As mentioned in the DeepCore paper, samples with lower confidence may have a greater impact on model optimization than those with higher confidence, and should therefore be included in the coreset. But the implementation here appears to compute the inverse of the uncertainty score.

Take entropy as an example: np.log(preds + 1e-6) * preds is the negative of the entropy, so np.argsort(scores)[::-1][:self.coreset_size] selects the samples with low entropy (low uncertainty). This confused me a lot, because the implementation seems inconsistent with the statement in the paper. Is there a bug in the implementation?

Data-reindeer

The code selects samples with large entropy. The code np.argsort(scores)[::-1][:self.coreset_size] selects samples with smaller scores, where the scores are the negative of the entropy. A smaller score means a larger entropy.

For example, take two samples with predicted probabilities [0.1, 0.9] and [0.4, 0.6], respectively. Their entropies are $e_1 = -0.1\ln(0.1) - 0.9\ln(0.9) \approx 0.325$ and $e_2 = -0.4\ln(0.4) - 0.6\ln(0.6) \approx 0.673$. The algorithm prefers sample 2 over sample 1 (because of its larger entropy). I think this is consistent with what is stated in the paper.
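The two entropy values can be reproduced with a few lines of plain Python. This is only a minimal sketch of the arithmetic above; the helper name entropy is hypothetical and is not taken from the DeepCore code.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution.

    Hypothetical helper for illustration only; not part of DeepCore.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)

e1 = entropy([0.1, 0.9])  # sample 1: confident prediction
e2 = entropy([0.4, 0.6])  # sample 2: uncertain prediction

print(round(e1, 3), round(e2, 3))  # 0.325 0.673
```

Sample 2 has the larger entropy, so an entropy-based selection that targets uncertain samples should rank it first.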

Thanks for your reply. I think there is a misunderstanding about the function np.argsort().
I tried the following code:
import numpy as np
scores = np.array(range(10))
result = np.argsort(scores)[::-1]
result2 = np.argsort(scores)
Printing the results gives:
result = [9 8 7 6 5 4 3 2 1 0]
result2 = [0 1 2 3 4 5 6 7 8 9]
np.argsort sorts in ascending order by default, and [::-1] reverses the returned indices so that they run from the largest score to the smallest. So it seems that the code np.argsort(scores)[::-1][:self.coreset_size] actually selects the samples with the larger scores.
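The same check can be run on the two example distributions [0.1, 0.9] and [0.4, 0.6] from earlier in the thread. The sketch below assumes the per-sample score is the quoted expression summed over the class axis; that sum, the variable names, and the coreset_size value are assumptions for illustration and are not copied from the DeepCore code.

```python
import numpy as np

# Predicted probabilities for the two example samples.
preds = np.array([[0.1, 0.9],   # sample 0: confident, low entropy
                  [0.4, 0.6]])  # sample 1: uncertain, high entropy

# Expression quoted in the issue; summing over classes (an assumption)
# gives the negative entropy of each sample.
scores = (np.log(preds + 1e-6) * preds).sum(axis=1)

coreset_size = 1  # hypothetical value for illustration

# Selection as quoted in the issue: descending scores, i.e. lowest entropy first.
picked_as_quoted = np.argsort(scores)[::-1][:coreset_size]

# Ascending scores instead, i.e. highest entropy first.
picked_high_entropy = np.argsort(scores)[:coreset_size]

print(picked_as_quoted)     # [0]
print(picked_high_entropy)  # [1]
```

Under these assumptions the quoted selection returns the confident sample, while dropping [::-1] (or negating the scores) returns the uncertain one.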

Hi, may I ask if this issue has been resolved? I agree with @Data-reindeer. It seems we are selecting "more confident" samples instead of "less confident" ones as mentioned in the report.