pytorch / botorch

Bayesian optimization in PyTorch

Home Page: https://botorch.org/

[Bug] Deduplicate not working for large n!

lluisp1999 opened this issue · comments

πŸ› Bug

There is a bug in botorch.utils.multi_objective.pareto.is_non_dominated: the deduplicate argument is silently ignored whenever the input is large enough.

To reproduce

**Code snippet to reproduce**

import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

Y = torch.tensor([[1, 2], [0, 0], [2, 1]])
NY = Y.clone()
# Tile the three points until n is large enough to trigger the loop-based path.
for _ in range(3000):
    NY = torch.cat([NY, Y])
print(is_non_dominated(NY, deduplicate=False))

**Stack trace/error message**

Expected: tensor([ True, False,  True,  ...,  True, False,  True])
Got: tensor([ True, False,  True,  ..., False, False, False])
(i.e., deduplicate=False is ignored)

Expected Behavior

When the data is large, is_non_dominated dispatches to is_non_dominated_loop, and that function ignores deduplicate=False: the input is always deduplicated.
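
In the meantime, a user-side workaround is to compute the mask on the unique rows and broadcast it back. This is only a sketch, not part of the BoTorch API; the helper name non_dominated_mask_keep_duplicates is my own. It relies on the fact that a duplicate of a non-dominated point is itself non-dominated, since no point is strictly better than it in any objective:

import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

def non_dominated_mask_keep_duplicates(Y: torch.Tensor) -> torch.Tensor:
    # Reduce to unique rows; `inverse` maps each original row to its unique row.
    unique_Y, inverse = torch.unique(Y, dim=0, return_inverse=True)
    # On unique rows there are no duplicates, so the deduplicate flag is moot
    # and either internal path returns the plain non-dominated mask.
    unique_mask = is_non_dominated(unique_Y)
    # Duplicates of a non-dominated point are themselves non-dominated,
    # so the per-unique-row mask maps back to every original row.
    return unique_mask[inverse]

On the snippet above, non_dominated_mask_keep_duplicates(NY) should match the expected output tensor([True, False, True, ..., True, False, True]).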

Thanks for flagging this. We should alert the user that data will be deduplicated automatically for large n.
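
A minimal sketch of what such an alert could look like as a wrapper around the public API (the cutoff _MAX_N is hypothetical; the real threshold is internal to is_non_dominated):

import warnings
import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

_MAX_N = 2048  # hypothetical cutoff; the real threshold is internal to BoTorch

def is_non_dominated_warned(Y: torch.Tensor, deduplicate: bool = True) -> torch.Tensor:
    # Above the cutoff, is_non_dominated falls back to the loop-based path,
    # which deduplicates regardless of the flag, so surface a warning.
    if not deduplicate and Y.shape[-2] > _MAX_N:
        warnings.warn(
            "deduplicate=False is ignored for large n; the loop-based "
            "implementation always deduplicates.",
            stacklevel=2,
        )
    return is_non_dominated(Y, deduplicate=deduplicate)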

Do you have a use case where you'd like to keep duplicate points for large n? We could potentially add that functionality if needed.

The utility lies in cases where we don't want the Pareto front itself but rather want to see which points are non-dominated. It has great use cases in many fields, in my case particularly GFlowNets. I think it would be nice to have, or at least the docs should not mislead users. Thanks!