pytorch / botorch

Bayesian optimization in PyTorch

Home Page: https://botorch.org/

[Bug] Deduplicate not working for large n!

lluisp1999 opened this issue · comments

πŸ› Bug

There is a bug in botorch.utils.multi_objective.pareto.is_non_dominated: the deduplicate argument is silently ignored whenever the input is large enough.

To reproduce

**Code snippet to reproduce**

import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

Y = torch.tensor([[1, 2], [0, 0], [2, 1]])
NY = Y.clone()
# Tile the three points until n is large enough to trigger the loop-based path.
for _ in range(3000):
    NY = torch.cat([NY, Y])
print(is_non_dominated(NY, deduplicate=False))

**Stack trace/error message**

Expected: tensor([ True, False,  True,  ...,  True, False,  True])
Got: tensor([ True, False,  True,  ..., False, False, False])
(i.e., deduplicate=False is ignored)

Expected Behavior

When the data is large, is_non_dominated dispatches to is_non_dominated_loop, and that function ignores deduplicate=False: the input is always deduplicated.
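
In the meantime, a user-side workaround is to compute the mask on the unique rows and broadcast it back. This is only a sketch, not part of the BoTorch API; the helper name non_dominated_mask_keep_duplicates is my own. It relies on the fact that a duplicate of a non-dominated point is itself non-dominated, since no point is strictly better than it in any objective:

import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

def non_dominated_mask_keep_duplicates(Y: torch.Tensor) -> torch.Tensor:
    # Reduce to unique rows; `inverse` maps each original row to its unique row.
    unique_Y, inverse = torch.unique(Y, dim=0, return_inverse=True)
    # On unique rows there are no duplicates, so the deduplicate flag is moot
    # and either internal path returns the plain non-dominated mask.
    unique_mask = is_non_dominated(unique_Y)
    # Duplicates of a non-dominated point are themselves non-dominated,
    # so the per-unique-row mask maps back to every original row.
    return unique_mask[inverse]

On the snippet above, non_dominated_mask_keep_duplicates(NY) should match the expected output tensor([True, False, True, ..., True, False, True]).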

Thanks for flagging this. We should alert the user that data will be deduplicated automatically for large n.
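
A minimal sketch of what such an alert could look like as a wrapper around the public API (the cutoff _MAX_N is hypothetical; the real threshold is internal to is_non_dominated):

import warnings
import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

_MAX_N = 2048  # hypothetical cutoff; the real threshold is internal to BoTorch

def is_non_dominated_warned(Y: torch.Tensor, deduplicate: bool = True) -> torch.Tensor:
    # Above the cutoff, is_non_dominated falls back to the loop-based path,
    # which deduplicates regardless of the flag, so surface a warning.
    if not deduplicate and Y.shape[-2] > _MAX_N:
        warnings.warn(
            "deduplicate=False is ignored for large n; the loop-based "
            "implementation always deduplicates.",
            stacklevel=2,
        )
    return is_non_dominated(Y, deduplicate=deduplicate)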

Do you have a use case where you'd like to keep duplicate points for large n? We could potentially add that functionality if needed.

The utility lies in cases where we don't want the Pareto front itself but rather want to see which points are non-dominated. It has great use cases in many fields, in my case particularly GFlowNets. I think it would be nice to have, or at least the docs should not mislead users. Thanks!