[Bug] Deduplicate not working for large n!
lluisp1999 opened this issue · comments
🐛 Bug
There is a bug in botorch.utils.multi_objective.pareto.is_non_dominated, where the deduplicate argument is silently ignored whenever the input is large enough.
To reproduce
**Code snippet to reproduce**
import torch
from botorch.utils.multi_objective.pareto import is_non_dominated

Y = torch.tensor([[1, 2], [0, 0], [2, 1]])
NY = torch.tensor([[1, 2], [0, 0], [2, 1]])
for i in range(3000):
    NY = torch.cat([NY, Y])
print(is_non_dominated(NY, deduplicate=False))
**Stack trace/error message**
Expected: tensor([ True, False, True, ..., True, False, True])
Got: tensor([ True, False, True, ..., False, False, False])
(i.e., deduplicate=False is ignored)
Expected Behavior
When the data is large, is_non_dominated dispatches to is_non_dominated_loop. The problem is that is_non_dominated_loop ignores deduplicate=False and always deduplicates.
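To illustrate the expected semantics, here is a minimal pure-Python sketch of a pairwise non-dominated check that honors the deduplicate flag (assuming maximization). This is an illustration of the intended behavior, not BoTorch's actual implementation; the function name is_non_dominated_list is made up for this example.

```python
def is_non_dominated_list(Y, deduplicate=True):
    """Return a boolean mask: True where Y[i] is not strictly dominated.

    With deduplicate=False, every copy of a duplicated non-dominated point
    stays True; with deduplicate=True, only the first copy is kept.
    """
    n = len(Y)
    mask = [True] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # j strictly dominates i: >= in every objective, > in at least one
            if all(a >= b for a, b in zip(Y[j], Y[i])) and any(
                a > b for a, b in zip(Y[j], Y[i])
            ):
                mask[i] = False
                break
    if deduplicate:
        seen = set()
        for i in range(n):
            if mask[i]:
                key = tuple(Y[i])
                if key in seen:
                    mask[i] = False  # drop repeated copies of the same point
                else:
                    seen.add(key)
    return mask

Y = [(1, 2), (0, 0), (2, 1), (1, 2)]
print(is_non_dominated_list(Y, deduplicate=False))  # [True, False, True, True]
print(is_non_dominated_list(Y, deduplicate=True))   # [True, False, True, False]
```

Note that duplicates never strictly dominate each other (no objective is strictly greater), so with deduplicate=False all copies of a non-dominated point remain True.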
Thanks for flagging this. We should alert the user that data will be deduplicated automatically for large n.
Do you have a use case where you'd like to keep duplicate points for large n? We could potentially add that functionality if needed.
The utility lies in cases where we don't want the Pareto front itself but rather want to see which points are non-dominated. It has great use cases in many fields, in particular GFlowNets in my case. I think it would be nice to have, or at least not to mislead users. Thanks!