Copying a filtered collection with ImmutableSet.copyOf() should require visiting each element only once
asheldon opened this issue · comments
Description
Since Google Guava 33.1.0 in 5c4f5b2, using ImmutableSet to copy a filtered collection requires iterating over and applying the predicate function, along with any side effects, to each element twice. The expected number of times is once per element.
The commit message suggests that the author anticipated this behavior:
I could imagine bad performance for users who call ImmutableSet.copyOf(Collections.filter(...)). If that's an issue in practice, then maybe we should insert a special case for it inside copyOf. I at least made sure to call size() only once.
I believe the code should handle this case, if only to avoid unnecessary executions of the filter function, which may negate any potential allocation savings. If this change cannot be made, please consider replacing size() with isEmpty(). For collections where size() is not constant time, isEmpty() is almost always cheaper, including for filtered collections.
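To make the size() versus isEmpty() difference concrete, here is a minimal JDK-only sketch. LazyFilteredView is a hypothetical stand-in for a lazily filtered view, not Guava's actual implementation; it only assumes that the view has no cached size and re-applies the predicate on every traversal. The helper method names are likewise made up for illustration.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical illustration only; this is not how Guava implements filtered collections.
class FilteredSizeDemo {

    /** A lazy view that re-applies the predicate on every traversal and counts the calls. */
    static final class LazyFilteredView<E> extends AbstractCollection<E> {
        private final Collection<E> source;
        private final Predicate<? super E> predicate;
        int predicateCalls = 0;

        LazyFilteredView(Collection<E> source, Predicate<? super E> predicate) {
            this.source = source;
            this.predicate = predicate;
        }

        @Override
        public Iterator<E> iterator() {
            Iterator<E> it = source.iterator();
            return new Iterator<E>() {
                private E next;
                private boolean ready = false;

                @Override
                public boolean hasNext() {
                    // Advance until a matching element is found or the source is exhausted.
                    while (!ready && it.hasNext()) {
                        E candidate = it.next();
                        predicateCalls++;
                        if (predicate.test(candidate)) {
                            next = candidate;
                            ready = true;
                        }
                    }
                    return ready;
                }

                @Override
                public E next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    ready = false;
                    return next;
                }
            };
        }

        // No cached size: counting needs a full pass over the source.
        @Override
        public int size() {
            int n = 0;
            for (Iterator<E> it = iterator(); it.hasNext(); it.next()) {
                n++;
            }
            return n;
        }

        // Stops as soon as the first matching element is found.
        @Override
        public boolean isEmpty() {
            return !iterator().hasNext();
        }
    }

    /** Predicate calls made by a size() check followed by one full iteration. */
    static int callsWithSizeCheck(List<String> data) {
        LazyFilteredView<String> view = new LazyFilteredView<>(data, s -> true);
        List<String> copy = new ArrayList<>();
        if (view.size() != 0) {
            for (String s : view) {
                copy.add(s);
            }
        }
        return view.predicateCalls;
    }

    /** Predicate calls made by an isEmpty() check followed by one full iteration. */
    static int callsWithIsEmptyCheck(List<String> data) {
        LazyFilteredView<String> view = new LazyFilteredView<>(data, s -> true);
        List<String> copy = new ArrayList<>();
        if (!view.isEmpty()) {
            for (String s : view) {
                copy.add(s);
            }
        }
        return view.predicateCalls;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("one", "two", "three");
        System.out.println(callsWithSizeCheck(data));    // prints 6
        System.out.println(callsWithIsEmptyCheck(data)); // prints 4
    }
}
```

In this sketch the isEmpty() check does not remove the extra predicate call entirely, because the first matching element is still tested twice, but the overhead no longer grows with the size of the collection.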
Example
import com.google.common.collect.Collections2;
import com.google.common.collect.ImmutableSet;
import java.util.ArrayList;
import java.util.List;

public class FilteredImmutableRepro {
  public static void main(String[] unused) {
    List<String> example = new ArrayList<>();
    example.add("one");
    example.add("two");
    example.add("three");
    // The predicate prints each element, so every evaluation is visible in the output.
    ImmutableSet.copyOf(Collections2.filter(example, element -> {
      System.out.println(element);
      return true;
    }));
  }
}
Expected Behavior
one
two
three
Actual Behavior
one
two
three
one
two
three
Packages
com.google.common.collect
Huh, why didn't I use isEmpty() there? Oh, I see: The "more significant optimization [that] may be coming in cl/590212390" would use size(), and I didn't reconsider that when I carved off a less ambitious optimization to try first.
I can change to isEmpty(), but it is possible that a future change will switch back to size(). As you observed, the filtered-collection problem is a cost that I think we'd be willing to pay in general: We regret providing filtered collections, and we believe that the JDK folks took a better approach when they designed Stream. ImmutableSet is probably one of many APIs in Guava that both calls size() and then iterates, so filtered collections are going to remain susceptible to this general kind of issue.
Still, there's no reason to use size() today, even setting aside filtered-collection concerns. So I can change to isEmpty() to at least buy you some time to consider taking further action. (I suspect that you could switch to ImmutableSet.copyOf(ImmutableList.copyOf(filteredCollection)), but I do encourage moving as far away from filtered collections as you can get away with :))
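For anyone who wants to move away from filtered collections entirely, here is a rough JDK-only sketch of the Stream-based alternative mentioned above. The class and method names are hypothetical, and the counter exists only to show how often the predicate runs. With Guava on the classpath, the ImmutableSet.toImmutableSet() collector could presumably replace the JDK collector used here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical illustration: filter with a Stream instead of a filtered-collection view.
class StreamFilterDemo {

    /** Copies the matching elements into an unmodifiable set and returns the predicate call count. */
    static int predicateCallsViaStream(List<String> data) {
        AtomicInteger calls = new AtomicInteger();
        Set<String> copy = data.stream()
            .filter(element -> {
                calls.incrementAndGet(); // count each evaluation of the predicate
                return true;
            })
            .collect(Collectors.collectingAndThen(
                Collectors.toSet(), Collections::unmodifiableSet));
        if (copy.size() != data.size()) {
            throw new IllegalStateException("unexpected copy size for distinct input");
        }
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("one", "two", "three");
        System.out.println(predicateCallsViaStream(data)); // prints 3
    }
}
```

A sequential stream pulls each element through the filter once, so the predicate and any side effects run once per element, regardless of how the collector sizes its result.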