google / guava

Google core libraries for Java

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Copying a filtered collection with ImmutableSet.copyOf() should require visiting each element only once

asheldon opened this issue · comments

Description

Since Google Guava 33.1.0 in 5c4f5b2, using ImmutableSet to copy a filtered collection requires iterating over and applying the predicate function, along with any side effects, to each element twice. The expected number of times is once per element.

This change message seems to indicate this behavior was understood by the author.

I could imagine bad performance for users who call ImmutableSet.copyOf(Collections.filter(...)). If that's an issue in practice, then maybe we should insert a special case for it inside copyOf. I at least made sure to call size() only once.

I believe the code should handle this case if only to avoid unnecessary executions of the filter function which may negate any potential allocation savings. If this change cannot be made, please consider replacing size() with isEmpty(). For collections where size() is not constant time, isEmpty() is almost always cheaper, including for filtered collections.

Example

import com.google.common.collect.Collections2;
import com.google.common.collect.ImmutableSet;

import java.util.ArrayList;
import java.util.List;

public class FilteredImmutableRepro {
    public static void main(String[] unused) {
        List<String> example = new ArrayList<>();
        example.add("one");
        example.add("two");
        example.add("three");

        ImmutableSet.copyOf(Collections2.filter(example, element -> {
            System.out.println(element);
            return true;
        }));
    }
}

Expected Behavior

one
two
three

Actual Behavior

one
two
three
one
two
three

Packages

com.google.common.collect

Platforms

No response

Checklist

Huh, why didn't I use isEmpty() there? Oh, I see: The "more significant optimization [that] may be coming in cl/590212390" would use size(), and I didn't reconsider that when I carved off a less ambitious optimization to try first.

I can change to isEmpty(), but it is possible that a future change will switch back to size(). As you observed, the filtered-collection problem is a cost that I think we'd be willing to pay in general: We regret providing filtered collections, and we believe that the JDK folks took a better approach when they designed Stream. ImmutableSet is probably one of many APIs in Guava that both calls size() and then iterates, so filtered collections are going to remain susceptible to this general kind of issue.

Still, there's no reason to use size() today, even setting aside filtered-collection concerns. So I can change to isEmpty() to at least buy you some time to consider taking further action. (I suspect that you could switch to ImmutableSet.copyOf(ImmutableList.copyOf(filteredCollection)), but I do encourage moving as far away from filtered collections as you can get away with :))