dart-lang / collection

The collection package for Dart contains a number of separate libraries with utility functions and classes that make working with collections easier.

Home Page: https://pub.dev/packages/collection

List.duplicates to find duplicate elements in a List or Iterable

jonasfj opened this issue

It would be nice to have a .duplicates method that, given a List, returns the Set of elements that appear more than once in the List.

My particular use-case was to check if a configuration list had duplicates.
If so, throw a FormatException.

It's easy to check for duplicates with list.length != list.toSet().length.
But for good error messages, it's nice to have the list of duplicates.

It could be something like:

extension<T> on List<T> {
  /// Return elements that appear more than once in this [List].
  Set<T> duplicates() {
    final duplicates = <T>{};
    final N = length;
    for (var i = 0; i < N; i++) {
      final candidate = this[i];
      for (var j = i + 1; j < N; j++) {
        if (candidate == this[j]) {
          duplicates.add(candidate);
          break;
        }
      }
    }
    return duplicates;
  }
}

It could probably be written more efficiently. One could sort by hashCode or something similar, but it's not obvious that allocating a new list and sorting is any faster than doing the naive O(n²) thing. If T is an object more complex than String or int, it might make sense to sort and be smart about it.
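As a point of comparison with the nested loops above, here is a minimal sketch of a different technique: tracking already-seen elements in a hash set instead of sorting. It assumes that hashCode is consistent with == for T, in which case it runs in expected O(n) time at the cost of one extra set. The name findDuplicates is hypothetical and not part of package:collection.

```dart
/// Returns the elements that appear more than once in [items].
///
/// `findDuplicates` is a hypothetical name used for illustration only.
/// Assumes `hashCode` is consistent with `==` for [T].
Set<T> findDuplicates<T>(Iterable<T> items) {
  final seen = <T>{};
  final duplicates = <T>{};
  for (final item in items) {
    // Set.add returns false when an equal element is already in the set,
    // which means this is at least the second occurrence of that element.
    if (!seen.add(item)) {
      duplicates.add(item);
    }
  }
  return duplicates;
}

void main() {
  print(findDuplicates(['a', 'b', 'a', 'c', 'b', 'a'])); // {a, b}
}
```

One behavioural difference from the nested-loop version: the result here is ordered by where each element first repeats, not by where it first appears.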

Open questions:

  • Should this operate on List<T> or Iterable<T>?
    (Maybe Iterable<T>, if we are going to allocate a list for sorting anyway)
  • Should this be an extension method or a top-level function?
    (Maybe a function is fine; it's not exactly something you'll need frequently)
  • Should this accept an Equality as an optional argument?
  • Should this return a List<T>, Set<T> or Iterable<T>?
    (Maybe Iterable<T> offers the best flexibility if using a sort to find duplicates)
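To make the questions above concrete, here is one possible shape, sketched under assumptions rather than proposed as the answer: a top-level function that takes any Iterable<T>, accepts an optional Equality, and returns an Iterable<T>. The name duplicatesOf and its signature are hypothetical. It uses hash sets from dart:collection configured with the Equality's equals, hash and isValidKey methods rather than sorting.

```dart
import 'dart:collection' show HashSet;

import 'package:collection/collection.dart';

/// Hypothetical top-level function; name and signature are illustrative.
///
/// Returns one representative (the first repeated occurrence) of every
/// element that appears more than once in [items] according to [equality].
Iterable<T> duplicatesOf<T>(Iterable<T> items, {Equality<T>? equality}) {
  // Fall back to ==/hashCode when no Equality is given.
  final eq = equality ?? DefaultEquality<T>();
  HashSet<T> newSet() => HashSet<T>(
        equals: eq.equals,
        hashCode: eq.hash,
        isValidKey: eq.isValidKey,
      );
  final seen = newSet();
  final reported = newSet();
  final result = <T>[];
  for (final item in items) {
    // Already seen, and not yet reported as a duplicate.
    if (!seen.add(item) && reported.add(item)) {
      result.add(item);
    }
  }
  return result;
}

void main() {
  final keys = ['Host', 'port', 'HOST', 'host'];
  print(duplicatesOf(keys, equality: const CaseInsensitiveEquality()));
  // [HOST]
}
```

With a custom Equality, "which of the equal elements is returned" becomes part of the contract; this sketch returns the first repeated occurrence, which is only one of several defensible choices.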

This is fairly specialized.

Removing duplicates makes sense.
That's what toSet does.

Counting occurrences makes some sense, if you want to account for all occurrences, but want to handle all the duplicates together, instead of in the original list order. You can use groupBy for that.

Handling only the ones that have duplicates, and not the rest, seems more uncommon.

I'd just tell people to do

list.groupBy(id).values.where((e) => e.length > 1)
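For reference, a runnable spelling of the one-liner above might look like the following sketch. It assumes that id stands for the identity function, and it uses the top-level groupBy function from package:collection (the groupListsBy extension method is the method-style equivalent). The helper name duplicateGroups is hypothetical. Note that the result is the groups of equal elements, not the elements themselves, so the first element of each group is taken to get one representative.

```dart
import 'package:collection/collection.dart';

/// Hypothetical helper: groups equal elements and keeps only the groups
/// that contain more than one element.
Iterable<List<T>> duplicateGroups<T>(Iterable<T> list) =>
    groupBy(list, (T e) => e).values.where((g) => g.length > 1);

void main() {
  final list = ['a', 'b', 'a', 'c', 'b', 'a'];

  // Each group holds every occurrence of one duplicated element.
  print(duplicateGroups(list).toList()); // [[a, a, a], [b, b]]

  // One representative per duplicated element.
  print(duplicateGroups(list).map((g) => g.first).toList()); // [a, b]

  // Method-style spelling of the same grouping.
  final groups = list.groupListsBy((e) => e);
  print(groups.values.where((g) => g.length > 1).length); // 2
}
```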

This is a pretty decent solution!

But it's hard to discover. Maybe I'll submit an example to the documentation for the groupBy function :D