groupBy doesn't work as expected

Question

geelen opened this issue 10 years ago

Just tripped myself up trying to use groupBy. Here's the API I was expecting:

F.groupBy(function(a) { return a.toUpperCase(); }, "aAbbBBcc")
// TypeError: r is not a function

As in, I assumed it was a map then a group. But it actually needs a comparator, not a mapper:

F.groupBy(function(a,b) { return a.toUpperCase() === b.toUpperCase(); }, "aAbbBBcc")
// TypeError: b is undefined

Adding in some console.logs, I realised that the function is getting called with [lastElement, undefined] and then [undefined, undefined] after being called with each pair of values, so I added a check:

F.groupBy(function(a,b) { return a && b && a.toUpperCase() === b.toUpperCase(); }, "aAbbBBcc")
// TypeError: r is not a function

Had a look into the source and realised that group uses groupBy(F.equal) under the hood, and that F.equal is curried. So the comparator has to be curried as well:

F.groupBy(F.curry(function(a,b) { return a && b && a.toUpperCase() === b.toUpperCase(); }), "aAbbBBcc")
// Array [ "aA", "bbBB", "cc" ]

So, at the very least, I think a correct usage of groupBy should be in the documentation and the tests as a reference, but I'm wondering whether this API could be different. I think a groupBy that takes a mapping function makes more sense, or at least it should be possible to pass a non-curried comparator to it? And maybe it shouldn't be passing undefineds at the end?
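
To make the suggestion concrete, here is a minimal sketch of a pairwise grouping function that accepts a plain, non-curried comparator and never calls it with undefined. The name groupAdjacent is hypothetical and this is not FKit's implementation; it only illustrates the behaviour I was expecting.

```javascript
// Hypothetical helper, not part of FKit: groups runs of adjacent elements
// for which a plain two-argument comparator returns true.
function groupAdjacent(eq, xs) {
  const groups = [];
  for (const x of xs) {
    const last = groups[groups.length - 1];
    // Only compare against a real previous element, never undefined.
    if (last && eq(last[last.length - 1], x)) {
      last.push(x);
    } else {
      groups.push([x]);
    }
  }
  return groups;
}

// An ordinary (uncurried) comparator is enough here.
const sameLetter = (a, b) => a.toUpperCase() === b.toUpperCase();

// Strings are iterated character by character; join each group back up.
const result = groupAdjacent(sameLetter, "aAbbBBcc").map(g => g.join(""));
console.log(result); // [ 'aA', 'bbBB', 'cc' ]
```

This version compares each element with the previous one, which is equivalent to comparing against the start of the group as long as the comparator is a true equivalence.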

Thoughts?

Scott Sauyet · Answer 1 · Mon Nov 10 2014 22:20:51 GMT+0800 (China Standard Time)

If it helps clarify the thinking, Ramda ran into similar issues a few months ago, and decided on a naming convention: groupBy, sortBy, and in general, *By each use a unary function that generates a key to stand for the objects. The key can be as simple as get('foo'), or as complex as desired. But unionWith, uniqWith, and in general, *With, each use a binary function used to compare two elements in some way, usually as a predicate or in order to choose one of them.

We've not generally found the need for both at once. So we have groupBy but not groupWith, and unionWith but not unionBy, but there is nothing preventing us from doing both if they seem useful.
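
The difference between the two conventions can be sketched in plain JavaScript. The helpers below (groupByKey and uniqWithEq) are hypothetical names written for illustration only, not Ramda's source; they just show a unary key function on one side and a binary predicate on the other.

```javascript
// *By style: a unary function produces a key that stands for each element.
function groupByKey(keyFn, xs) {
  const groups = Object.create(null); // no inherited keys to collide with
  for (const x of xs) {
    const key = keyFn(x);
    if (!(key in groups)) groups[key] = [];
    groups[key].push(x);
  }
  return groups;
}

// *With style: a binary predicate compares two elements directly.
function uniqWithEq(eq, xs) {
  const out = [];
  for (const x of xs) {
    // Keep x only if nothing already kept is "equal" to it.
    if (!out.some(y => eq(y, x))) out.push(x);
  }
  return out;
}

const letters = ['a', 'A', 'b', 'B', 'c'];

const byLetter = groupByKey(s => s.toUpperCase(), letters);
// byLetter.A is ['a', 'A'], byLetter.B is ['b', 'B'], byLetter.C is ['c']

const distinct = uniqWithEq(
  (a, b) => a.toUpperCase() === b.toUpperCase(),
  letters
);
// distinct is ['a', 'b', 'c']
```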

Thus for Ramda,

// Ramda deals mostly in lists, and doesn't know anything about strings.
R.groupBy(R.toUpperCase, ['a', 'A', 'b', 'b', 'B', 'B', 'c', 'c']);
//=> {A: ["a","A"], B:["b","b","B","B"], C:["c","c"]}

R.unionWith(function(a, b) {return a.height == b.height && a.width == b.width;}, 
    [{height: 10, width: 5}, {height: 3, width: 7, id: "x"}, {height: 6, width: 4}],
    [{height: 3, width: 7, id: "y"}, {height: 7, width: 8}]
)
//=> [{height: 10, width: 5}, {height: 3, width: 7, id: "x"}, 
//    {height: 6, width: 4}, {height: 7, width: 8}]

In writing this, though, I've looked back at some Ramda functions to realize that we've not been as consistent as I thought we were being. I'm off to raise an issue on the Ramda list....

Glen Maddern · Answer 2 · Tue Nov 11 2014 07:56:58 GMT+0800 (China Standard Time)

Yeah I like returning the mapped values in the object too, but then iterating an object doesn't have a deterministic order so it's not quite the same. Since groupBy won't return empty groups, you can just re-run the mapper over the first item of each group to get the key.
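
A small sketch of what I mean, assuming the grouped result is an array of non-empty arrays (the variable names here are just for illustration):

```javascript
// Groups as an array-returning groupBy might produce them.
const groups = [['a', 'A'], ['b', 'b', 'B', 'B'], ['c', 'c']];

// The same mapper that was used (conceptually) to form the groups.
const keyFn = s => s.toUpperCase();

// No group is empty, so g[0] always exists and yields the group's key.
const keyed = groups.map(g => [keyFn(g[0]), g]);
// keyed[0] is ['A', ['a', 'A']], keyed[1] is ['B', ['b', 'b', 'B', 'B']], and so on

const keys = keyed.map(pair => pair[0]);
// keys is ['A', 'B', 'C']
```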

Scott Sauyet · Answer 3 · Tue Nov 11 2014 11:50:17 GMT+0800 (China Standard Time)

@geelen But does the order of the returned data really help you much? An array does have slightly more information than an object, in that it allows you to distinguish between cases like those below on the left, which could yield [["a", "A"], ["b", "b", "B", "B"], ["c", "c" ]] or {A: ["a", "A"], B: ["b", "b", "B", "B"], C: ["c", "c" ]} and those on the right, which could yield the object but not that particular array.

aAbcbBcB        aAcbbBBc
abAbBBcc        ccaAbbBB
abcAbBBc        acbAbBBc
abbAcBBc        cabcAbBB

While that is additional information, it seems information that you would expect to lose in a function named groupBy. Do you really care when grouping a collection in what order the first representative of each group appeared in the collection? For that is the only difference between the two.
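
To illustrate that difference, here is a sketch of a key-based grouping that returns an array of groups in order of first appearance. The function name groupInFirstSeenOrder is hypothetical; it is neither Ramda's nor FKit's implementation, only a way to show the one piece of information the array carries beyond the object.

```javascript
// Hypothetical helper: key-based grouping that returns an array of groups,
// ordered by where each group's first member appeared in the input.
function groupInFirstSeenOrder(keyFn, xs) {
  const groups = new Map(); // a Map iterates in insertion order
  for (const x of xs) {
    const key = keyFn(x);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(x);
  }
  return Array.from(groups.values());
}

const upper = s => s.toUpperCase();

// One input from the left column, one from the right column.
const left = groupInFirstSeenOrder(upper, "aAbcbBcB").map(g => g.join(""));
const right = groupInFirstSeenOrder(upper, "aAcbbBBc").map(g => g.join(""));

console.log(left);  // [ 'aA', 'bbBB', 'cc' ]
console.log(right); // [ 'aA', 'cc', 'bbBB' ]
```

Both inputs contain the same members in each group; only the order of the groups differs.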

But the main distinction I was trying to make was not in the output generated by the functions; Ramda could be modified to match FKit fairly easily. What I found important in this case is the input to the function. FKit requires a predicate that compares two elements against one another and reports whether they are equal. This is definitely the most generic possibility, and FKit might want to stick with it. Ramda notes that essentially every case we've seen for grouping can be based around something simpler: an extracted or generated key for each element. That is the only version Ramda exposes. If we found a need for the more general one, we would add a groupWith function as well.

The main difference is in ease of use.

F.groupBy(F.curry(function(a,b){return a && b && a.toUpperCase()==b.toUpperCase()}), coll)
R.groupBy(function(a) {return a.toUpperCase()}, coll)

(Since toUpperCase is reified in Ramda, that can be made simpler still:

R.groupBy(R.toUpperCase, coll);

but that's really beside the point.)

It's that simpler usage I find compelling.

Glen Maddern · Answer 4 · Tue Nov 11 2014 12:02:29 GMT+0800 (China Standard Time)

Yeah, actually I really like that FKit's groupBy is pairwise. It's ideal for grouping already-sorted input, in which case it's good for the order to be preserved. In my case, I have a list of transactions that come from the server pre-sorted, and I want to group them by month. The current groupBy is working great; it just requires a curried comparator, which took me a while to figure out :)
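
Roughly what that use case looks like, as a sketch. The transaction shape (a date string plus an amount) and the groupAdjacent helper are made up for illustration; the helper is a stand-in for a pairwise groupBy, not FKit's code.

```javascript
// Hypothetical stand-in for a pairwise groupBy: groups adjacent elements
// for which the two-argument comparator returns true.
function groupAdjacent(eq, xs) {
  const groups = [];
  for (const x of xs) {
    const last = groups[groups.length - 1];
    if (last && eq(last[last.length - 1], x)) {
      last.push(x);
    } else {
      groups.push([x]);
    }
  }
  return groups;
}

// Made-up transactions, already sorted by date as they would arrive.
const transactions = [
  { date: "2014-09-28", amount: 12 },
  { date: "2014-10-03", amount: 40 },
  { date: "2014-10-17", amount: 7 },
  { date: "2014-11-02", amount: 25 },
];

// Two transactions belong together when their "YYYY-MM" prefixes match.
const sameMonth = (a, b) => a.date.slice(0, 7) === b.date.slice(0, 7);

const byMonth = groupAdjacent(sameMonth, transactions);
const months = byMonth.map(g => g[0].date.slice(0, 7));
const sizes = byMonth.map(g => g.length);

console.log(months); // [ '2014-09', '2014-10', '2014-11' ]
console.log(sizes);  // [ 1, 2, 1 ]
```

Because the input is sorted, each month forms exactly one run, and the groups come out in chronological order.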

Scott Sauyet · Answer 5 · Tue Nov 11 2014 21:33:03 GMT+0800 (China Standard Time)

Ok. Note that Ramda's is also order-preserving within groups, just not between groups. I simply didn't see any rationale to assume that because the first instance of one group in the list came before the first instance of another group that the first group was therefore sorted ahead of the second one.

But I don't want to argue this. I meant only to offer a different perspective.

I'm in a funny situation here. I'm really enjoying watching FKit. It's the first library I've seen that seems to have the same API concerns that Ramda has. And I'm really glad to have others doing the same things.

I both want to offer suggestions based on Ramda's experience and want to watch FKit develop differently so that there are competing ideas and a healthier environment in this space. So I mostly nibble at the edges, responding to issues that are raised, but not digging in on my own.

But I am having a lot of fun watching.

Joshua Bassett · Answer 6 · Fri Nov 14 2014 07:13:33 GMT+0800 (China Standard Time)

Thanks for your input, guys. I've been busy the past few days, but rest assured I'm going to take the time to sit down and weigh in on this 😄