elm-community / list-extra

Convenience functions for working with List.

Home Page:http://package.elm-lang.org/packages/elm-community/list-extra/latest

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

groupWhile behaves like groupWhileTransitively

opened this issue · comments

The docs for List.Extra.groupWhileTransitively state that it will "Start a new group each time the comparison test doesn't hold for two adjacent elements".

However, this is the behaviour that List.Extra.groupWhile exhibits, which does not seems consistent with its docs.

As far as I can tell the two functions behave identically, and the library provides no way to group together matching elements regardless of their position.

The code

module Main exposing (..)

import Html
import List.Extra

main =
    [ 0, 1, 1, 0, 0, 2, 3, 1 ]
        |> List.Extra.groupWhile (==)
        |> toString
        |> Html.text

Outputs [[0],[1,1],[0,0],[2],[3],[1]] rather than [[0,0,0],[1,1,1],[2],[3]]

As far as I can tell the two functions behave identically

The documentation of the package gives examples where the two functions do not behave identically!

Sorry.
"Within the context of "Starting a new group each time the comparison test doesn't hold for two adjacent elements", the two functions behave identically.

No. Or, rather: What exactly do you mean? In what sense are the two examples groupWhile (<) [1,2,3,2,4,1,3,2,1] and groupWhileTransitively (<) [1,2,3,2,4,1,3,2,1], which demonstrably behave differently, not in that context?

The documentation of a function variant states "Start a new group each time the comparison test doesn't hold for two adjacent elements", while the documentation of the original version of the function does not.

Because of this, by reading the documentation, one may be lead to think that the statement above describes a difference in behaviour between the original function and its variant, but this does not seem to be the case.

In other terms: if both functions "Start a new group each time the comparison test doesn't hold for two adjacent elements", why is the statement included only in the variant, rather than in the original?

Because it DOES NOT HOLD for the original version. Look at the example given for groupWhile with the comparison test (<). There, the test does not hold for the adjacent elements in the third and fourth positions, namely 3 and 2. After all, 3<2 does not hold. And yet, the groupWhile function does not start a new group at that point.

You may also want to consider that the documentation around there deliberately talks about "equality test" for some things, and "comparison test" for other things.

It does not hold for a specific case that's described as "non-intuitive", not for what seems to be described as the main use case.

Look, I'm sure YOU understand the documentation, but the point of documentation is to make a library understood by people who don't yet understand it.

As seen by someone new to that function, the documentation is confusing at best, misleading at worst.

Maybe I'm just a special case of cognitively impaired newbie, in which case this issue can be closed.
But please do consider how the documentation reads to someone who has never seen it before and who is not familiar with functional programming.

Can you tell what the documentation should say instead or additionally to reduce or remove confusion?

Since both, saying that the two functions behave equally, or adding the sentence you quoted from the second function's documentation to the first function's documentation, would be wrong statements.

So at the moment I still do not see anything actionable here.

Well, maybe, in the first function's documentation: "Start a new group each time the test doesn't hold for the first element of the current group and the next element from the list being processed." Would that help?

Yes, that would definitely help.

I can't tell what the documentation should say because I don't think I understand yet what the functions are supposed to do.
I'll try and study the code as soon as possible, and see if I can come out with some text that makes sense to me.

I had a look at the code.
The impression I have is that groupWhile works as one would expect only for commutative operators, while groupWhileTransitively will work as one would expect on both commutative and non-commutative operators.

In fact, both functions will produce the same result when given the input in groupWhile's doc example:

commutativeComparison =
  (\x y -> Tuple.first x == Tuple.first y)

aList =
  [ ( 0, 'a' ), ( 0, 'b' ), ( 1, 'c' ), ( 1, 'd' ) ]

thisIsTrue =
  List.Extra.groupWhile commutativeComparison aList == List.Extra.groupWhileTransitively commutativeComparison aList

Given this, what's the point of maintaining groupWhile at all?
Why maintain a function that has some weird behaviour while we have another that behaves intuitively in all cases?
I'm not saying there is no rationale, I'm saying that I don't see it.
I think understanding the rationale will help write better documentation.

It is possible that not both functions are needed. A similar discussion happened in #2, though the situation there is different in that there one is really always concerned with an equivalence relation, whereas for groupWhile consideration of order relations makes sense.

Here is two possible reasons why having both the current groupWhile and groupWhileTransitively may be useful:

  • Maybe someone really wants the behavior that the current groupWhile shows on order relations like < and that is not available via calls to the current groupWhileTransitively.
  • Maybe the current groupWhile can be (or already is) implemented more efficiently than the current groupWhileTransitively. Then it would make sense to have groupWhile available for cases where it applies (i.e., where the implicit assumptions on the element-wise operation passed are satisfied), but to be able to fall back to groupWhileTransitively in other cases (i.e., where only weaker assumptions hold).

While both possible reasons are sensible, if we are to use them to write documentation we need the actual reasons.

A git blame pops up a few different authors.
Maybe @abadi199, @stil4m or @sindikat can chime in?

In response to @jvoigtlaender's two bullet points.

  1. I can't imagine a scenario where someone would want the unexpected behavior of groupWhile for a non-equivalence relation. Even if they did, they should not be relying on that behavior of groupWhile because the docs say it is unexpected implying that it could change in the future.

  2. I think that the confusion caused by having these two very similar functions outweighs the minor performance benefits that can be made. If someone is that concerned about performance, then I think it is reasonable for them to implement their own solution.

In summary, I'm in favor of removing the current groupWhile, renaming groupWhileTransitively to groupWhile, and perhaps providing a separate implementation for group under the assumption that it is dealing with an equivalence relation.

Is there no built in function to group independently of order?

The way I solved the order problem is to sort the items in the list first with "sortBy", then use "groupWhile". May not work for some situations though.

I am just coming upon this problem since we are updating groupWhile for the next major update. I also was really confused when I noticed theres no practical difference between these functions, and only an un-intuitive technical difference.

Deleting groupWhile, and renaming groupWhileTransitively to groupWhile seems like the best course of action. Do you still disagree with that @jvoigtlaender ?

Here is a new implementation of groupWhile, that has the old groupWhileTransitively behavior, tail call optimization, and a non-empty list type. I would appreciate any feedback about it, including regarding the implementation and whether it should be this way at all: https://ellie-app.com/32KL9L2NtPKa1

I had to re-read my comments from more than one year ago. In them, I do not find that I was ever against removing a function, nor actually that I was ever for or against any specific action here (except the ones where something factually wrong would have ended up in the documentation). I was only trying to clarify what the functions as they were in the package were actually doing (since there were apparent misconceptions from some other commentors), etc.

I didn’t voice an opinion either way (keeping both functions or not) and neither do I have one now.

This matter is settled and the changes I mentioned above were published, so I am closing this issue.