Zac-HD / shed

`shed` canonicalises Python code. Shed your legacy, stop bikeshedding, and move on. Black++

Home Page:https://pypi.org/project/shed/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Replacement of `reversed(sorted(x))` with `sorted(x, reverse=True)` changes stability of sort.

DRMacIver opened this issue · comments

Consider for example reversed(sorted(xs, key=lambda x: 0)). The original form will return the reverse of xs, while the refactored form will return xs, as timsort is a stable sort.

Unclear whether this matters, I just spotted it while working on #69 and thought I would raise it as it's a similar category of issue.

Yeah, let's exclude reverse= as well as key=.

Sorry, I think that's a different problem and my mention of key= is distracting. This problem can be seen with a vanilla sort for anything where stability of sort matters, it's just easier to demonstrate those with key based sorts, but e.g. you can see this difference with sorted(xs) where xs = [[], [], [], []] or anything else where identity of the individual elements matters. reversed(sorted(xs)) will reverse that list, but sorted(xs, reverse=True) will leave it in the original order.

I see, yeah... identity-sensitive sorting seems rare enough relative to naive use that I'm inclined to keep this unsafe refactoring until someone reports it causing a problem. At that point maybe we add a separate --unsafe option?

Hmm it's not really about identity though, it's about anything where you care about a stronger level of equality than x <= y and not x < y. e.g. consider the following:

from functools import total_ordering

@total_ordering
class CaseInsensitive:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, CaseInsensitive):
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, CaseInsensitive):
            return NotImplemented
        return self.value.casefold() < other.value.casefold()

    def __repr__(self):
        return repr(self.value)


if __name__ == '__main__':
    xs = list(map(CaseInsensitive, [
        "FOO", "foo"
    ]))

    print(list(reversed(sorted(xs))))
    print(sorted(xs, reverse=True))

This will print:

['foo', 'FOO']
['FOO', 'foo']

In general, sort stability is considered an important feature of the sorting API, and it feels weird to casually violate that just because it usually doesn't matter much, especially given how subtle the bugs where it matters are.