gsjaardema / blipsort

Branchless-Lomuto in Pattern-Defeating Quicksort

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Branchless Lomuto in Pattern-Defeating Quicksort (Blipsort)

A highly-optimized (and tiny) Introsort variant that draws from PDQsort, Java, and Orson Peters' and Lukas Bergdoll's branchless Lomuto partitioning. Iterative version Here.

Speed

Speed

clang 16, -O3

Complexity

Best Average Worst Memory
n n log n n log n log n

Visualization

The.Sound.of.Sorting.0.6.5.-.http___panthema.net_2013_sound-of-sorting.Ubuntu-22.04.2024-02-18.20-59-40.mp4
Blipsort, ported into the Sound of Sorting program by Timo Bingmann

Techniques

Branchless Lomuto

The decades-old partitioning algorithm recently made a resurgence when researchers discovered ways to remove the inner branch. Orson Peters' and Lukas Bergdoll's method— published under two months ago— is the fastest yet. It employs a gap in the data to move elements twice per iteration rather than swapping them (three moves). For arithmetic and pointer types, Blipsort employs branchless Lomuto partitioning. For other, larger types, Blipsort uses branchful Hoare partitioning.

Pivot Selectivity

Blipsort carefully selects the pivot from the middle of five sorted candidates. These candidates allow the sort to determine whether the data in the current interval is approximately descending and inform its "partition left" strategy.

Introspection

Blipsort is introspective, switching to a guaranteed nlog(n) sort if it becomes quadratic. Like PDQsort, Blipsort switches to Heapsort after log(n) "bad" partitions— partitions that are significantly unbalanced.

Insertion Sort

Blipsort uses Insertion sort on small intervals where asymptotic complexity matters less and instruction overhead matters more. Blipsort employs Java's Pair Insertion sort on every interval except the leftmost. Pair insertion sort inserts two elements at a time and doesn't need to perform a lower bound check, making it slightly faster than normal insertion sort in the context of quicksort.

Pivot Retention

Similar to PDQsort, if any of the three middlemost candidate pivots is equal to the rightmost element of the partition at left, Blipsort moves equal elements to the left with branchless Lomuto and continues to the right, solving the dutch-flag problem and yeilding linear time on data comprised of equal elements.

Optimism

Similar to PDQsort, if the partition is "good" (not highly unbalanced), Blipsort switches to insertion sort. If the Insertion sort makes more than a constant number of moves, Blipsort bails and resumes quicksort. This allows Blipsort to achieve linear time on already-sorted data.

Breaking Patterns

Like PDQsort, if the partition is bad, Blipsort scrambles some elements to break up patterns.

Rotation

When all of the candidate pivots are strictly descending, it is very likely that the interval is descending as well. Lomuto partitioning slows significantly on descending data. Therefore, Blipsort neglects to sort descending candidates and instead swap-rotates the entire interval before partitioning.

Custom Comparators

Blipsort allows its user to implement a custom boolean comparator. A comparator is best implemented with a lambda and no branches. A comparator implemented with a lambda can be inlined by an optimizing compiler, while a comparator implemented with a constexpr/inline function typically cannot.

Sources

Here is the PDQsort algorithm by Orson Peters

Here is the branchless Lomuto blog post by Orson Peters and Lukas Bergdoll

Here is Java's Dual Pivot Quicksort

About

Branchless-Lomuto in Pattern-Defeating Quicksort

License:MIT License


Languages

Language:C++ 100.0%