scooby / centered-heap

A heap implementation that is symmetric around a center and can move within an array. The amortized performance is equivalent to a standard heap.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The centered heap

It had been bugging me that an in-place merge sort was so impractical.

So I wrote this up, went down the rabbit hole, and devised a way to make a movable heap. It does work, my numbers show it is reasonably scalable.

But I don't want to oversell it: this is not better than a simple heap sort.

What you get are in-place merges with constant memory requirements and amortized O(log n) operations. We'll get into the fine print of what "amortized" means below.

Revisiting a standard heap

A standard heap is a binary tree:

           root
         /      \
    node          node
   /    \
node   node   ---    ----

It then adds two constraints:

  • A parent node is always "better than" the children
  • Trailing nodes can be empty

Because all parent nodes are "better than" their children, the "best" node is always available at the root of the heap.

Because trailing nodes can be empty, a heap can be stored in an array, and all heap operations can be performed inline within the array.

But only the heap's tail can move:

[     heap             |       other data               ]
 ^root                 ^tail

This is excellent for priority queues and heap sorts, since the array is dedicated solely to that task.

What a centered heap changes

A centered heap is still a binary tree, but we've changed the rules about trailing nodes:

          center
         /      \
    node          node
        \        /
----   node   node    ----

And we add three markers low, center and high.

The center points at (is) the root, and the low and high markers note where the heap begins and ends.

Thus we have something like this:

[  other data  |      heap         |    other data     ]
            low^        ^center    ^high

The rules for child nodes are a little different from heaps. We're adjusting our indices to treat the center as 0 and allowing for negative and positive children:

-4 -3 -2 -1  0  1  2  3  4
             c
          n     n
    n  n           n  n

We further expand the common pop and push to have left and right variants.

Push left:
    . . I [       C     ] . . .
    . . [         C     ] . . .
I is now pushed into the heap, and the low bound moves left.

Push right:
    . . . [       C     ] I . .
    . . . [       C       ] . .
I is now pushed into the heap, and the high bound moves right.

Pop left:
    . . [         C     ] . . .
    . . C [       C'    ] . . .
C is popped to the left, and the next best C' is now at the center.

Pop right:
    . . . [       C       ] . .
    . . . [       C'    ] C . .
C is popped to the right, and the next best C' is now at the center.

There are other functions combining pops and pushes and sliding the whole heap, but those are the basics.

Recentering and amortized performance

When you pop past the center, it's no longer valid. Thus we have to "recenter" the heap, which is equivalent to the "heapify" function of standard heaps. (It issues a number of sift-down operations if you're familiar with how heaps are implemented.)

For most algorithms, though, we can be smart about how we recenter. We've typically shown the center to be in, well, the center, but there's no requirement for that. Consider:

. . . P1 P2 P3 [C            ] . . . .

We've been popping to the left, and are about to invalidate our center. We should thus recenter all the way to the right:

. . . P1 P2 P3 P4 [         C] . . . .

If our c-heap has n elements, we'll run n log n sift-downs. If we're pushing and popping in the same direction, we'll tend to recenter every n elements for the size of our window.

Thus after we amortize recentering for a c-heap that is typically n elements large, the cost is about one sift-down (log n operations) per pop.

So a pop amortizes to log n operations for the recenter, plus the constant cost of log n operations for the pop itself.

And a push never recenters, so it's log n operations for the push.

The demo utility

Part of my motivation was to learn a bit of Rust, so I apologize for the beginner code. Hopefully nothing here will frighten the horses.

It has a simple CLI that lets you build an array, then pick an operation, and you can gather stats on compares and swaps if you like. You can also use -o Sort to compare how laughably slow it is compared to the native Vec.sort() method.

The debug target spams a lot of information. It's also running many redundant invariant checks that --release turns off.

I haven't exercised all the methods of Cheap yet, so, except for the checks, those annotated with #[allow(dead_code)] are likely buggy.

If you've never used Rust, you'll need to install it, and you should be able to do a simple:

cargo run -- --help
cargo run --release -- --help

Does it really scale?

I ran some tests with the merge sort and giving it a reversed array, which is a pretty good worst case.

The growth in the number of compares and swaps per element is trending downward, so I'm not sure if the overall complexity is strictly O(n log n), but it's also not unreasonable.

Size Compares Swaps Compares / N Swaps / N Comp grow Swap grow
4 9 6 2.25 1.5 - -
8 32 22 4 2.75 1.778 1.833
16 118 74 7.38 4.63 1.845 1.684
32 402 230 12.56 7.19 1.702 1.553
64 1244 664 19.44 10.38 1.548 1.444
128 3594 1810 28.08 14.14 1.444 1.362
256 9840 4722 38.44 18.45 1.369 1.305
512 25771 11922 50.33 23.29 1.309 1.262
1024 65361 29320 63.83 28.63 1.268 1.229
2048 161603 70644 78.91 34.49 1.236 1.205
4096 391421 167500 95.56 40.89 1.211 1.186
8192 931695 391730 113.73 47.82 1.19 1.169
16384 2184531 904656 133.33 55.22 1.172 1.155
32768 5061357 2067486 154.46 63.09 1.158 1.143
65536 11608747 4684948 177.14 71.49 1.147 1.133
131072 26382571 10538678 201.28 80.4 1.136 1.125
262144 59484663 23540708 226.92 89.8 1.127 1.117
524288 133209364 52266700 254.08 99.69 1.12 1.11
1048576 296540917 115459604 282.8 110.11 1.113 1.105
2097152 656577504 253900734 313.08 121.07 1.107 1.1
4194304 1446238418 555798460 344.81 132.51 1.101 1.094
8388608 3171292780 1211666326 378.05 144.44 1.096 1.09
16777216 6926010907 2632056058 412.82 156.88 1.092 1.086

About

A heap implementation that is symmetric around a center and can move within an array. The amortized performance is equivalent to a standard heap.


Languages

Language:Rust 100.0%