edouarda / brigand

Instant compile time C++ 11 metaprogramming library

sort could be faster

odinthenerd opened this issue

Ok, I admit it's my pet peeve ;) I find it interesting how combining algorithms is not efficient in runtime code but is efficient in metaprogramming. Basically the current implementation is good to about 500 elements; however, even with the fast tracks, merging 16 elements into 500 elements will be slow.

Coming back to the original partition implementation: the problem was that a bad pivot element could cause a partition of everything only to eliminate one element from the input list. If we were to build up a large list using the merge/insertion sort, split it in the middle, and then use that as a pivot, we would be eliminating at least half of the output list each partition even if we are not eliminating anything from the input list.

Another approach would be to sort chunks of, let's say, 64 elements each; "partitioning" them would then essentially be a split_if rather than a normal partition. Since we need to presort the lists for merging anyway, we aren't really losing much, and splitting a sorted list should be more efficient than a classic partition on all of its elements. This would essentially be a pure merge sort combined with a split_if.

I would really like to get to 1000 elements for kvasir.

@jonathanpoelen OMG your sort works up to several thousand elements! You are my hero!

I think the algorithm is taking most of its time in merge. The problem is that merge is hopelessly recursive; I think we need to do some partitioning to shorten merge runs.

I added a partition here https://github.com/edouarda/brigand/blob/master/brigand/algorithms/sort.hpp#L191 by splitting our 256 element sorted list and using the middle as a pivot element to partition the rest of the input as described above. The result is a 28% speed up on my machine. I think this is the right direction.

Here is the fast-tracked version: https://github.com/porkybrain/brigand/tree/merge-partition-sort

On my machine, for a 2000 (!!!) element list in random order:
old: 12.81s
new: 6.67s

Not sure yet if I broke anything; there are a lot of moving parts in this algorithm.

Wow, I wasn't aware that you had improved it that much. I tested metal several months ago and it was much slower, but that sounds quite similar to brigand and, like most of your stuff, solved in a purist fashion. I think we can slim down the brigand algorithm quite a lot.

> I wasn't aware that you had improved it that much

Since our last discussions metal's API became more stable, so I decided to investigate which algorithms would benefit most from fast tracking and it basically boiled down to join, fold and reverse (brunocodutra/metal#41).

> sounds quite similar to brigand

Take it with a grain of salt until we compare results on the same machine, because we might be comparing apples to bananas depending on how different our hardware and setups are, but at first glance it looks like metal doesn't lag too far behind brigand, which, to be honest, makes me proud.

Either your dev system is awesome or I'm doing something wrong; here is my code:

```cpp
template<typename T, typename U>
using eager_less = metal::number<(T::value < U::value)>;

using a = typename metal::sort<my_list, metal::lambda<eager_less>>::type;
```

Using brigand I get 0.55s for 300 elements and 15.26s for metal. Metal crashes my laptop at 400 elements; brigand goes to something like 3000.

I'm thinking about making a generic "chunker" which splits a list into smaller lists of a fixed size (like chunker16 or chunker256), which would allow us to fast track and still use a generic fold, and would save us from writing so damn many typenames in every algorithm.

Ok, now I'm using a list of random metal::numbers and using a = metal::sort<my_list, metal::lambda<metal::less>>, and the numbers are slightly worse than the last run.

I have been blindly using a list ordered in reverse as a worst case test for sort, but that obviously doesn't apply to merge sort, now that I take a closer look at it. Silly me, looks like I have been inadvertently cheating :c

Curiously, that means merging is the culprit, so perhaps it deserves fast tracking. I'll see what I can learn from Brigand's.

Thanks for benchmarking btw!

No problem, @ldionne made the same benchmarking mistake a while back ;) Fast tracking merge is pretty hard because it is essentially a proper fold. @jonathanpoelen's idea of making a kind of partial merge seems to be the key behind the last speedup, and it's a genius idea. Splitting the merged list and using that element as a pivot to partition the rest of the input seems to bring another speedup (can be seen in my unmerged branch), but I think that can be improved on further.

Nice to see attention to sort; it is really something I need done well for kvasir. I would love to hear original ideas!

BTW, which compiler are you using for these benchmarks?

Clang 3.7

So I was able to pin join as the culprit for the memory overflow. I even managed to rewrite it so as to reduce memory consumption by about 20-30% on gcc, but it is still prohibitively high.

If only there were a way to implement take directly, just like one implements drop, divide-and-conquer algorithms wouldn't have to depend on join (nor on any linearly recursive algorithm, for that matter).

Finally, I decided to implement merge the dumbest way possible and, to my surprise, this is what happened:

[benchmark charts: sort compile times on gcc 6.1 and clang 4.0 (sort_gcc, sort_clang)]

I finally found time to look at your solution. I think one thing you're doing that is faster than us is your merge specialization `struct _merge<ret, list<xh, xt...>, list<yh, yt...>, lambda<expr>, if_<expr<xh, yh>, true_, false_>>`, where you know that the last parameter is going to be true_. The problem is that this restricts the concept of a predicate in that it must return a metal::number. Currently (as far as I know) brigand will accept anything that has a ::value as a predicate, which is far less restrictive. I wonder if our loose definition is causing a performance hit.

I have not been able to put a whole lot of time into this, but brigand::sort could benefit from chunking the input using a meta-monad pattern, as well as joining many lists at once rather than two at a time. Join can be implemented using more aliases and less complex types as well; see metal's implementation of join for reference. The nested-alias version of conditional should help too.

Just wanted to keep the thought process going.