boostorg / graph

Boost.org graph module

Home Page: http://boost.org/libs/graph

Why does the implementation of Lengauer-Tarjan use std::deque for a bucket?

samolisov opened this issue

I was going through the implementation of the Lengauer-Tarjan algorithm in BGL's dominator_tree.hpp and found that a vector of std::deque is used to store the buckets: the vertices with a given semidominator.

std::vector< std::deque< Vertex > > buckets_;

I read the original paper as well as some explanations carefully and found nothing suggesting that the order in which buckets[parent[w]] is processed matters. Moreover, I see that the deque is used there just as a container, not as a backend for the std::queue adapter: the code pushes vertices at the back of the deque, walks through the deque with the corresponding iterators and then clears it. In my understanding, std::vector<std::vector<Vertex>> could be used there instead.
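
For illustration, the access pattern I am describing looks roughly like this (a paraphrase of what the header does, not the exact code; process() and parent_w are placeholders):

    std::vector< std::deque< Vertex > > buckets_;   // current declaration

    // push: record that vertex n has semidominator s
    buckets_[s].push_back(n);

    // walk: visit every vertex in the bucket of parent[w], in any order
    for (Vertex v : buckets_[parent_w])
        process(v);                 // placeholder for the real work

    // clear: the bucket is emptied and reused later
    buckets_[parent_w].clear();

    // Nothing here needs deque's cheap push_front/pop_front, so a
    // std::vector< std::vector< Vertex > > would support the same pattern.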

Could you tell me why a deque is used to store a bucket in the code? My guess is that it was chosen to avoid the overhead of vector reallocation during the repeated push_back() and clear() calls, i.e. that it is a purely technical choice with no impact on the correctness of the algorithm.

Thank you very much.

Yeah, that bucket type should be vector, not deque. In the early 2000s, there was a bit of a misguided preference for deque. Can't explain it otherwise.

Other performance improvements to consider:

  • Changing the type of buckets_ to boost::unordered_flat_map<Vertex, std::vector<Vertex>> and, instead of clearing a bucket, removing it from the hash map (see the first sketch after this list).
  • std::vector< Vertex > semi_, ancestor_, samedom_, best_; these are all the same size (the number of vertices in g) and allocated separately, yet they are accessed in the same pattern, e.g.:
            put(semiMap_, n, s);

            // 2. Calculation of n's dominator is deferred until
            // the path from s to n has been linked into the forest
            get(bucketMap_, s).push_back(n);
            get(ancestorMap_, n) = p;
            get(bestMap_, n) = n;

so changing from a "struct of vectors" to a "vector of structs" would probably improve cache efficiency (see the second sketch after this list). I'm not sure about samedom_; it doesn't seem to fit the pattern.
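
A rough sketch of the hash-map idea from the first bullet (boost::unordered_flat_map requires Boost 1.81 or later; the vertex type and the values here are stand-ins, not the visitor's real members):

    #include <boost/unordered/unordered_flat_map.hpp>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    using Vertex = std::size_t;   // stand-in for graph_traits<G>::vertex_descriptor

    int main()
    {
        boost::unordered_flat_map< Vertex, std::vector< Vertex > > buckets_;

        // push: the bucket for semidominator 3 is created lazily on first use
        buckets_[3].push_back(7);
        buckets_[3].push_back(9);

        // walk the bucket, then drop it from the map instead of clearing it
        if (auto it = buckets_.find(3); it != buckets_.end())
        {
            for (Vertex v : it->second)
                std::cout << "processing vertex " << v << '\n';
            buckets_.erase(it);
        }
    }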
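
And a sketch of the "vector of structs" layout from the second bullet (the struct and its name are hypothetical, not something the header defines):

    #include <cstddef>
    #include <vector>

    using Vertex = std::size_t;   // stand-in for the real vertex descriptor

    // Hypothetical per-vertex record: the fields that are read and written
    // together now share a cache line.
    struct vertex_state
    {
        Vertex semi;
        Vertex ancestor;
        Vertex best;
    };

    int main()
    {
        const std::size_t vertex_count = 1000;            // num_vertices(g)
        std::vector< vertex_state > state(vertex_count);  // one allocation instead of three

        // Roughly the same accesses as the excerpt above:
        Vertex n = 42, s = 7, p = 41;
        state[n].semi = s;        // put(semiMap_, n, s);
        state[n].ancestor = p;    // get(ancestorMap_, n) = p;
        state[n].best = n;        // get(bestMap_, n) = n;
    }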

I recommend using Google's Benchmark library to get your results. Please include a graph of the comparison in any pull request. Also, please don't do all the performance changes together in one PR -- better to do them one at a time so as not to hurt my brain. Thanks. :)
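
A minimal sketch of such a Google Benchmark harness (the tiny hard-coded CFG is only for illustration, and the exact overload of lengauer_tarjan_dominator_tree should be checked against dominator_tree.hpp; a real benchmark would run on much larger graphs):

    #include <benchmark/benchmark.h>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dominator_tree.hpp>
    #include <boost/property_map/property_map.hpp>
    #include <vector>

    using Graph = boost::adjacency_list< boost::listS, boost::vecS, boost::bidirectionalS >;
    using Vertex = boost::graph_traits< Graph >::vertex_descriptor;

    static Graph make_cfg()
    {
        // A small diamond-with-loop CFG; vertex 0 is the entry block.
        Graph g(6);
        boost::add_edge(0, 1, g);
        boost::add_edge(1, 2, g);
        boost::add_edge(1, 3, g);
        boost::add_edge(2, 4, g);
        boost::add_edge(3, 4, g);
        boost::add_edge(4, 1, g);   // back edge forming a loop
        boost::add_edge(4, 5, g);
        return g;
    }

    static void BM_lengauer_tarjan(benchmark::State& state)
    {
        const Graph g = make_cfg();
        std::vector< Vertex > idom(boost::num_vertices(g),
                                   boost::graph_traits< Graph >::null_vertex());
        auto idom_map = boost::make_iterator_property_map(
            idom.begin(), boost::get(boost::vertex_index, g));

        for (auto _ : state)
        {
            boost::lengauer_tarjan_dominator_tree(g, boost::vertex(0, g), idom_map);
            benchmark::DoNotOptimize(idom);
        }
    }
    BENCHMARK(BM_lengauer_tarjan);
    BENCHMARK_MAIN();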

@jeremy-murphy Thank you very much for the answer and for all the information and suggestions. I believe developing benchmarks with the Google Benchmark project may itself be a good first issue for someone who would like to join Boost development. The benchmarks could start from a set of "real" CFGs taken from the Boost sources or some other applications; LLVM or even BOLT could help a lot with collecting the CFGs. The benchmarks could then be added to the test directory, as Google's Abseil does for its benchmarks, and run as part of the test suite so that anyone can reproduce the results. After that we can change the implementation of the algorithm to pick the right containers. This is my vision of this task.

That sounds essentially right. I wouldn't go to the trouble of getting 'real' CFGs from LLVM, for example; generating synthetic ones should be sufficient. If we add Google Benchmark tests to the library, they will be optional, as we don't want to make that library a required dependency.
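
For what it's worth, a synthetic CFG-like graph could be generated along these lines (the shape and all parameters are made up for illustration; assumes blocks >= 2):

    #include <boost/graph/adjacency_list.hpp>
    #include <cstddef>
    #include <random>

    using Graph = boost::adjacency_list< boost::listS, boost::vecS, boost::bidirectionalS >;

    // Every block falls through to the next one, so every block stays reachable
    // from the entry (vertex 0); extra random edges add branches and back edges.
    Graph make_synthetic_cfg(std::size_t blocks, unsigned seed)
    {
        std::mt19937 gen(seed);
        std::bernoulli_distribution branch(0.3);
        std::uniform_int_distribution< std::size_t > target(0, blocks - 1);

        Graph g(blocks);
        for (std::size_t b = 0; b + 1 < blocks; ++b)
        {
            boost::add_edge(b, b + 1, g);
            if (branch(gen))
                boost::add_edge(b, target(gen), g);
        }
        return g;
    }
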
Start small, commit early, make a draft pull request as soon as you have something you want to discuss!