chowells79 / lrucache

A simple pure LRU cache written in Haskell

Possible performance optimizations

meiersi opened this issue · comments

Hi chowells79,

A friend just pointed me to your library; he was using it as a blueprint for implementing an adaptive replacement cache. While investigating how your code works, I may have spotted a few possible performance optimizations for the pure version of the LRU cache:

  1. It seems that the Maybes for first and last in the datatype definition of LRU are used only to handle the case of an empty cache. If you introduce a second constructor, Empty, you could simplify your cache definition and gain performance due to fewer indirections.
  2. You might want to consider using a circular doubly linked list. This way you would require only a pointer to first, and the Maybes in LinkedValue could also be dropped. The last element of the list is then the one whose next key equals the first key of the cache. Again, getting rid of Maybes means getting rid of indirections and thus realizing a performance gain.
  3. You might want to consider storing the size bound in a plain Int. For an unlimited size, you would store maxBound :: Int. The reason is that you will never be able to store more than maxBound :: Int elements in your LRU cache, as that would exceed the range of memory addressable by your CPU. Again this improves performance, as indirections and code complexity are reduced.
  4. You do not require the property of Maps that they keep their keys in an ordered sequence. You could therefore switch to a persistent hashtable (analogous to Johan Tibell's unordered-containers package). I do not see a way to implement it using the public API of unordered-containers, though: the problem is that the value you store in the underlying IntMap depends on the hash of the key. The performance gain would stem from two sources: first, fewer indirections, because the next and prev "pointers" can then be fully unpacked into the LinkedVal; and second, hashing plus IntMap operations are faster than applying a key's comparison operation O(log n) times. See Johan's post here: http://blog.johantibell.com/2011/11/slides-from-my-guest-lecture-at.html.
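Suggestions 1 and 3 might look roughly like this in practice. This is a hypothetical sketch under my own invented names, not the library's actual types:

```haskell
import qualified Data.Map as Map

-- A node in the linked list; neighbours are referenced by key, so
-- prev/next remain Maybe in this sketch.
data LinkedVal key val = LinkedVal
  { lvValue :: val
  , lvPrev  :: Maybe key
  , lvNext  :: Maybe key
  }

-- Suggestion 1: a dedicated Empty constructor removes the Maybe
-- wrappers around first/last, because the LRU case is non-empty
-- by construction.
-- Suggestion 3: the size bound is a plain Int; "unlimited" is
-- encoded as maxBound :: Int.
data LRU key val
  = Empty !Int                 -- just the size bound
  | LRU { lruFirst :: key      -- no Maybe: cache is non-empty here
        , lruLast  :: key
        , lruBound :: !Int
        , lruMap   :: Map.Map key (LinkedVal key val)
        }

-- Translating a Maybe-based bound into the Int encoding:
-- Nothing means "unlimited".
toBound :: Maybe Int -> Int
toBound = maybe maxBound id

size :: LRU key val -> Int
size (Empty _)     = 0
size (LRU _ _ _ m) = Map.size m
```

With the Empty case, functions like size no longer need to inspect a Maybe to distinguish the empty cache.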

Happy haskell hacking,
Simon

  1. This is a good suggestion, and would definitely simplify things. It likely won't result in a significant performance change, though it will slightly reduce pointer-chasing.
  2. I can't see how to do this, as it's not a true doubly-linked list. It doesn't point at adjacent nodes, it points at keys that can be looked up to find adjacent nodes. I can't have a polymorphic guard key. And I can't see how to use a true doubly-linked list without it being a performance disaster (needing to copy every single node on change). This might be possible, but if it is, I'm having a failure of imagination.
  3. & 4. Both are being addressed by the in-progress version 2.0 changes, in their own ways. Those changes do prevent the clever optimization you suggest, but your suggestion is also incomplete, as the hash isn't unique. Sure, the hash could be cached in the LinkedVal, but the key would need to be stored too.
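A toy hash function makes the non-uniqueness concrete (toyHash below is invented purely for illustration; nothing here is from the library):

```haskell
-- A deliberately weak hash: it sums character codes, so any two
-- permutations of the same characters collide.
toyHash :: String -> Int
toyHash = sum . map fromEnum

-- "ab" and "ba" are distinct keys with the same hash, so a node
-- identified only by its hash would conflate them; the key itself
-- must still be stored alongside the cached hash.
```

Real hash functions collide too, just less predictably, so the same argument applies.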

2. Couldn't you use the key in the first field of the LRU datatype as the guard key? For operations such as insert, you wouldn't even require a guard key, as you just insert the new node before (in the doubly linked list) the node pointed to by first. Moreover, you can detect a list of size one, as its single LinkedValue has next equal to prev.
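A sketch of that circular layout, assuming nodes still reference their neighbours by key rather than by pointer (all names here are invented, not the library's API):

```haskell
import qualified Data.Map as Map

-- Circular node: prev/next are plain keys, no Maybe. In a cache of
-- size one, prev == next == the node's own key.
data CVal key val = CVal { cValue :: val, cPrev :: key, cNext :: key }

-- Walk the circle starting at the first key, stopping when the next
-- pointer wraps back around to first — the first key is the guard.
toKeyList :: Ord key => key -> Map.Map key (CVal key val) -> [key]
toKeyList firstKey m = go firstKey
  where
    go k = k : case Map.lookup k m of
      Just cv | cNext cv /= firstKey -> go (cNext cv)
      _                              -> []
```

The "last" element never needs its own field: it is the node whose next equals the first key.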

4. The hash doesn't have to be unique. You just use hashing, together with collision handling, as a way to get a "memory address" for the key you want to store. For your case, double hashing (http://en.wikipedia.org/wiki/Double_hashing) would be a good collision-handling scheme. It would allow you to always use Ints as pointers. Looking up a key then works according to the hashtable's own lookup, and the linking is based on the actual position of the key in the hashtable. You are right that the key also has to be stored in the LinkedVal. However, all your lookups would be based on Ints only, which makes them (for some keys, a lot) faster.
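The double-hashing idea might be sketched as follows: each probe position is derived from two hash values, and the slot index (an Int) becomes the "address" that linked-list nodes would point at. The hash functions, the fixed capacity, and all names below are toy assumptions, purely illustrative:

```haskell
import qualified Data.IntMap as IntMap

capacity :: Int
capacity = 16

-- Toy hash pair; hash2 is kept nonzero so probing always advances.
-- (A real table would also use a prime capacity or force an odd
-- step so the probe sequence visits every slot.)
hash1, hash2 :: String -> Int
hash1 = sum . map fromEnum
hash2 k = 1 + (product (map fromEnum k) `mod` (capacity - 1))

-- i-th probe position for key k: (h1 + i*h2) mod capacity.
probe :: String -> Int -> Int
probe k i = (hash1 k + i * hash2 k) `mod` capacity

-- Find the slot holding k, or the first empty slot where it would
-- go. The returned Int is the pointer the LRU nodes would store.
findSlot :: String -> IntMap.IntMap String -> Int
findSlot k table = head
  [ s | i <- [0 .. capacity - 1]
      , let s = probe k i
      , maybe True (== k) (IntMap.lookup s table) ]
```

Once every key has a stable Int slot, prev/next links in the LRU nodes can be plain unboxed Ints, and every traversal step is an IntMap lookup rather than a polymorphic comparison.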