Updating relations gets gradually slower
SignIn-SignUp opened this issue · comments
I could reproduce this in Haskell Code as well as in tutd.
When updating a relvar like the the one created by the following commands:
products :: {name Text, age Integer}
products := relation{tuple{name "Mike",age 6},tuple{name "Sam",age 10}}
key products_key_constraint {name} products
And then updating it with that command alternating the value we want to set age to:
update products where name="Sam" (age := 11)
The updates get gradually slower after a few updates. In my usecase the same result without the performance decrease can be achieved (at least when using it in code) by deleting the tuple and then inserting the "updated" tuple.
When profiling the majority of time is spent in ProjectM36.Relation.mkRelation.duplicateAttrNames
and ProjectM36.Attribute.attributesDifference
. But I am guessing that this is normal and maybe unrelated although in some extreme cases the percentage was about 40%. The slowdown feels almost exponential or at least of a high polynomial complexity.
Is this intentional or am I just using something wrong ?
Does this issue arise for others as well ?
Indeed! You have discovered a major architectural difference of Project:M36 when compared to other DBMS products. This slowdown is a consequence of Project:M36's novel architecture which maximizes laziness and data independence. You can read more about the laziness in this documentation.
The architecture calls for a caching system based on memoization to mitigate the laziness, but it is not yet implemented. In the meantime, the mental model for this slowdown is that the idempotent database updates "stack" on each other, i.e. all previous updates necessary to generate the answer to the query are re-executed.
Ok, thank you! Thats eye opening.
Is this comparable to the logical logging in some other systems wehere the operation is saved but not the corresponding values?
Is deleting and inserting a valid alternative at the moment if performance is somewhat needed?
Yes, that's a similar concept except that Project:M36 logs the actual updates and only materializes the necessary tuples (often a subset of tuples needed only to service the query-at-hand) on demand. This allows Project:M36 to advertise commits in constant time O(1) instead of O(n) where n is the number of tuples altered.
As a consequence of this choice, however, all updates must be idempotent (repeatable) and this is strongly enforced which should make replication also much easier to reason about.
Thank you! I will close the issue then.
No problem. The current non-cached version is not ideal, so we hope to improve this in an upcoming release shortly. Feel free to join in IRC at chat.freenode.net #project-m36 as well