hhblaze / Raft.Net

Implementation of RAFT distributed consensus algorithm among TCP Peers on .NET / .NETStandard / .NETCore / dotnet

Leader has memory leak in a 5 node setup

Lectere opened this issue · comments

Setup:

  • 5 nodes
  • a dataset of 10k records (around 100 bytes per record)
  • each node changes one item of the records around every 15 seconds

Config used:

```
"EntityName":"default", "DelayedPersistenceMs":500, "DelayedPersistenceIsActive":false, "InMemoryEntity":true, "InMemoryEntityStartSyncFromLatestEntity":true, "VerboseRaft":false, "VerboseTransport":false
```

After running for around 10 hours, with each node making changes to the dataset, this error occurs:

```
[ERROR] [System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at DBreeze.LianaTrie.LTrie.Add(Byte[]& key, Byte[]& value, Boolean& WasUpdated, Boolean dontUpdateIfExists)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value, Byte[]& refToInsertedValue, Boolean& WasUpdated, Boolean dontUpdateIfExists)
   at DBreeze.Transactions.Transaction.Insert[TKey,TValue](String tableName, TKey key, TValue value)
   at Raft.StateLog.AddNextEntryToStateLogByLeader()
   at Raft.RaftNode.ApplyLogEntry()
   at Raft.RaftNode.AddLogEntry(Byte[] iData)]
[Raft.RaftNode.AddLogEntryLeader] []
```

All nodes were online, and I'm using the 'InMemoryEntityStartSyncFromLatestEntity' option, so it should not save the entire log, right?

In the task manager you can see that only the leader node is using a lot of memory (3,000 MB+); the other nodes each take around 300-400 MB.

On top of that, it should only use the last state, but if I bring a node offline and then back online again, it appears to receive the entire history of commits...

commented

At first glance, it looks like the problem is in the in-memory mode of DBreeze. It works correctly; the problem is that transaction.RemoveKey only marks a key as deleted, so the data still occupies space (on disk or in memory). From time to time a table purge procedure must run. I will look into it when I have time; meanwhile you can think about workarounds.
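A possible workaround along those lines (a sketch only: the table name and the re-inserted "latest" entry mirror the snippets later in this thread, and calling RemoveAllKeys with its second argument set to true, which recreates the table file, is my assumption for how to actually reclaim the space):

```csharp
using DBreeze;

// Sketch: purge the state-log table so deleted keys stop occupying space.
// RemoveAllKeys(tableName, true) recreates the table file instead of only
// marking keys as deleted; we then re-insert the single entry to keep.
static void PurgeStateLog(DBreezeEngine db, string tblStateLogEntry,
                          byte[] lastKey, byte[] lastEntry)
{
    using (var t = db.GetTransaction())
    {
        t.RemoveAllKeys(tblStateLogEntry, true);   // drop keys + recreate file
        t.Insert<byte[], byte[]>(tblStateLogEntry, lastKey, lastEntry);
        t.Commit();
    }
}
```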

Thanks for the answer, but why are all commits saved when I specified `"InMemoryEntityStartSyncFromLatestEntity":true`?

commented

Also a good question

commented

Or why is the history not removed (at least once per X records) when a new entity state has stabilized on the node (e.g. before firing OnCommitted)?

I'd say: don't even add it to the log, I don't use the log. I request the complete dataset from the leader on the recovered host.

I don't even use the latest entity...

Something like this:

```csharp
if (rn.entitySettings.InMemoryEntityStartSyncFromLatestEntity)
{
    using (var t = this.rn.db.GetTransaction())
    {
        // Keep only the latest entry: drop the existing log first.
        t.RemoveAllKeys(tblStateLogEntry, false);
        t.Insert<byte[], byte[]>(tblStateLogEntry,
            new byte[] { 1 }.ToBytes(suggest.StateLogEntry.Index, suggest.StateLogEntry.Term),
            suggest.StateLogEntry.SerializeBiser());
        t.Commit();
    }
}
else
{
    using (var t = this.rn.db.GetTransaction())
    {
        t.Insert<byte[], byte[]>(tblStateLogEntry,
            new byte[] { 1 }.ToBytes(suggest.StateLogEntry.Index, suggest.StateLogEntry.Term),
            suggest.StateLogEntry.SerializeBiser());
        t.Commit();
    }
}
```

Maybe a little too drastic, but this seems to have worked:

```csharp
public StateLogEntrySuggestion AddNextEntryToStateLogByLeader()
{
    var suggest = GetNextLogEntryToBeDistributed();
    if (suggest == null)
        return null;

    // Restoring current values
    PreviousStateLogId = suggest.StateLogEntry.PreviousStateLogId;
    PreviousStateLogTerm = suggest.StateLogEntry.PreviousStateLogTerm;
    StateLogId = suggest.StateLogEntry.Index;
    StateLogTerm = suggest.StateLogEntry.Term;

    if (rn.entitySettings.DelayedPersistenceIsActive)
    {
        sleCache[suggest.StateLogEntry.Index] =
            new Tuple<ulong, StateLogEntry>(suggest.StateLogEntry.Term, suggest.StateLogEntry);
    }
    else
    {
        // Trim already-committed entries before inserting the new one.
        ClearStateLogStartingFromCommitted();

        using (var t = this.rn.db.GetTransaction())
        {
            t.Insert<byte[], byte[]>(tblStateLogEntry,
                new byte[] { 1 }.ToBytes(suggest.StateLogEntry.Index, suggest.StateLogEntry.Term),
                suggest.StateLogEntry.SerializeBiser());
            t.Commit();
        }
    }

    return suggest;
}
```

I've added a ClearStateLogStartingFromCommitted call in AddNextEntryToStateLogByLeader (the function that returns a StateLogEntrySuggestion).

I think ClearStateLogStartingFromCommitted was only intended for followers, but the problem was on the leader.

Anyway, it has now run for 10+ hours without a memory leak.

As a test, I've removed and re-added one node, and still the entire log was sent to the newly added node...

commented

I would also think about the correctness of using such an entity without the log. If two nodes, at approximately the same time, receive commands to add different new entities, how will your idea behave?

For two different records/rows it's not a problem: both are simply added. Two changes to the same record is a problem; I keep the newest one (time based). The real problem is a node that has been reset: it will ask for all records from the leader.
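The "keep the newest, time based" rule for conflicting changes to the same record can be sketched like this (Record is a hypothetical type; the real field names in the application will differ):

```csharp
using System;

// Hypothetical record carrying a last-modified timestamp.
class Record
{
    public string Key;
    public byte[] Value;
    public DateTime UpdatedUtc;
}

// Last-writer-wins: on a conflict for the same key, the record with
// the later timestamp survives; on a tie the local copy is kept.
static Record Merge(Record local, Record incoming)
{
    return incoming.UpdatedUtc > local.UpdatedUtc ? incoming : local;
}
```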

I think I misunderstood your question; I answered it from a Raft.Net user's perspective. You're asking about the internal workings of Raft.Net, so you're right, I don't know about that.

As it turns out, my fix did not solve the problem. Everything is still kept, and memory consumption is slowly growing, even though I specified 'InMemoryEntityStartSyncFromLatestEntity'.

It would be nice to be able to specify the number of log items to keep, say 100 or 500.

(I hope you don't misinterpret my comment as all negative; I think your implementation of Raft is very good. Having a closer look at it over the past weeks made me realise how clever everything is made...)

commented

I will look at all this when I have time

commented

First step: I have published new sources where "InMemoryEntity":true is switched from DBreeze's in-memory mode to a SortedDictionary, which is better suited for deleting keys.
Though when "InMemoryEntityStartSyncFromLatestEntity":true the running Raft log is not yet deleted, a newly connected node receives only the latest entity, as you can see in the screenshot

[screenshot]

The second step will be smart deletion from the SortedDictionary, on all nodes, of all committed log entries other than the newest one.
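That second step could look roughly like this (a sketch under my own assumptions: the log is a SortedDictionary keyed by entry index, and lastCommittedIndex is tracked elsewhere; Raft.Net's internal types will differ):

```csharp
using System.Collections.Generic;
using System.Linq;

// Remove every committed log entry except the newest committed one.
static void TrimCommitted(SortedDictionary<ulong, byte[]> log, ulong lastCommittedIndex)
{
    // Materialize the keys first: the dictionary cannot be
    // modified while it is being enumerated.
    var stale = log.Keys.Where(k => k < lastCommittedIndex).ToList();
    foreach (var k in stale)
        log.Remove(k);
}
```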

commented

Ok, I've committed the sources; you can run the stress test.
Currently the
"InMemoryEntity":true,
"InMemoryEntityStartSyncFromLatestEntity":true
configuration (or just "InMemoryEntity":true) should use the minimum of memory.

... Meanwhile, I would like to make some more reviews before the next release.

Cool, thanks, I will do some additional testing

commented

Please get the sources again; fixed one thing

Okey

commented

Raft v1.4 is published - please test.
Also added an async implementation of AddLogEntry, e.g. await AddLogEntryAsync.
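Usage of the new async variant might look like this (an assumption: only the method name AddLogEntryAsync is confirmed above; the parameter is assumed to mirror the existing AddLogEntry(byte[] data) signature):

```csharp
using System.Threading.Tasks;

// Sketch: append an entry through the leader without blocking the caller.
static async Task AppendAsync(Raft.RaftNode node, byte[] data)
{
    await node.AddLogEntryAsync(data);
}
```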