Opinions

RL based

DQN

DQN agent for each agent in MARC.

FAILED No replay buffer, update every timestamp, idea mostly from normal MARC solve methods.
DIDNT TEST Use replay buffer, idea mostly from RL methods.

DDPG

DDPG agent for each agent in MARC.

FAILED No replay buffer, update every timestamp.
SUCCESS With replay buffer. But still get big varience, gradient explosion.

Q Consensus

Suitable for different faulty nodes. Stable.

Battle

DDPG for good nodes vs. DDPG for bad nodes

Balenced.

Q Consensus vs DDPG

Q Consensus only needs 1000 timestamps to get consensus. But DDPG needs much more time to finish training, and DDPG is very unstable. So Q Consensush is the winner.

nicehiro / resilient-consensus