DQN agent for each agent in MARC.
- FAILED No replay buffer, updating at every timestep; idea mostly taken from standard MARC solution methods.
- NOT TESTED With replay buffer; idea mostly taken from RL methods.
DDPG agent for each agent in MARC.
- FAILED No replay buffer, updating at every timestep.
- SUCCESS With replay buffer, but training still shows large variance and gradient explosion.
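For reference, the difference between the two variants above comes down to whether each transition is used once at the timestep it occurs, or stored and re-sampled later in random minibatches. The following is a minimal sketch of a uniform replay buffer, not the code used in these experiments; the class name `ReplayBuffer`, its parameters, and the transition layout `(state, action, reward, next_state, done)` are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of transitions, sampled uniformly."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        # Store one transition observed by one agent at one timestep
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; this breaks the temporal
        # correlation that per-timestep updates suffer from
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


# Usage: collect transitions, and only start updating once a full batch exists
buffer = ReplayBuffer(capacity=1000, seed=0)
for t in range(50):
    buffer.push(t, 0, 1.0, t + 1, False)  # dummy transition for illustration
    if len(buffer) >= 32:
        batch = buffer.sample(32)  # would be passed to the DQN/DDPG update
```

In a per-agent setup, each agent would presumably hold its own buffer instance; whether buffers are shared between agents is a design choice these notes do not specify.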
Suitable for different faulty nodes. Stable.
Balanced.
Q Consensus needs only 1000 timesteps to reach consensus, whereas DDPG needs much longer to finish training and is very unstable. So Q Consensus is the winner.