Multiple-application topology crashes if a message doesn't have a valid path due to a node removal
HenriqueMSilva opened this issue · comments
Hi,
I need to simulate multiple users in my topology.
For that, I have created a network with multiple applications, each with an associated Population(object) instance.
I am also using a deterministic distribution (0,100) as the activation_dist input parameter of my Population instances.
To simulate failures I am removing nodes, but if I remove node 1 in the example the simulation crashes:
raise nx.NetworkXNoPath("No path between %s and %s." % (source, target))
NetworkXNoPath: No path between 13 and 2
Meanwhile, with the same topology but only one application, node 1 removed at env.now == 200, and simulation_time = 300, the simulation runs to completion without problems; the message is simply lost:
2019-09-18 18:15:54,659 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (13,1). Routing a new one. 200
2019-09-18 18:15:54,659 - yafs.core - DEBUG - No path given. Message is lost
2019-09-18 18:15:54,660 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (16,1). Routing a new one. 200
2019-09-18 18:15:54,660 - yafs.core - DEBUG - No path given. Message is lost
In this case, if the time between removing the node and stopping the simulation were any longer, it would crash anyway.
I was able to run the first example to completion by altering line 232 of the yafs/core.py source:
WAS:
except KeyError:
NOW:
except:
I couldn't figure out the exact problem, but I hope this explanation helps.
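For context on why the narrower handler did not help, here is a minimal sketch. It assumes the call wrapped by that handler ends up in nx.shortest_path (not confirmed from the YAFS source), and the three-edge graph is made up to mirror the node ids in the traceback. NetworkXNoPath is not a KeyError, so `except KeyError:` lets it propagate. Naming the NetworkX exceptions explicitly avoids a bare `except:`, which would also hide unrelated bugs.

```python
import networkx as nx

# Toy topology (hypothetical): nodes 13 and 16 reach node 2 only through node 1.
G = nx.Graph()
G.add_edges_from([(13, 1), (1, 2), (16, 1)])
G.remove_node(1)  # simulate the node failure

caught = None
try:
    nx.shortest_path(G, 13, 2)
except KeyError:
    # Not reached: NetworkXNoPath does not derive from KeyError.
    caught = "KeyError"
except (nx.NetworkXNoPath, nx.NodeNotFound) as exc:
    # NodeNotFound covers the case where the endpoint itself was removed.
    caught = type(exc).__name__

print(caught)
```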
Hello Henrique,
It's a nice infrastructure! Let's try to fix it.
First error
raise nx.NetworkXNoPath("No path between %s and %s." % (source, target))
NetworkXNoPath: No path between 13 and 2
This error is raised by the Nx library when it cannot find a path between the two nodes. Without seeing the code, I can think of two possible causes.
A) The node ids have a different type in the topology than in the other policies.
Check whether the node ids in Nx (.G.nodes) are strings or integers, and do the same in the get_path function (in the "selection" script). Both references should use the same type.
B) t.G is not a bidirectional graph, but I guess this is less likely.
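Both checks can be run in a few lines. The sketch below uses a made-up graph whose ids are stored as strings, standing in for t.G. The helper name `id_types` is hypothetical.

```python
import networkx as nx

# Stand-in for t.G (hypothetical data): node ids stored as strings.
G = nx.Graph()
G.add_edges_from([("13", "1"), ("1", "2")])

def id_types(graph):
    """Return the set of Python type names used for node ids."""
    return {type(node).__name__ for node in graph.nodes}

types_found = id_types(G)
int_lookup = 13 in G       # integer id, as a policy might pass it
str_lookup = "13" in G     # string id, as stored in the graph
directed = G.is_directed()

print(types_found, int_lookup, str_lookup, directed)
```

If `types_found` contains more than one type, or the type differs from the one the selection policy passes in, option A is the likely cause. If `directed` is True, a link may exist in one direction only, which is option B.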
Second error
2019-09-18 18:15:54,659 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (13,1). Routing a new one. 200
2019-09-18 18:15:54,659 - yafs.core - DEBUG - No path given. Message is lost
2019-09-18 18:15:54,660 - yafs.core - WARNING - The initial path assigned is unreachabled. Link: (16,1). Routing a new one. 200
2019-09-18 18:15:54,660 - yafs.core - DEBUG - No path given. Message is lost
When there is a failure in the topology, some messages need to change their previously computed path. In this case, the function get_path_from_failure is called, and internally it calls the get_path function. So the first error is triggered again, but it is caught and reported with a different message.
Both functions are defined in your project ("selection_..py", i.e. YAFS/src/examples/DynamicFailuresOnNodes/selection_multipleDeploys.py )
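As a rough illustration of that call chain, here is a sketch of a selection policy in which get_path_from_failure delegates to get_path, and get_path skips unreachable destinations instead of raising. The class name, the simplified signatures, and the toy graph are all assumptions made for this example. They are not the YAFS API, so adapt the idea to the signatures in your own selection script.

```python
import networkx as nx

class SketchSelection:
    """Hypothetical stand-in for a selection policy; not the real YAFS class."""

    def get_path(self, graph, source, targets):
        """Return one shortest path per reachable target, skipping the rest."""
        paths = []
        for target in targets:
            try:
                paths.append(nx.shortest_path(graph, source, target))
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue  # unreachable target: the message will be lost
        return paths

    def get_path_from_failure(self, graph, failed_link, targets):
        """Re-route from the node where the message currently sits."""
        current_node = failed_link[0]
        return self.get_path(graph, current_node, targets)

# Node 13 has a backup route through 14; node 16 depends on node 1 alone.
G = nx.Graph([(13, 1), (1, 2), (16, 1), (13, 14), (14, 2)])
G.remove_node(1)

policy = SketchSelection()
rerouted = policy.get_path_from_failure(G, (13, 1), [2])
lost = policy.get_path_from_failure(G, (16, 1), [2])
print(rerouted, lost)
```

An empty list here plays the role of the "No path given. Message is lost" case in the log, so the simulation can continue instead of crashing.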
Best,
Isaac