Behavior on overload
Infrasonics opened this issue
I came across a situation where YAFS reports a clean run, yet reality says otherwise.

This is based on the EEG-Tractor Beam example, modified only to fit the available testbed; the link parameters are as measured on that testbed. The model appears not to handle concurrent messages correctly when the arrival interval is shorter than the processing time.

The output (of the example below) shows a constant service time of 1.5 seconds, yet in reality the service time becomes unbounded because the server is overloaded with requests. The overload is already obvious on closer inspection: every message takes three billion instructions to process, yet the server/cloud computes only two billion instructions per second. So 1.5 s may be correct for the first message, but the client sends a new message every second. Even a single client therefore effectively DDoSes the server, and the results do not indicate that behaviour in any way.
A reimplementation of the shown scenario showed exactly that behavior.
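A back-of-the-envelope check of that arithmetic (values copied from the scenario below; plain Python, not YAFS code):

```python
# Values from the scenario: 3e9 instructions per message, 2e9 instructions/s.
instructions_per_msg = 3 * 10**9
ipt = 2.0 * 10**9          # instructions per second on the cloud node

service_time = instructions_per_msg / ipt   # 1.5 s to process one message
interarrival = 1.0                          # the client emits one message per second

# Utilization > 1 means the queue (and hence the response time) grows without bound.
rho = service_time / interarrival
print(service_time, rho)                    # 1.5 1.5

# Backlog after n arrivals: each message adds 0.5 s of unprocessed work.
n = 60
backlog = n * (service_time - interarrival)
print(backlog)                              # 30.0 s of queued work after one minute
```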
I used the following scenario file (intentionally not reduced to an MWE, to avoid incorrect use of the YAFS framework):
```python
import random
from yafs.core import Sim
from yafs.application import Application, Message
from yafs.population import *
from yafs.topology import Topology
from simpleSelection import MinimunPath
from simplePlacement import CloudPlacement
from yafs.stats import Stats
from yafs.distribution import deterministicDistribution
import time
import numpy as np
from sys import argv
from os import path

RANDOM_SEED = 1
RESULT_PATH = "Results_" + path.splitext(path.basename(argv[0]))[0]


def create_application():
    a = Application(name="SimpleCase")
    a.set_modules(
        [
            {"Sensor": {"Type": Application.TYPE_SOURCE}},
            {"ServiceA": {"RAM": 4000000000, "Type": Application.TYPE_MODULE}},
        ]
    )
    m_client = Message(
        "M.client", "Sensor", "ServiceA", instructions=3 * 10**9, bytes=1000
    )
    a.add_source_messages(m_client)
    a.add_service_module("ServiceA", m_client)
    return a


def create_json_topology():
    topology_json = {}
    topology_json["entity"] = []
    topology_json["link"] = []
    cloud_dev = {
        "id": 0,
        "model": "cloud",
        "mytag": "cloud",
        "IPT": 2.0 * 10**9,
        "RAM": 4 * 10**9,
        "COST": 3,
        "WATT": 20.0,
    }
    sensor_dev = {
        "id": 1,
        "model": "sensor-device",
        "IPT": 2.0 * 10**9,
        "RAM": 4 * 10**9,
        "COST": 3,
        "WATT": 20.0,
    }
    link1 = {"s": 0, "d": 1, "BW": 100 * 10**6, "PR": 0.000155}
    topology_json["entity"].append(cloud_dev)
    topology_json["entity"].append(sensor_dev)
    topology_json["link"].append(link1)
    return topology_json


# @profile
def main(simulated_time):
    random.seed(RANDOM_SEED)
    np.random.seed(RANDOM_SEED)
    t = Topology()
    t_json = create_json_topology()
    t.load(t_json)
    t.write("network.gexf")
    app = create_application()
    placement = CloudPlacement(
        "onCloud"
    )  # it defines the deployed rules: module-device
    placement.scaleService({"ServiceA": 1})
    pop = Statical("Statical")
    pop.set_sink_control(
        {
            "model": "actuator-device",
            "number": 1,
            "module": app.get_sink_modules(),
        }
    )
    dDistribution = deterministicDistribution(name="Deterministic", time=1)
    pop.set_src_control(
        {
            "model": "sensor-device",
            "number": 1,
            "message": app.get_message("M.client"),
            "distribution": dDistribution,
        }
    )
    selectorPath = MinimunPath()

    """ SIMULATION ENGINE """
    stop_time = simulated_time
    s = Sim(t, default_results_path=RESULT_PATH)
    s.deploy_app(app, placement, pop, selectorPath)
    s.run(stop_time, show_progress_monitor=False)


if __name__ == "__main__":
    import logging.config
    import os

    logging.config.fileConfig(os.getcwd() + "/logging.ini")
    start_time = time.time()
    main(simulated_time=60)
    print("\n--- %s seconds ---" % (time.time() - start_time))
    m = Stats(defaultPath=RESULT_PATH)  # same name as the results file
    time_loops = [["M.client"]]
    m.showResults2(1000, time_loops=time_loops)
```
Thank you for the comment.

I think the behaviour shown in the results is right. The implementation models an M/M/1 system: the queue length grows because the arrival rate (one message per second) is higher than the service rate (one message per 1.5 seconds), and the response time grows with it. The service time itself remains constant, and the network link is not overstressed either.

We can observe it in the results_.csv file (I have removed some columns for readability):
id | TOPO.src | TOPO.dst | service | time_in | time_out | time_emit | time_reception |
---|---|---|---|---|---|---|---|
1 | 1 | 0 | 1.5 | 1.00015500001 | 2.50015500001 | 1.0 | 1.00015500001 |
2 | 1 | 0 | 1.5 | 2.50015500001 | 4.00015500001 | 2.0 | 2.00015500001 |
3 | 1 | 0 | 1.5 | 4.00015500001 | 5.50015500001 | 3.0 | 3.00015500001 |
4 | 1 | 0 | 1.5 | 5.50015500001 | 7.00015500001 | 4.0 | 4.00015500001 |
5 | 1 | 0 | 1.5 | 7.00015500001 | 8.50015500001 | 5.0 | 5.00015500001 |
6 | 1 | 0 | 1.5 | 8.50015500001 | 10.00015500001 | 6.0 | 6.00015500001 |
7 | 1 | 0 | 1.5 | 10.00015500001 | 11.50015500001 | 7.0 | 7.00015500001 |
8 | 1 | 0 | 1.5 | 11.50015500001 | 13.00015500001 | 8.0 | 8.00015500001 |
9 | 1 | 0 | 1.5 | 13.00015500001 | 14.50015500001 | 9.0 | 9.00015500001 |
10 | 1 | 0 | 1.5 | 14.50015500001 | 16.00015500001 | 10.0 | 10.00015500001 |
11 | 1 | 0 | 1.5 | 16.00015500001 | 17.50015500001 | 11.0 | 11.00015500001 |
12 | 1 | 0 | 1.5 | 17.50015500001 | 19.00015500001 | 12.0 | 12.00015500001 |
13 | 1 | 0 | 1.5 | 19.00015500001 | 20.50015500001 | 13.0 | 13.00015500001 |
14 | 1 | 0 | 1.5 | 20.50015500001 | 22.00015500001 | 14.0 | 14.00015500001 |
15 | 1 | 0 | 1.5 | 22.00015500001 | 23.50015500001 | 15.0 | 15.00015500001 |
16 | 1 | 0 | 1.5 | 23.50015500001 | 25.00015500001 | 16.0 | 16.00015500001 |
17 | 1 | 0 | 1.5 | 25.00015500001 | 26.50015500001 | 17.0 | 17.00015500001 |
18 | 1 | 0 | 1.5 | 26.50015500001 | 28.00015500001 | 18.0 | 18.00015500001 |
19 | 1 | 0 | 1.5 | 28.00015500001 | 29.50015500001 | 19.0 | 19.00015500001 |
20 | 1 | 0 | 1.5 | 29.50015500001 | 31.00015500001 | 20.0 | 20.00015500001 |
21 | 1 | 0 | 1.5 | 31.00015500001 | 32.50015500001 | 21.0 | 21.00015500001 |
22 | 1 | 0 | 1.5 | 32.50015500001 | 34.00015500001 | 22.0 | 22.00015500001 |
23 | 1 | 0 | 1.5 | 34.00015500001 | 35.50015500001 | 23.0 | 23.00015500001 |
24 | 1 | 0 | 1.5 | 35.50015500001 | 37.00015500001 | 24.0 | 24.00015500001 |
25 | 1 | 0 | 1.5 | 37.00015500001 | 38.50015500001 | 25.0 | 25.00015500001 |
… | | | | | | | |
40 | 1 | 0 | 1.5 | 59.50015500001 | 61.00015500001 | 40.0 | 40.00015500001 |
For example:

- at the 25th row/request:
  - service time = 38.50015500001 - 37.00015500001 = 1.5
  - waiting time = 37.00015500001 - 25.00015500001 = 12.0
  - response time = 38.50015500001 - 25.0 = 13.50015500001
- at the 40th row:
  - service time = 61.00015500001 - 59.50015500001 = 1.5
  - waiting time = 59.50015500001 - 40.00015500001 = 19.5
  - response time = 61.00015500001 - 40.0 = 21.00015500001
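As a sanity check, the table above can be reproduced outside YAFS with a simple single-server queue recurrence (a sketch assuming one message emitted per second, a 0.000155 s link latency, and a 1.5 s service time; this is not YAFS internals):

```python
LATENCY = 0.000155   # link propagation delay (PR)
SERVICE = 1.5        # 3e9 instructions / 2e9 instructions per second

def schedule(n_msgs):
    """time_in = max(arrival at server, previous departure); time_out = time_in + SERVICE."""
    rows = []
    time_out_prev = 0.0
    for n in range(1, n_msgs + 1):
        time_emit = float(n)                  # client emits at t = 1, 2, 3, ...
        time_reception = time_emit + LATENCY  # arrival at the server
        time_in = max(time_reception, time_out_prev)
        time_out = time_in + SERVICE
        rows.append({"id": n, "time_in": time_in, "time_out": time_out,
                     "time_emit": time_emit, "time_reception": time_reception})
        time_out_prev = time_out
    return rows

rows = schedule(40)
r25, r40 = rows[24], rows[39]
print(r25["time_in"], r25["time_out"])      # ~37.000155  ~38.500155, matching row 25
print(r40["time_out"] - r40["time_emit"])   # response time ~21.000155 and still growing
```

Each departure time grows by 1.5 s while arrivals come 1.0 s apart, so the waiting time increases by 0.5 s per request, exactly as in the CSV.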
I hope this clarifies the interpretation of the results recorded for each request, and of the proposed model (an M/M/1).

In any case, I will leave the thread open in case I misunderstood your question.

Best

- Note: the response time includes the network latency, but the results are still coherent with this interpretation.