discrete-event-simulation queuing-simulator serverless system-simulation

NoServer 凵

NoServer is a full-system, queuing simulator for serverless workflows.

Key Features

Supports serverless workflows, scalable to thousands of function instances and hundreds of nodes.
Simulates all layers of abstraction of a typical serverless system.
Disaggregates the three scheduling dimensions: load balancing (LB), autoscaling (AS), and function placement (FP).
- Separates serverless policy logic from the underlying system implementation.
Models heterogeneous worker types (e.g., Harvest VMs).
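
To illustrate what separating the three scheduling dimensions can look like, here is a minimal sketch in which load balancing, autoscaling, and function placement are independent, swappable policies. All class and method names (`LoadBalancer`, `Autoscaler`, `Placer`, `Cluster.dispatch`, and the concrete policies) are hypothetical and are not NoServer's actual API.

```python
import math
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Instance:
    node: str
    in_flight: int = 0


# Hypothetical policy interfaces: one per scheduling dimension.
class LoadBalancer(Protocol):
    def pick(self, instances: list[Instance]) -> Instance: ...


class Autoscaler(Protocol):
    def desired_instances(self, in_flight: int) -> int: ...


class Placer(Protocol):
    def place(self, free_cores: dict[str, int]) -> str: ...


class LeastLoaded:
    """LB policy: send the request to the instance with the fewest in-flight requests."""
    def pick(self, instances):
        return min(instances, key=lambda inst: inst.in_flight)


class ConcurrencyTarget:
    """AS policy: keep roughly `target` in-flight requests per instance."""
    def __init__(self, target: int = 1):
        self.target = target

    def desired_instances(self, in_flight):
        return max(1, math.ceil(in_flight / self.target))


class MostFreeCores:
    """FP policy: place a new instance on the node with the most free cores."""
    def place(self, free_cores):
        return max(free_cores, key=free_cores.get)


@dataclass
class Cluster:
    """Mechanism only: it calls the policies but knows nothing about their logic."""
    lb: LoadBalancer
    autoscaler: Autoscaler
    placer: Placer
    free_cores: dict[str, int]
    instances: list[Instance] = field(default_factory=list)

    def dispatch(self) -> Instance:
        """Route one request, scaling up first if the autoscaler asks for it."""
        in_flight = sum(inst.in_flight for inst in self.instances) + 1
        while len(self.instances) < self.autoscaler.desired_instances(in_flight):
            node = self.placer.place(self.free_cores)
            self.free_cores[node] -= 1  # sketch: no check for exhausted nodes
            self.instances.append(Instance(node))
        chosen = self.lb.pick(self.instances)
        chosen.in_flight += 1
        return chosen


cluster = Cluster(LeastLoaded(), ConcurrencyTarget(1), MostFreeCores(), {"a": 2, "b": 1})
placed_on = [cluster.dispatch().node for _ in range(3)]
```

Swapping any one policy object (for example, a different `Placer`) leaves the other two and the `Cluster` mechanism untouched, which is the point of the disaggregation.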

Overview

Each abstraction layer is modeled on the following system:

Serverless platform: Knative
Cluster orchestration: Kubernetes
Container runtime: containerd
OS kernel: Linux

Setup

$ pip3 install -r requirements.txt

Usage

$ python3 -m noserver [flags]

Flags:
  --mode: Simulation mode to run. Available options: [test, rps, dag, benchmark, trace].
  --trace: Path to the DAG trace to simulate. Default: 'data/trace_dags.pkl'.
  --hvm: Specify a fixed Harvest VM from the trace to simulate.
  --logfile: Log file path.
  --display: Display the task DAG. (Opposite option: --nodisplay)
  --vm: Number of normal VMs. Default: 2.
  --cores: Number of cores per VM. Default: 40.
  --stages: Number of stages in the task DAG. Default: 8.
  --invocations: Total number of invocations in the task DAG. Default: 4096.
  --width: Width of the DAG. Default: 1.
  --depth: Depth of the DAG. Default: 1.
  --rps: Request arrival rate, in requests per second. Default: 1.0.
  --config: Path to a configuration file. Default: './configs/default.py'.

Note:
  • The '--mode' flag is required. You must provide a valid simulation mode.
  • Use '--display' to show the task DAG graph. Use '--nodisplay' to suppress the display.
  • To use other configurations:
    python3 -m noserver --mode dag --config=./configs/another_config.py:params
  • To override parameters:
    python3 -m noserver --mode dag --noconfig.harvestvm.ENABLE_HARVEST
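
The override flag above addresses a nested configuration field by its dotted path, with a `no` prefix to switch a boolean off. The sketch below illustrates that mapping using only the standard library. It is not how NoServer parses its flags; `apply_override` and `DEFAULTS` are hypothetical, and the default value of `ENABLE_HARVEST` is an assumption (only the field name appears in this README).

```python
import ast
import copy

# Assumed default; only the key path harvestvm.ENABLE_HARVEST comes from the README.
DEFAULTS = {"harvestvm": {"ENABLE_HARVEST": True}}


def apply_override(config: dict, flag: str) -> dict:
    """Return a copy of `config` with one dotted-path override applied.

    Supported forms (illustrative only):
      --noconfig.a.b      sets a.b to False
      --config.a.b=VALUE  sets a.b to the Python literal VALUE
    """
    updated = copy.deepcopy(config)
    if flag.startswith("--noconfig."):
        dotted, value = flag[len("--noconfig."):], False
    elif flag.startswith("--config.") and "=" in flag:
        dotted, raw = flag[len("--config."):].split("=", 1)
        value = ast.literal_eval(raw)
    else:
        raise ValueError(f"unsupported flag: {flag}")

    *parents, leaf = dotted.split(".")
    node = updated
    for key in parents:
        node = node[key]  # raises KeyError for an unknown section
    if leaf not in node:
        raise KeyError(dotted)  # refuse to create fields that do not exist
    node[leaf] = value
    return updated


disabled = apply_override(DEFAULTS, "--noconfig.harvestvm.ENABLE_HARVEST")
```

Rejecting unknown paths, rather than silently creating them, catches typos in long dotted names early.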

Validation

I validated the simulator against the serverless platform vHive (a benchmarking wrapper around Knative).

The experiments below use the following cluster specifications:

Machine type: c220g5 on CloudLab Wisconsin
- Number of cores per node: 40 (2 sockets w/ 10 cores each, hyperthreads: 2)
- Maximum theoretical throughput is around 40 requests per second.
Cluster size: 11 nodes
- 1 master + 10 workers
Function execution time: 1 s
- 50th percentile of the Azure function trace
Function memory footprint: 170 MiB
- 50th percentile of the Azure function trace
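
The throughput figure in the specification is consistent with a simple bound: if each invocation occupies one core for its whole execution time (an assumption, not something stated above), a 40-core node running 1 s functions can complete at most 40 requests per second. The sketch below computes that bound and checks it with a tiny discrete-event simulation of a single FIFO node. It is a toy model, not NoServer's implementation, and `max_throughput_rps` and `simulate_node` are hypothetical names.

```python
import heapq
import random


def max_throughput_rps(cores: int, exec_time_s: float) -> float:
    """Upper bound on completions per second, assuming one core per invocation."""
    return cores / exec_time_s


def simulate_node(rps: float, cores: int = 40, exec_time_s: float = 1.0,
                  duration_s: float = 200.0, seed: int = 0) -> float:
    """Simulate one FIFO node with Poisson arrivals; return completions per second."""
    rng = random.Random(seed)
    # Min-heap of the times at which each core next becomes free.
    core_free_at = [0.0] * cores
    heapq.heapify(core_free_at)
    now, completed = 0.0, 0
    while True:
        now += rng.expovariate(rps)  # exponential gaps give Poisson arrivals
        if now >= duration_s:
            break
        # The request waits, if necessary, for the earliest available core.
        start = max(now, heapq.heappop(core_free_at))
        finish = start + exec_time_s
        heapq.heappush(core_free_at, finish)
        if finish <= duration_s:
            completed += 1
    return completed / duration_s


print(max_throughput_rps(40, 1.0))  # 40.0
under_load = simulate_node(rps=20.0)   # below the bound: tracks the arrival rate
saturated = simulate_node(rps=80.0)    # above the bound: capped near 40
```

In this toy model, offered load beyond the bound only lengthens the queue; completed throughput stays at or just under `cores / exec_time_s`.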

Resource Utilization

Real-Time Autoscaling

Latency & Cold Start

In the following experiments, the cluster was not warmed up, so that cold starts are preserved.

50th percentile (p50):

99th percentile (p99):
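
For readers unfamiliar with the notation, p50 and p99 are latency percentiles. The sketch below uses the nearest-rank definition, which is one common choice and not necessarily the one used for the experiments above. The latency values are made up for illustration and are not measured results; they simply show how a few cold starts can leave p50 untouched while dominating p99.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in seconds: 95 warm invocations and 5 cold starts.
latencies = [1.0] * 95 + [3.5] * 5

p50 = percentile(latencies, 50)  # 1.0, the warm latency
p99 = percentile(latencies, 99)  # 3.5, a cold-start latency
```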

About

Apache License 2.0

Languages

Python 92.4%, Shell 6.1%, Makefile 1.5%