Flasew / ffsim-opera

mixing FlexFlow simulator with htsim-based opera simulator

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

FlexNetPacket simulator

This module contains the FlexNetPacket packet-level network simulator. It models the cluster that runs DNN training job based on TopoOpt, Fat-Tree, SiP-ML, Expander and abstract switch network topologies. The source code was extended from the Opera simulator from NSDI 2020, please check the original README file here.

Compilation:

To build the FlexNetPacket simulator, from the top level directory run:

export FF_HOME=<directory of FlexNet>
cd src/clos
make
cd datacenter
make

Executables

The executables are found in the src/clos/datacenter folder. They have the name "htsim_...". The following table provides details on each executable:

Executable Network Topology
htsim_tcp_fattree Fat-Tree network topology, single job
htsim_tcp_flat Flat Network topology (switchless) for single job. Use this executable for TopoOpt and expander
htsim_tcp_fc Fully connected topology for single job
htsim_tcp_os_fattree Oversubscribed Fat-Tree where the ToR switches are oversubscribed
htsim_tcp_aggos_ft Oversubscribed Fat-Tree where the aggregation layer is oversubscribed
htsim_tcp_dyn_flat Dynamic network used to simulate SiP-ML
htsim_tcp_fattree_multijob Fat-Tree network topology used to simulate multiple DNN jobs concurrently
htsim_tcp_aggos_fattree_multijob Oversubscribed (aggregation switches) Fat-Tree network topology used to simulate multiple DNN jobs concurrently

Brief description on source code

FlexNetPacket's major extention from the htsim simulator allows it to take a taskgraph (in FlatBuffer) generated from FlexNet simulator. To achieve this, src/clos/ffapp.* was implemented as an API to read and process these such taskgraphs. In addition, a few network topologies are added, notably the dynamic network executable that simulates SiP-ML. The network scheduler code can be found at src/clos/dyn_net_sch.*.

Each topology's "main" function can be found in src/clos/datacenter/main_tcp_*.cpp, which provides detailed description on the input arguments for the executable.

About

mixing FlexFlow simulator with htsim-based opera simulator

License:BSD 3-Clause "New" or "Revised" License


Languages

Language:C++ 99.5%Language:Makefile 0.4%Language:Shell 0.0%Language:C 0.0%