Arceliar / meshnet-lab

Emulate huge mobile ad-hoc mesh networks using Linux network namespaces.

Mesh Network Lab

Emulate mobile ad-hoc mesh networks of hundreds of nodes on a computer. The network is realized using Linux network namespaces that are connected via virtual Ethernet interfaces. The network is defined in a JSON file.

Network characteristics such as bandwidth, packet loss and latency can be emulated using traffic control. Node mobility is supported as well. The emulation can run distributed on multiple computers and is lightweight enough to support more than 200 nodes on a single desktop computer.

This project is meant to test mobile ad-hoc mesh routing protocols. Supported out of the box are Babel, B.A.T.M.A.N.-adv, OLSR1, OLSR2, OSPF, BMX6, BMX7, Yggdrasil and CJDNS. Check out the test results.

Small example:

{
  "links": [
    {
      "source": "a",
      "target": "b"
    },
    {
      "source": "b",
      "target": "c"
    }
  ]
}

JSON keys:

  • source, target: Mandatory. Name or number of the node. Maximum of 6 characters long. source and target are interchangeable and have no special distinction.
  • An explicit node list can be added (e.g. "nodes": [{"id": "a"}, {"id": "b"}]) to define node-specific variables for use in combination with --node-command.
  • Other data fields are ignored.
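
The key rules above can be checked before a topology file is applied. The following is a minimal sketch and not part of meshnet-lab: the helper name validate_graph is hypothetical, and it assumes that the 6-character limit applies to the string form of numeric node names.

```python
import json

MAX_NAME_LENGTH = 6  # limit stated in the key description above


def validate_graph(graph):
    """Return a list of problems found in a topology dict (empty if none)."""
    problems = []
    for index, link in enumerate(graph.get("links", [])):
        for key in ("source", "target"):
            if key not in link:
                problems.append(f"link {index}: missing mandatory key '{key}'")
                continue
            # Assumption: numeric names are limited by their string form.
            name = str(link[key])
            if not 1 <= len(name) <= MAX_NAME_LENGTH:
                problems.append(f"link {index}: bad {key} name '{name}'")
    return problems


graph = json.loads(
    '{"links": [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]}'
)
print(validate_graph(graph))
```

An empty list means that no problem was found; any other data fields are left alone, matching the rule that they are ignored.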

Usage

First you need to have at least one routing protocol available. Batman-adv is already in the Linux kernel, so you only need to install the batctl package. There is a script to install all routing protocols.

Example run:

# Create a 10x10 grid and write it to a file called graph.json
./topology.py grid4 10 10 > graph.json

# Create network
./network.py apply graph.json
Network setup in 10.834s:
  nodes: 100 created, 0 removed, 0 updated
  links: 180 created, 0 removed, 0 updated

# Start software
./software.py start batman-adv
Started 100 batman-adv instances in 3.16s

# Run some test commands (output omitted)
./tools.py ping
./tools.py traffic --duration 3
./software.py --verbosity verbose run 'ip a && echo "Hello from inside node"'

# Stop software
./software.py stop batman-adv
Stopped 100 batman-adv instances in 3.109s

# Remove network
./network.py apply none

As an alternative, you can stop all protocols using ./software.py clear and remove all namespaces using ./network.py clear. This is useful to clean up after a test has been interrupted.

The batman-adv protocol refers to the start/stop scripts in the protocols subfolder. Add your own scripts to support other protocols.

A collection of automated tests with data plot generation is available in the tests subfolder.

Software Components

  • network.py creates a network topology from a description in JSON.
  • software.py starts routing protocol software in all namespaces.
  • topology.py creates JSON files with descriptions of common topologies (grids, lines, loops, trees).
  • tools.py contains tools to create ping statistics and to measure traffic.

The code is written for Python 3 and uses the ip, ping and pkill commands. You need Linux Kernel >=4.18 to run meshnet-lab.
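
To illustrate the JSON format used for topologies, here is a minimal sketch of a 4-neighbour grid generator. It is an independent illustration and not the code of topology.py; the function name grid4 and the node numbering are assumptions. For a 10x10 grid it produces 100 nodes and 180 links, the same counts that the example run reports.

```python
import json


def grid4(width, height):
    """Build a grid where each node links to its right and lower neighbour."""
    links = []
    for y in range(height):
        for x in range(width):
            node = y * width + x  # assumed numbering: row by row
            if x + 1 < width:
                links.append({"source": node, "target": node + 1})
            if y + 1 < height:
                links.append({"source": node, "target": node + width})
    return {"links": links}


graph = grid4(10, 10)
nodes = {end for link in graph["links"] for end in (link["source"], link["target"])}
print(len(nodes), len(graph["links"]))

# A tiny grid, serialized the same way a graph.json file would be.
print(json.dumps(grid4(2, 2)))
```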

Add Traffic Control

The command provided via the --link-command parameter of the network.py script is executed twice per link, once for each device end of the link (in the switch namespace). It is meant to be used to configure the kernel packet scheduler.

Given some link:

{
  "links": [
    {"source": 0, "target": 1, "rate": "100mbit", "source_latency": 2, "target_latency": 10}
  ]
}

The command can now make use of the fields of the link as variables:

./network.py \
  --link-command 'tc qdisc replace dev "{ifname}" root tbf rate {rate} burst 8192 latency {latency}ms' \
  apply graph.json

Notes:

  • the command is called for each end of a link
  • source_ and target_ prefixes are omitted
  • ifname is always provided
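
The notes above can be made concrete with a small sketch of how the variables for each end of a link might be derived. This is not the code of network.py: the helper variables_for_end is hypothetical, the interface names are placeholders, and it is an assumption that unprefixed fields such as rate apply to both ends.

```python
def variables_for_end(link, end, ifname):
    """Collect the template variables for one end ('source' or 'target')."""
    prefixes = ("source_", "target_")
    # Assumption: fields without a prefix apply to both ends of the link.
    variables = {
        key: value
        for key, value in link.items()
        if key not in ("source", "target") and not key.startswith(prefixes)
    }
    # Fields prefixed with this end's name are exposed without the prefix.
    own = end + "_"
    for key, value in link.items():
        if key.startswith(own):
            variables[key[len(own):]] = value
    variables["ifname"] = ifname  # ifname is always provided
    return variables


link = {"source": 0, "target": 1, "rate": "100mbit",
        "source_latency": 2, "target_latency": 10}
template = ('tc qdisc replace dev "{ifname}" root tbf '
            'rate {rate} burst 8192 latency {latency}ms')

# The interface names below are placeholders chosen for this example.
for end, ifname in (("source", "ve-0-1"), ("target", "ve-1-0")):
    print(template.format(**variables_for_end(link, end, ifname)))
```

With the link shown above, the source end gets a latency of 2 ms and the target end a latency of 10 ms, while both ends share the same rate.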

Distributed Execution

Emulating a lot of nodes can bring a single computer to its limits. Use network.py --remotes <json-file> ... to distribute the mesh network on several remotes. The SSH login as root to these computers must be passwordless.

Example remotes.json:

[
    {"address": "192.168.44.133"},
    {"address": "192.168.44.135"}
]

(Note: You can also specify an SSH "identity_file")
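
As a rough sketch of how an entry of remotes.json could translate into an SSH invocation, consider the following. It is an illustration only, not the way meshnet-lab builds its commands: the helper ssh_command and the key path in the example are hypothetical. The only facts taken from the text are the root login, the "address" key and the optional "identity_file" key.

```python
import json


def ssh_command(remote, command):
    """Build the argument list for running a command on one remote as root."""
    args = ["ssh"]
    if "identity_file" in remote:
        # -i selects the private key file used for authentication.
        args += ["-i", remote["identity_file"]]
    args += ["root@" + remote["address"], command]
    return args


remotes = json.loads("""
[
    {"address": "192.168.44.133"},
    {"address": "192.168.44.135", "identity_file": "/path/to/example_key"}
]
""")

for remote in remotes:
    print(ssh_command(remote, "ip netns list"))
```

The resulting lists can be passed to subprocess.run as they are, which avoids quoting problems on the local side.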

A typical distributed workflow would be:

# create network
./network.py --remotes remotes.json apply graph.json
# start software
./software.py --remotes remotes.json start batman-adv
# run tests
./tools.py --remotes remotes.json ping

SSH Connection Sharing

Distributed emulation uses SSH to execute commands on remote hosts. To speed up SSH connections a lot, add this to your ~/.ssh/config:

Host *
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 600

(Note: make sure directory ~/.ssh/sockets/ exists)
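
The socket directory can also be created from Python before the first run. This is a small convenience sketch; the helper ensure_socket_dir is hypothetical, and the restrictive mode 0700 is a choice made here, not a requirement stated by the project.

```python
from pathlib import Path


def ensure_socket_dir(ssh_dir=None):
    """Create <ssh_dir>/sockets if it is missing and return its path."""
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    sockets = ssh_dir / "sockets"
    # exist_ok makes the call safe to repeat; the mode is subject to the umask.
    sockets.mkdir(mode=0o700, parents=True, exist_ok=True)
    return sockets


if __name__ == "__main__":
    print(ensure_socket_dir())
```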

Limitations

  • no support for multiple connections between two nodes (multigraphs)
  • only one mesh interface per node/namespace
  • no discrete event simulation that can run faster than real time
  • computer performance might influence results

Internal Working

Every node is represented by its own network namespace (ns-*). A separate namespace called switch contains all the cabling. Each node namespace is connected to its bridge in switch by a veth pair: uplink on the node side and dl-<node> on the bridge side.

All interfaces in the bridges (except dl-<node>) are set to isolated. This makes data flow only to and from the non-isolated dl-<node> interface, but not between the isolated interfaces.

All bridges have ageing_time and forward_delay set to 0 to make them behave like a hub. A packet from the uplink will be sent to all connections, but not between them.
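
The forwarding behaviour described above can be captured in a toy model. This is a sketch of the rule only, not the kernel's bridge implementation: the function egress_ports is hypothetical and the ve-a-b style interface names are assumed for illustration. A frame is flooded to every other port, except that isolated ports never forward to each other.

```python
def egress_ports(ports, ingress):
    """Return the ports that a frame arriving on `ingress` is flooded to.

    `ports` maps an interface name to True if the port is isolated.
    """
    return [
        name
        for name, isolated in ports.items()
        # Never send back to the ingress port, and block isolated -> isolated.
        if name != ingress and not (isolated and ports[ingress])
    ]


# Bridge of node a: one non-isolated downlink and two isolated link ends.
bridge_a = {"dl-a": False, "ve-a-b": True, "ve-a-c": True}

print(egress_ports(bridge_a, "dl-a"))    # frame sent by node a
print(egress_ports(bridge_a, "ve-a-b"))  # frame arriving from node b
```

A frame from the node reaches all link ends, while a frame arriving on one link end reaches only the node.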

Visual Example

  • Applications can be started in namespaces ns-a, ns-b, ns-c etc. and see only their interface called uplink
  • bridges have properties stp_state, ageing_time and forward_delay set to 0
  • ve-* interfaces have property isolated set to on
  • only one simulation can be run at the same time

Routing Protocol Notes

  • BATMAN-adv:
    • needs batctl installed
    • the current metric limits the maximum hop count to 32 (source)
    • kworker/u32:1+bat_events quickly becomes a single threaded bottleneck
      • change create_singlethread_workqueue() to create_workqueue() in net/batman-adv/main.c (source)
      • this seems to have very little effect
    • OGM packet TTL is 50 (source)
    • tested with batman-adv 2019.4
  • OLSR2 complains when the Linux kernel is not compiled with CONFIG_IPV6_MULTIPLE_TABLES enabled
    • all routes will land in the main table which can interfere with Internet access
      • this is of no concern for the test setup
    • tested with olsr2 0.15.1
  • OLSR1 has buggy/broken IPv6 support, we use IPv4 instead
    • tested with olsr1 0.9.8
  • Babel has a maximum metric of 2^16 - 1, a single wired hop has a default metric of 96, a wireless hop with no packet loss has a metric of 256. That allows a maximum hop count of around 683 hops. (source)
    • use default rxcost 16 in the configuration file to configure the metric
  • Yggdrasil needs the most resources (CPU/RAM) of the routing protocol programs supported here
    • encrypts traffic
  • CJDNS security can be disabled. Compile for speed using NSA_APPROVED=true Seccomp_NO=1 NO_TEST=1 NO_NEON=1 CFLAGS="-O0" ./do.
  • [Errno 24] Too many open files: With big networks, tests can spawn thousands of pings and wait for them. This can cause this error message. Use ulimit -Sn 4096 to increase the file descriptor limit.
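
The hop count estimate in the Babel note above follows from a simple division. The sketch below reproduces that arithmetic; it assumes that the path metric is the plain sum of the per-hop costs and that nothing else contributes to it.

```python
MAX_METRIC = 2**16 - 1  # maximum Babel metric as stated above


def max_hops(cost_per_hop):
    """Number of hops of the given cost that fit into the maximum metric."""
    return MAX_METRIC / cost_per_hop


wired = max_hops(96)      # default metric of a single wired hop
wireless = max_hops(256)  # wireless hop without packet loss

print(f"wired: about {wired:.1f} hops")
print(f"wireless: about {wireless:.1f} hops")
```

Rounding the wired result gives the figure of around 683 hops; for lossless wireless hops the limit is much lower, just under 256 hops.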

License: MIT License

