etingof / snmpsim

SNMP Simulator

Home Page:http://snmplabs.com/snmpsim/

Possible to Signal Daemon to Reconnect to DB?

qrobinson opened this issue

Thanks for this software, it is immensely useful in my work.
We are especially pleased with the alpha release of the control-panel and API. Many of the new features since 0.4.7 are still unused in our harness, like bzip support.

We simulate as many as 90k field devices (each with a distinct MAC) using ~60 snmpsimd processes distributed over 5 servers, with each process listening on a different port. Each of these processes can use up to 100% of a CPU core under load, particularly at start-up. Caching these snmprecs usually takes several hours. If ever the daemons are idle for extended time, they may drop their MySQL connections. Is there any way to signal the daemon to re-establish the dropped connection?
Alternatively, as the time cost is substantial at startup, is there a way to bypass creating cache-files when they are already present and is that a reasonable tactic?

[edit]
Our biggest issue at present is load-balancing and throughput, as there's a need to complete units of work within a finite interval.
Partitioning the complete set of snmprecs by server is one way of limiting startup delay such that each process only caches the instances for which its server is responsible for serving responses.
We've also considered storing all the snmprecs and cache directories on an NFS drive to eliminate redundancy and increase throughput.
I'd appreciate any useful input in this regard.

Best regards

Caching these snmprecs usually takes several hours.

This is surprising! Here is how snmpsimd should work: you give it a set of .snmprec files in --data-dir. For each .snmprec file, snmpsimd ensures that a .dbm index file exists somewhere in a temporary directory (to speed up by-OID lookups at runtime).

The daemon re-creates these .dbm indices on startup, for all .snmprec files in its --data-dir, only if the .dbm files are missing or the .snmprec files were modified later than the corresponding .dbm files. Otherwise no files will be opened on startup.
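The rebuild rule described above can be sketched as a small check on file modification times. This is only an illustration of the rule, not snmpsim's actual code: the function name `index_is_stale` is hypothetical, and the real index file name and location (which may depend on the dbm backend and --cache-dir) are assumptions.

```python
import os


def index_is_stale(snmprec_path, index_path):
    """Return True if the index must be (re)built for this .snmprec file.

    Hypothetical helper: mirrors the rule "rebuild if the index is missing
    or the data file is newer than the index".
    """
    # A missing index always needs building.
    if not os.path.exists(index_path):
        return True
    # Rebuild only when the data file was modified after the index was written.
    return os.path.getmtime(snmprec_path) > os.path.getmtime(index_path)
```

Running a check like this over the data and cache directories before starting the daemons may help show whether some files are being treated as modified on every start, for example because a deployment step rewrites the .snmprec files or clears the cache directory.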

Note what the daemon reports in its log file when it is starting; there might be some explanation in the log if it decided to rebuild the index.

Caching these snmprecs usually takes several hours.

This is something I'd not expect.

If ever the daemons are idle for extended time, they may drop their MySQL connections

I assume you are using the sql plugin?

Is there any way to signal the daemon to re-establish the dropped connection?

Presently, the sql plugin is very simplistic, so it can indeed be made more resilient to dropped connections.
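One way such resilience could look is a thin wrapper that reopens the connection and retries when a query fails. This is a minimal sketch under stated assumptions, not the sql plugin's implementation: `ReconnectingConnection` and its `connect` factory argument are hypothetical names, and the wrapper relies only on the generic DB-API calls `cursor()`, `execute()`, `fetchall()` and `close()`. With MySQL, the factory would presumably wrap `mysql.connector.connect(...)` with the same options passed to the plugin.

```python
import time


class ReconnectingConnection:
    """Run SELECT-style queries, reopening the connection when one fails.

    `connect` is a hypothetical zero-argument factory that returns a new
    DB-API connection each time it is called.
    """

    def __init__(self, connect, retries=3, delay=1.0):
        self._connect = connect
        self._retries = retries
        self._delay = delay
        self._conn = None

    def execute(self, sql, params=()):
        last_error = None
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                cursor = self._conn.cursor()
                try:
                    cursor.execute(sql, params)
                    return cursor.fetchall()
                finally:
                    cursor.close()
            except Exception as exc:
                # Real code should catch only the driver's error classes.
                last_error = exc
                self._drop()
                if attempt < self._retries:
                    time.sleep(self._delay)
        raise last_error

    def _drop(self):
        # Discard the current connection so the next attempt opens a new one.
        if self._conn is not None:
            try:
                self._conn.close()
            except Exception:
                pass
            self._conn = None
```

Retrying inside the query path avoids the need for an external signal: a daemon that has been idle long enough for the server to drop it would reconnect on the next request instead of stalling.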

Alternatively, as the time cost is substantial at startup, is there a way to bypass creating cache-files when they are already present and is that a reasonable tactic?

This is already implemented and it works for me. We need to figure out why it does not work for you.

Our biggest issue at present is load-balancing and throughput, as there's a need to complete units of work within a finite interval.

Is it a matter of slow startup after a broken MySQL connection, or is it about SNMP throughput?

Partitioning the complete set of snmprecs by server is one way of limiting startup delay such that each process only caches the instances for which its server is responsible for serving responses.

If your .snmprec files and their once-created indices are not changing, startup should be quick enough. If listing the directory of .snmprec files takes so much time, we can even add a flag to snmpsimd to skip the index consistency check.
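If partitioning by server still turns out to be worthwhile, the split mentioned in the question could be sketched with a stable hash of each file name, so that every host computes the same assignment without coordination. The function names and the idea of keying on the file name are assumptions made for illustration; nothing here is part of snmpsim.

```python
import zlib


def assign_server(snmprec_name, servers):
    """Pick one server for a .snmprec file using a stable hash of its name."""
    # crc32 gives the same value on every host and run, unlike built-in hash().
    index = zlib.crc32(snmprec_name.encode("utf-8")) % len(servers)
    return servers[index]


def partition(snmprec_names, servers):
    """Group file names by the server responsible for serving them."""
    groups = {server: [] for server in servers}
    for name in snmprec_names:
        groups[assign_server(name, servers)].append(name)
    return groups
```

Each server would then copy or link only its own group into its --data-dir. Note that changing the number of servers reshuffles most assignments with this simple modulo scheme.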

So far I think we need to understand the root cause of this bottleneck.

Thoughts?

We're glad to cooperate with you on improving our use of the software.
I'll soon provide the init script for review. We should also be able to provide logs.

The snmpsimd daemons use the sql plugin against a MySQL 8.0 instance (most remote, some local), each using a local --data-dir.
If the sql connection drops after daemon initialization, the daemons stop processing units of work (UoW) but remain in the processlist. If the sql connection fails at startup, no UoW are completed, as in the prior case.

Please let us know which artifacts would be most helpful for troubleshooting.

Regards,

quincy

Here's the init script we use to initialize the simulator backend for our application:

#!/bin/bash

##this script serves as an init-script for snmpsimd and
##can use filesystem snmprec files or mysql database for backend
##version 2.0
##author:  QA
##installation date: Sat Apr 27 2019

#v and w allow for instantiating daemons across v*w ports
#sim225 advertises 32 CPU cores; under load, snmpsimd can push a core to 100%
#so be reasonable about the # of daemons to spawn.

d=$(date --rfc-3339=seconds)
V="1 2 3 4 5 6" # 7 8 9"
W="1 2 " #3"
declare -i c=1

echo; echo "Initializing simulators at "$(hostname)" on $(date --rfc-3339=seconds)"
echo "***********************************************"

for w in $W; do
  for v in $V; do
    if [[ ! -d /tmp/snmpsimd/cache-${v}16${w} ]]; then
      echo "creating cache directories for simulator at "$(hostname)
      mkdir -m 0744 -p /tmp/snmpsimd/cache-${v}16${w}
    fi
    chown -R snmpsimd:snmpsimd /tmp/snmpsimd/cache-????
    echo "Spawning snmpsimd #${c} listening on udp/${v}16${w} at "$(hostname)
    /usr/bin/python /usr/bin/snmpsimd.py \
      --daemonize \
      --v2c-arch \
      --process-user=snmpsimd \
      --process-group=snmpsimd \
      --transport-id-offset=${c} \
      --agent-udpv4-endpoint=0.0.0.0:${v}16${w} \
      --variation-module-options=sql:dbtype:mysql.connector,host:sim225,port:3306,user:snmpsimd,password:snmpsimd,database:snmpsim \
      --pid-file=/tmp/snmpsimd-${v}16${w}.pid \
      --logging-method=file:/tmp/snmpsimd.log:1g \
      --log-level=error \
      --cache-dir=/tmp/snmpsimd/cache-${v}16${w}
    ((c++))
  done
done && echo "Completed instantiating simulator daemons at $(hostname) on $(date --rfc-3339=seconds)"

echo; echo "***********************************************"