plajjan / bgworker

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

bgworker - a Python background worker for Cisco NSO

bgworker is an NSO package that shows how we can implement a background worker process in Python for Cisco NSO.

While this comes as a complete NSO package which has an example background worker function that is executed, the idea is that you will take the background_process.py file and incorporate into your own NSO package when you need to launch a background worker process. See the example use in python/bgworker/main.py on how to use it.

It handles a number of things that aren’t otherwise entirely intuitive how to handle, such as:

  • NCS package events like reload and redeploy
  • background worker process dying (will restart it)
  • configuration changes - enabling/disabling of the background worker
  • HA events - only run background process when HA master or HA is disabled

NSO doesn’t have a natural way of shipping an NSO package whose Python files should just be available to other packages, which is why you simply have to copy the background_process.py file. It could potentially be placed on pip but it seems to specific (to NSO) to be worth putting there, at least for now. Let me know if you feel otherwise.

The example code in this package does not demonstrate all the abilities of the process supervisor. It does have a random condition to die every now in a while and the supervisor will then restart the process. This is evident from the log where you can see “Bad dice value” followed by the supervisor saying it is starting the process again. You can test the package redeploy by issuing `request packages package bgworker redeploy` and see how fast it is, it should be near instantaneous, showing that we correctly react to the python vm stop request from NCS. The reaction to configuration changes can be tested by using the ncs_cli and disabling the bgworker by going into `configure` mode and doing `set bgworker disabled` followed by `commit`. Finally, HA can be tested by loading whatever HAFW package you want, enabling HA in NCS and using the HAFW package functionality to switch to master mode or away from master mode, which should then lead to starting or stopping the background worker process respectively.

This code is written for Python 3. Python 2 is dead and you should stop using it. NSO 5.3 deprecates support for Python 2. It is weird that NSO doesn’t ship with using Python 3 per default, so you have to enable this yourself (it is documented in the user guide).

HA enabled

Reading the status of HA, whether HA is enabled or disabled, is only performed at startup. It is currently not possible to change enable or disable HA in NCS by only issuing a reload. If this changes in a newer version of NCS, then we would need to react to this too.

Recommended use / design pattern

A note on error handling and anti-patterns

It appears the following is a rather intuitive code pattern:

def bg_function():
    with ncs.maapi.Maapi() as m:
        with ncs.maapi.Session(m, 'something', 'system'):
            while True:
                try:
                    # do work here

                except:
                    # handle faults

You should not write code like that!

The problem is with the placement of the while-True-try-except loop in relation to the maapi context managers. An exception occurring in this loop will be caught and then retried, which is the sought after behaviour. It will however be retried within the same maapi context manager and if the exception was related to the maapi session, like a maapi connection failure, then it will be retried with the same maapi session which is still broken. The maapi session context manager does not automatically reconnect. MAAPI related exceptions are somewhat rare and won’t show up in most code but the above pattern exacerbates the problem as it continuously retries with the same broken maapi session.

This should be obvious but as 100% of the beta developers that tried using background_process implemented this exact anti-pattern, it was obvious that a note of warning was needed.

There are multiple solutions. The while-try loop can be moved to encapsulate the context managers like so:

def bg_function():
    while True:
        try:
            with ncs.maapi.Maapi() as m:
                with ncs.maapi.Session(m, 'something', 'system'):
                    # do work here

    except:
        # handle faults

It is relatively expensive setting up a Maapi session, so keeping them somewhat long lived is desirable. It naturally depends on the type of work you are carrying out of the overhead of opening new sessions is relevant or not. If each cycle takes minute, the overhead is likely negligible while if a cycle is on the order of a few seconds then you should probably consider a pattern where you try to persist the session over multiple work cycles.

A surprisingly simple way of dealing with this is to remove the exception handler:

def bg_function():
    with ncs.maapi.Maapi() as m:
        with ncs.maapi.Session(m, 'something', 'system'):
            while True:
                # do work here

The child process will die but as the background_process supervisor monitors the child it will be restarted in about a second.

The exception handler can be made more specific to not catch the maapi exceptions (instead letting the child die for that case and rely on the supervisor for restart). This is probably a better approach as soon as your worker function reaches some level of complexity in which case more complete exception handling is necessary.

Alternatively an outer while True loop is added, in which case we probably should break up the code into multiple functions since being 5-6 levels of nesting deep before you start writing your actual application code is pretty appalling.

BUGS

  • [ ] logging levels can’t seem to be reconfigured. Have to redeploy package to use new level.

To Do

  • [ ] describe the design
    • [ ] why multiprocessing?
    • [ ] why threads?
      • [ ] why so many?
    • [ ] why multiprocessing AND threads?
    • [ ] what’s up with the logger stuff?
  • [ ] write a more complete example showing how we can subscribe to config changes in worker process

About

License:MIT License


Languages

Language:Python 89.1%Language:Makefile 10.9%