nokia / moler

Moler – library to help build automated tests

nested state machines

greg-latuszek opened this issue

Why do we need it?

If we face an interactive command like openssl:
https://www.openssl.org/docs/manmaster/man1/openssl.html
https://wiki.openssl.org/index.php/Command_Line_Utilities (search "interactive mode")
then we end up with something that may be started in multiple states of a typical Moler device. openssl may run inside UNIX_LOCAL, PROXY_PC, UNIX_REMOTE.
Moreover, a running interactive command introduces its own prompt (OpenSSL> for example). Seeing that prompt means the openssl command has finished starting, but after it you can't run normal Linux commands, only those of openssl. So, it constitutes a state of the device State Machine. We might introduce a new state into existing devices but:

  1. that would require complicating the state machine
  2. it would require making the same change in multiple machines, since such a cmd might run from different states
  3. adding the next interactive command would require the same effort or even more

Concluding: such a solution would generate exponential effort, unreadable code and a maintenance nightmare.

The comments below are the result of several days of discussion and analysis. They cover:

  1. possible solutions (benefits and drawbacks)
  2. selected solution
  3. analysis of moler's current code and the changes required to implement that solution

Possible solutions

General assumption when searching for a solution: we don't want to overcomplicate the State Machine (abbreviated SM below) of existing devices.

Analyzed solutions:

  1. Interactive command injecting its states into SM
  2. Interactive command being SM but not being device
  3. Interactive command firing new moler device (which already is SM)

1. Interactive command injecting its states into SM

Since the command knows its states, it might inject them into the SM of the hosting device.

benefits

  1. device code not modified

drawbacks

  1. breaking concepts separation and whole-part order: cmd would need to know device (dev has cmd, cmd has dev)
  2. problem of cleaning up: shall the command remove the injected states? If so, how and when?
  3. cmd would need to know SM of all hosting devices (like UNIX_LOCAL, PROXY_PC, UNIX_REMOTE) to know how to inject own states. Complicated code of command

2. Interactive command being SM but not being device

Moler devices use https://github.com/pytransitions/transitions as machinery to implement their SMs. Implementation is a bit tricky since we want:

  1. "jumping over states towards requested target state" - dev.goto_state()
  2. securing against inconsistent python vs real state - so, we use prompt observers to detect where the SM is after a dropped connection or the like

goto_state() is not a must for interactive commands. The currently envisioned SM for an interactive command is simple, just 3 states: NOT_STARTED, INTERACTIVE, END. Of course, interactive commands with more states might appear in the future. However, it looks like they might be implemented in the pure pytransitions API. That would constitute a new type of SM, different from the SM of devices. Such an SM would work as a sub-SM inside the current state of the device SM. The device would treat it as a running command. The interactive command would still conceptually be a command, just a more complicated one.

benefits

  1. device code not modified

drawbacks

  1. to allow splitting an interactive command with huge functionality (like openssl) into smaller modules we would need to mimic the device SM implementation: the command would be a directory under moler/cmd/unix/ and subcommands would reside there. That means a lot of coding.
  2. having multiple ways of handling SM in single library would increase learning curve for newcomers and increase maintenance cost

3. Interactive command firing new moler device (which already is SM)

Firing an interactive command would create a new device being an SM (in the same form as the current SMs of moler). The hosting device would go into "transparent mode". So, conceptually, the hosting device would treat the interactive command as a single state that is a nested SM with its own states. That would not require modifying the states of the hosting device but would require refactoring its code. Since an interactive command may be started in multiple states, the device must be able to programmatically jump into the "virtual state" of a running interactive command.

Reusing current SM for interactive cmd (on example of openssl):

  1. openssl is just command under moler/cmd/unix/
  2. when it starts it fires device from moler/device/openssl.py
    1. such device reuses connection and newlines-definition from hosting device
  3. openssl device has its own commands inside moler/cmd/openssl/

benefits

  1. preserving single implementation of SM in moler (low learning curve & maintenance cost)
  2. big reuse of existing codebase
  3. simplicity of coding new sub-commands of interactive command (they are just normal moler commands)

drawbacks

  1. need to modify existing machinery of moler SM for devices
  2. better testing required since we are changing device code already in use
  3. need some tricks/wrappers/callbacks to preserve whole-part order (cmd should not know device)
  4. having possibility to use interactive command in non interactive mode would require better design to not end-up with duplicated code

Decision

Selecting solution 3 as most beneficial even if it requires refactoring code of moler devices SM.

Required changes in moler devices SM

Majority of changes will happen inside moler/device/textualdevice.py

self.current_state

It is property of TextualDevice. If we jump into sub-SM it should return state of nested device prefixed with device class name.

Like: OpenSsl.INTERACTIVE. So for hosting device it is like "I'm in state OpenSsl" with sub state 'INTERACTIVE'.

We don't want to use _ as substate separator since it would be ambiguous where is state and where is substate (moler already uses _ in state names like UNIX_REMOTE). See same discussion at https://github.com/pytransitions/transitions#hsm

This returned value doesn't come from prompts observers detection, nor from states listed on hosting device SM configuration. It is just returned when device detects "I'm working in transparent mode".

So, we will need new member: self._transparent_SM_mode
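A minimal sketch of how such a property could behave. `HostDevice`, the simplified `OpenSsl` class and the `_nested_device` attribute are hypothetical stand-ins used only for illustration; they are not moler's real classes:

```python
class OpenSsl:
    """Hypothetical nested device; only tracks its own state name."""

    def __init__(self, initial_state="NOT_STARTED"):
        self.current_state = initial_state


class HostDevice:
    """Hypothetical stand-in for TextualDevice."""

    def __init__(self, initial_state):
        self._state = initial_state
        self._transparent_SM_mode = False
        self._nested_device = None  # assumed attribute name

    @property
    def current_state(self):
        if self._transparent_SM_mode and self._nested_device is not None:
            nested = self._nested_device
            # "<nested device class name>.<nested state>", with "." as separator
            return "{}.{}".format(type(nested).__name__, nested.current_state)
        return self._state
```

With the host in UNIX_REMOTE and the nested device in INTERACTIVE, the property yields "OpenSsl.INTERACTIVE" only while the transparent flag is set; otherwise the host's own state is returned unchanged.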

state names

Nested SM will see something like NOT_STARTED --> INTERACTIVE --> END
Hosting SM will see OpenSsl.INTERACTIVE

Prompt observers

They will work as they are. We will not add nested SM prompts (like OpenSSL>) into prompt observers of the hosting SM. The old responsibility remains - the host SM only knows its own prompts. So, for example, it can detect a dropped connection to UNIX_REMOTE.

The nested SM will have prompt observers only for its own prompts (like OpenSSL>). So, it has no means of moving itself into END state. It is not its responsibility. That will be done by the hosting SM, since it knows the prompt/state which the nested SM started from.

So, we need to change prompt observers callback _prompts_observer_callback. Besides setting state it should:

  1. detect if we are in transparent mode
  2. If so, trigger closeup of nested device:
    1. finishing any running command of nested device
    2. changing state of nested device to END
    3. (maybe) remove nested device from memory, remove reference to nested device from hosting device
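The steps above could be sketched as follows. This is an illustration under assumptions: the callback signature is simplified to take the detected state name, and `close_up()`, `running_commands` and `_nested_device` are hypothetical names, not existing moler API:

```python
class NestedDevice:
    """Hypothetical nested device with one command still running."""

    def __init__(self):
        self.current_state = "INTERACTIVE"
        self.running_commands = ["s_client"]

    def close_up(self):
        # 1. finish any running command, 2. move to END
        self.running_commands.clear()
        self.current_state = "END"


class HostDevice:
    """Hypothetical hosting device, already in transparent mode."""

    def __init__(self):
        self.current_state = "UNIX_REMOTE"
        self._transparent_SM_mode = True
        self._nested_device = NestedDevice()

    def _prompts_observer_callback(self, detected_state):
        if self._transparent_SM_mode:
            self._nested_device.close_up()
            # 3. drop the reference so the nested device can be collected
            self._nested_device = None
            self._transparent_SM_mode = False
        # original responsibility: sync host state with the detected prompt
        self.current_state = detected_state
```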

Need change in def _validate_prompts_uniqueness()

We need to know the prompts available in the sub-SM and compare them against the prompts of the hosting SM. Otherwise we may have 2 prompt observers keeping an eye on the same prompt. That would lead to risky code with races.

That may become a limiting factor for the nesting level of devices (nested SM being host SM for a next-level nested SM).
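A rough sketch of such a cross-check, assuming prompts are kept as a mapping of state name to regex pattern string. The function names are hypothetical, and comparing pattern strings for equality only catches identical patterns, not different patterns that match the same text:

```python
def find_clashing_prompts(host_prompts, nested_prompts):
    """Return (nested_state, host_state) pairs sharing an identical prompt pattern."""
    host_state_by_pattern = {pattern: state for state, pattern in host_prompts.items()}
    clashes = []
    for nested_state, pattern in nested_prompts.items():
        if pattern in host_state_by_pattern:
            clashes.append((nested_state, host_state_by_pattern[pattern]))
    return clashes


def validate_prompts_uniqueness(host_prompts, nested_prompts):
    """Raise if any nested prompt would also be watched by the hosting SM."""
    clashes = find_clashing_prompts(host_prompts, nested_prompts)
    if clashes:
        raise ValueError(
            "Prompts shared between nested and hosting SM: {}".format(clashes)
        )
```

For deeper nesting, the same check would have to run against the prompts of every enclosing SM, which is where the limit mentioned above comes from.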

def get_prompt()

It is used by get_cmd() to fill cmd_params["prompt"] of cmd to create. Called only if prompt for new command is not directly provided. That prompt is used by cmd to detect when it is done.

If we are in transparent mode and requesting device to create some subcommand of interactive command then:

  1. either proxy from hosting SM get_prompt() to nested SM get_prompt()
  2. or don't call get_prompt() from inside of get_cmd()

However, since get_prompt() is public API it's better to utilize proxy when self._transparent_SM_mode.

Handling newlines

When a current moler device jumps between states, it checks which newline is defined for each state, since the local console may have different line endings than the remote console.

However, as we think about host SM and nested SM, they both share same connection. Nested SM being interactive command runs inside same shell of some device. In most cases they would share same newline. But nested SM should not know from which state it starts (from which console).

So, as a default "newline of current console" should be passed down from hosting SM into nested SM.

It is also possible that nested SM changes mode of that console handling and uses different newline. But in such case nested SM knows it from its internal code.

def _get_newline()

Above analysis shows a need to refactor def _get_newline(). This method depends on property self.current_state that requires refactoring. Moreover, it uses self._newline_chars[state] and we don't want to pollute host SM states&newlines with those of nested SM.

Caution: simply proxying into the nested SM without passing the "newline of current shell" into the construction of the nested SM might accidentally change the newline of the host SM - since the default returned from _get_newline() is "\n" (what if the host SM newline was not "\n" when the sub-SM was started?)
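One way to avoid that trap, sketched under assumptions: the host reads its current newline before switching to transparent mode and hands it to the nested device's constructor. `start_nested()` and `_nested_device` are hypothetical names; `_newline_chars` is reduced here to a plain dict:

```python
class NestedDevice:
    """Hypothetical nested device; receives its newline from the host."""

    def __init__(self, newline="\n"):
        self._newline = newline

    def _get_newline(self):
        return self._newline


class HostDevice:
    """Hypothetical hosting device with per-state newlines."""

    def __init__(self, newline_chars, state):
        self._newline_chars = newline_chars  # only host states live here
        self.current_state = state
        self._transparent_SM_mode = False
        self._nested_device = None

    def _get_newline(self, state=None):
        if self._transparent_SM_mode:
            return self._nested_device._get_newline()
        state = state or self.current_state
        return self._newline_chars.get(state, "\n")

    def start_nested(self):
        # read the host newline BEFORE flipping into transparent mode
        newline = self._get_newline()
        self._nested_device = NestedDevice(newline=newline)
        self._transparent_SM_mode = True
```

A nested device that really switches the console to another newline could still override the value it was given, as described above.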

_collect_observer_for_state(), _collect_cmd_for_state(), _collect_event_for_state() and related

NO CHANGE NEEDED

These methods return command/event names available for given state - so, they return dict. Names are fully qualified names like moler.cmd.unix.ip_addr.IpAddr used by instance loader.

They are used inside _load_cmdnames_for_state()/_load_eventnames_for_state() which are used by _collect_cmds_for_state_machine()/_collect_events_for_state_machine().

They build those dicts based on _get_available_states() which should return only states of hosting SM and not nested SM nor "virtual" states of form OpenSsl.INTERACTIVE.

_get_available_states()

NO CHANGE NEEDED

It just returns self.states which is modified by _update_SM_states() called from _add_transitions() which is used by derived classes (like ProxyPc, UnixLocal) to create states and transitions based on SM configuration.
That means, it will work since we are taking here no nested states nor "virtual" ones.

get_cmd(), get_event(), get_observer()

The main focus is on get_cmd but the same code modification should be made for get_event - for parity and because the nested SM may also possess some events.

We need to analyse 2 cases, starting from simpler one:

1. nested SM is already running and we want to fire subcommand of interactive command

In such case hosting SM doesn't know commands of nested SM.

get_observer() uses _load_cmdnames_for_state() (building dict described above) and then builds command object of specified name. So, for example for IpAddr it searches its fully qualified name moler.cmd.unix.ip_addr.IpAddr inside dict created by _load_cmdnames_for_state().

For example: the hosting SM won't find a fully qualified name for s_client since it is known only to the nested SM of the openssl device.

Concluding: the hosting SM's get_cmd() should proxy towards get_cmd() of the nested SM whenever the hosting SM is in transparent mode.
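A simplified sketch of that proxying. To stay self-contained, get_cmd() here only resolves a command name to its fully qualified name instead of building a command object; the classes, the `_cmdnames` dict and the `moler.cmd.openssl.s_client.SClient` name are hypothetical:

```python
class NestedDevice:
    """Hypothetical nested device that knows only its own commands."""

    def __init__(self, cmdnames):
        self._cmdnames = cmdnames

    def get_cmd(self, cmd_name):
        return self._cmdnames[cmd_name]


class HostDevice:
    """Hypothetical hosting device."""

    def __init__(self, cmdnames):
        self._cmdnames = cmdnames
        self._transparent_SM_mode = False
        self._nested_device = None

    def get_cmd(self, cmd_name):
        if self._transparent_SM_mode:
            # host does not know nested commands, so delegate the lookup
            return self._nested_device.get_cmd(cmd_name)
        return self._cmdnames[cmd_name]
```

The same delegation would apply to get_event(), so both lookups behave consistently while the host is transparent.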

2. hosting SM is about to start nested SM via firing interactive command like openssl

That should run the whole machinery of:

  1. creating nested SM
  2. changing hosting SM mode to transparent

Moreover, we don't want a command - even an interactive command - to carry a reference to the enclosing device. That prevents a circular dependency (device has commands, command has device).

Interactive command should be new type of generic command derived from CommandChangingPrompt. Proposed class name CommandCreatingSM. Constructor of such class should get:

  1. connection of host device
  2. current prompt of host device
  3. current newline of host device
  4. closure method that:
    1. is able to create nested device (like openssl device)
      1. device name may come from interactive command by convention (openssl cmd --> Openssl dev) or
      2. come directly from command (new API) openssl_cmd.started_devname() (return "OpenSsl")
    2. changes hosting device mode into transparent
    3. can check uniqueness of prompts of nested device in comparison to hosting device

injected closure

Thanks to the closure parameter the command won't hold the device directly but will have an opaque method to just call when the command succeeds. In reality, the reference to the hosting device and the newly created nested device resides inside that closure method. However, the interactive command can do nothing with it besides calling it, so we have a narrow dependency in the form of the well-known "dependency injection" pattern. That should also help in testing.

That closure may also have a form of context manager. That way its entry would be responsible for all the stuff related to making host SM transparent and exit would be responsible for restoring hosting SM into normal mode. Thanks to context manager as generator we can have setup/cleanup code side-by-side. That would help in maintenance.
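A minimal sketch of the context-manager variant, using `contextlib.contextmanager` from the standard library. `make_nested_sm_context()`, `enter_nested_sm()` and the stub classes are hypothetical; the real `CommandCreatingSM` is proposed to derive from CommandChangingPrompt, which is omitted here to keep the example self-contained:

```python
from contextlib import contextmanager


class OpenSsl:
    """Hypothetical nested device."""
    current_state = "INTERACTIVE"


class HostDevice:
    """Hypothetical hosting device."""

    def __init__(self):
        self._transparent_SM_mode = False
        self._nested_device = None


def make_nested_sm_context(host, create_nested_device):
    """Build the closure; host and nested device live only inside it."""

    @contextmanager
    def nested_sm_context():
        nested = create_nested_device()
        host._nested_device = nested
        host._transparent_SM_mode = True  # setup: host becomes transparent
        try:
            yield nested
        finally:
            host._transparent_SM_mode = False  # cleanup: back to normal mode
            host._nested_device = None

    return nested_sm_context


class CommandCreatingSM:
    """Holds only an opaque callable, never the device itself."""

    def __init__(self, connection, prompt, newline, nested_sm_context):
        self.connection = connection
        self.prompt = prompt
        self.newline = newline
        self._nested_sm_context = nested_sm_context

    def enter_nested_sm(self):
        # hypothetical hook, called once the command has succeeded
        return self._nested_sm_context()
```

Setup and cleanup sit next to each other around the `yield`, and a test can inject a fake closure without constructing any device.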

goto_state()

State in current moler SM implementation may be changed in 2 ways:

  1. unconditionally
    1. via on_connection_made() into CONNECTED state
    2. via on_connection_lost() into NOT_CONNECTED state
    3. via prompt observers
  2. by jumping over intermediate states towards target one - using goto_state()

If hosting SM is in transparent mode it means it is in "nested SM as my one big state".
So, in transparent mode - any state change caused either by prompt observers, connection callbacks or via jumps inside goto_state() should:

  1. signal to nested SM "finalize yourself"
  2. wait until nested SM is done
  3. proceed with state change on hosting SM level

However, there might appear two implementations for "finalize yourself". Hard and soft one. It comes from nature of state change inside hosting SM:

  1. connection callbacks and prompt observers detect that the prompt has changed. It means something really happened. Maybe there was a connection drop (either first connection or next-hop connection). So, that is a hard fact and the state machine MUST sync to it. If the host SM was transparent, the nested SM was in some state. Detecting a hard state change tells the nested SM "you are already done or killed or interrupted". So, it must quickly perform a hard shut down (stopping still-running commands/events, putting the SM into END state, etc).
  2. goto_state() is different. It is a request "take me from the current state towards another one; you have timeout time to do it". So, it may wait for an active command inside the nested SM to complete. In other words, it is a SOFT shut down request.

Implementation may lead us to the same code for both. It would be great not to have to differentiate the two cases - lower maintenance cost. However, the reasoning above justifies two different code pieces.
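The difference between the two shutdown flavours could look roughly like this. It is a sketch under assumptions: `finalize()`, `start_command()` and `command_finished()` are hypothetical names, and a `threading.Event` stands in for "the active command is done":

```python
import threading


class NestedDevice:
    """Hypothetical nested device with at most one active command."""

    def __init__(self):
        self.current_state = "INTERACTIVE"
        self._cmd_done = threading.Event()
        self._cmd_done.set()  # no command running initially

    def start_command(self):
        self._cmd_done.clear()

    def command_finished(self):
        self._cmd_done.set()

    def finalize(self, hard, timeout=None):
        if hard:
            # prompt/connection change is a fact: abandon the running command
            self._cmd_done.set()
        elif not self._cmd_done.wait(timeout):
            # soft request: waited for the command but it did not complete
            raise TimeoutError("nested SM command did not finish in time")
        self.current_state = "END"
```

Prompt observers and connection callbacks would call `finalize(hard=True)`, while goto_state() would call `finalize(hard=False, timeout=...)` with whatever remains of its own timeout.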

END state inside nested SM

Maybe having an END state in the nested SM is the way to go for implementing "shut down". It adds no new API this way. Such a requirement for all nested SMs (or maybe for all SMs) creates an opportunity for polymorphic behavior. The hosting SM doesn't need to know the type of the nested SM. It just knows that whenever it wants to shut down the nested SM it says "goto END state".

device removal

All devices have the API def register_device_removal_callback(self, callback) and def remove(self). Registration is used by DeviceFactory. The factory performs caching - when you call get_device(name='DEV1') the next time and 'DEV1' has already been created, it is returned to you without recreation. This way we keep devices "all time open" to save connection establishment time. Of course you can close that device via DeviceFactory.remove_device(name='DEV1') and/or uncache it via DeviceFactory.forget_device_handler().

The consequence of caching is that the same name always points to the same device object inside the factory cache. If you want to use the same name to refer to another device you need to uncache it first.

remove_device() finds cached device of given name and performs dev.remove() on it. That allows device to do its own cleanup.

dev.register_device_removal_callback() is used by DeviceFactory during device creation. As a callback it sets forget_device_handler. So, whenever a device is removed it is also uncached.
Moreover, dev.register_device_removal_callback() may be called multiple times - the device can store multiple callbacks to be called when it is removed.
That functionality may also be used for hosting/nested SMs cleanup. Besides uncaching the device we may also do some "nesting SM cleanup" just by registering another callback.

dev.remove() just calls all callbacks that have been registered on device via dev.register_device_removal_callback()
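A simplified model of that mechanism, not moler's actual implementation. It assumes callbacks take no arguments; the `Device` class and `_removal_callbacks` list are hypothetical stand-ins:

```python
class Device:
    """Hypothetical stand-in for the basic device removal API."""

    def __init__(self, name):
        self.name = name
        self._removal_callbacks = []

    def register_device_removal_callback(self, callback):
        # may be called many times; each distinct callback is stored once
        if callback not in self._removal_callbacks:
            self._removal_callbacks.append(callback)

    def remove(self):
        # call every registered callback in registration order
        for callback in self._removal_callbacks:
            callback()
```

A factory would register its uncaching handler first, and a hosting device could register an additional callback that performs the "nesting SM cleanup".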

TextualDevice.remove()

TextualDevice overrides the basic functionality of AbstractDevice depicted above. Besides calling all registered remove-callbacks it also performs state and connection cleanup:

  1. if device has established connection
    1. it puts device into NOT_CONNECTED state using goto_state()
    2. it closes connection

WE DON'T WANT that for a nested device. We don't want returning from the nested SM to close the connection of the hosting SM (they both share the same connection). Besides, the nested SM has no NOT_CONNECTED state. It has an END state.

So, dev.register_device_removal_callback() requires refactoring to catch up with transparent mode device.
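One possible shape of the differing removal behavior, sketched with hypothetical stub classes. Here the nested device simply treats the connection as borrowed and never closes it; this is an assumption for illustration, not a decision taken in the analysis above:

```python
class Connection:
    """Hypothetical connection stub."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class HostDevice:
    """Hypothetical host: owns the connection and closes it on removal."""

    def __init__(self, connection):
        self.connection = connection
        self.current_state = "UNIX_REMOTE"

    def remove(self):
        if self.connection.is_open:
            self.current_state = "NOT_CONNECTED"
            self.connection.close()


class NestedDevice:
    """Hypothetical nested device: shares the connection, never closes it."""

    def __init__(self, connection):
        self.connection = connection  # borrowed from the hosting device
        self.current_state = "INTERACTIVE"

    def remove(self):
        # no NOT_CONNECTED state here; removal only ends the nested SM
        self.current_state = "END"
```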