Refactor Environment Configuration

Question

Refactor Environment Configuration

PhilippvK opened this issue 2 years ago · comments

Philipp van Kempen commented 2 years ago

I came up with a new sturcture for the environment.yml files (See first comment) and would like to discuss it in this context.

Philipp van Kempen · Answer 1 · Tue Mar 08 2022 00:14:55 GMT+0800 (China Standard Time)

This would be my idea, with the the following major changes:

Remove most of the hierachy because it was not used in the end (frameworks and backend are now listed separately)
Merge all features into a single list. We either allow them or not and do not assign them to different backends/frameworks/targets
The information on which components are supported/enabled by the environment (used during mlonmcu setup) are now managed as a list under supported:
Default backends/targets/frontends/... are now listed under use: (Will be overwritten if specified on the command line instead)

---
home: "{{ home_dir }}"
logging:
  ... # No changes here
paths:
  ... # No changes here
repos:
  ... # No changes here
frameworks:
  supported:
    - tflm
    - tvm
backend:
  supported:
    # - tflmc
    - tflmi
    - tvmaot
    - tvmrt
    - tvmcg
  use:
    - tvmaot
frontends:
  supported:
    - tflite
    # - packed
    # - onnx
  use:
    - tflite
platforms:
  supported:
    - mlif
    - espidf
  use:
    - mlif
    - espidf
targets:
  supported:
    - etiss_pulpino  # mlif
    - spike  # mlif
    - ovpsim  # mlif
    - host_x86  # mlif
    - corstone300  # mlif
    # espidf targets do not need to be listed here
  use:
    - etiss_pulpino
features:
  supported:
    - gdbserver
    - etissdbg
    - vext
    - trace
    - disable_legalize
    - debug_arena
    - unpacked_api
    - usmp
    - cmsisnn
    - muriscvnn
    # - cmsisnnbyoc
    # - packing
    # - packed
    # - ethosu
  use: []
postprocesses:
  supported:
    - detailed_cycles
    - average_cycles
    - filter_cols
    - features2cols
    - config2cols
    - bytes2kb
    - visualize
  use: []
vars:
  # tvm.make_tool: "ninja"
  llvm.version: "11.1.0"
  runs_per_stage: true
  riscv_gcc.dl_url: "https://syncandshare.lrz.de/dl/fi89JQF89pEiWwP6aQh7cM4H/rv32gcv.tar.xz"

Philipp van Kempen · Answer 2 · Tue Mar 08 2022 00:15:08 GMT+0800 (China Standard Time)

@rafzi What do your thin about this proposal?

Philipp van Kempen · Answer 3 · Tue Mar 08 2022 00:21:57 GMT+0800 (China Standard Time)

One this which is probably missing right now are "wildcards", which would make it easier to just enable all supported components without copying the above list from supported: to use: and to get rid of the need to keep them in sync.

Here are two possible approaches:

Custom wildcard mechanism: (more flexible if we allow regular expressions instead of just *)

backend:
  supported: 
    - tflmi
    - tvmaot
    - tvmrt
    - tvmcg
  use:
    - "*" # Alternative: ALL or maybe also allow regular expressions e.g. "tvm(.*)"

Use YAML references:

backend:
  supported: &supported_backends
    - tflmi
    - tvmaot
    - tvmrt
    - tvmcg
  use: *supported_backends

Philipp van Kempen · Answer 4 · Tue Mar 08 2022 00:25:05 GMT+0800 (China Standard Time)

Another thing we need to clarify is, how we handle platforms whose targets are resolved dynamically instead of being registered manually. Do the need need to be listed in the environment file as well?

Rafael Stahl · Answer 5 · Tue Mar 08 2022 16:23:09 GMT+0800 (China Standard Time)

This change looks good to me.

What do you think about adding a value to components for specifying that they should be used by default?

backend:
  supported:
    - tflmi
    - tvmaot: default
    - tvmrt
    - tvmcg

Philipp van Kempen · Answer 6 · Tue Mar 08 2022 16:26:11 GMT+0800 (China Standard Time)

What do you think about adding a value to components for specifying that they should be used by default?
backend:
  supported:
    - tflmi
    - tvmaot: default
    - tvmrt
    - tvmcg

i think this wouldn’t be valid yaml syntax. And if we would do the following it would look more confusing:

backend:
  supported:
    tflmi:
    tvmaot: default
    tvmrt:
    tvmcg:

Philipp van Kempen · Answer 7 · Tue Mar 08 2022 16:27:04 GMT+0800 (China Standard Time)

In addition there will be scenarios where a value used in use: does not necessarily have to be in supported:.

Rafael Stahl · Answer 8 · Tue Mar 08 2022 17:47:52 GMT+0800 (China Standard Time)

I was not sure about the syntax either, but a couple online validators told me that this would be fine. It might be a bit confusing in the parsing logic, but that seems tolerable.

Can you elaborate on the case where "use" might contain values that are not in "supported"? It is sufficiently rare that we could add a special value like "default_but_not_supported_otherwise"?

That structure seems more intuitive to me.

Philipp van Kempen · Answer 9 · Wed Mar 09 2022 19:26:07 GMT+0800 (China Standard Time)

@rafzi

supported: is used during mlonmcu setup to decide which dependencies are going to be installed or not, e.g. in validate_tensorflow there should be a if context.environment.has_framework("tflm") which would essentially lookup if tflm is part of the supported list. While it might make sense to rename that option to install:, there are also backends such as tflmi which do not have specific dependencies as it is just implemented in python. Those are also in the list as some features might only be installed if context.environment.has_backend("tflmi") is True.
use:, or alternative defaults: defines default options for the respective components if not bing specified on the command line.

Can you elaborate on the case where "use" might contain values that are not in "supported"? It is sufficiently rare that we could add a special value like "default_but_not_supported_otherwise"?

There are two use-cases:

While the targets managed by MLonMCU (currently part of the platform mlif) need to be explicitly registered in the codebase, this is not the case for other platforms. I.e. during the installation of ESP-IDF, a user can choose which target-families should be supported (./install.sh esp32,esp32c3), MLonMCU would then determine the set of supported targets at runtime (using idf.py list-targets) so they do not need to be enabled/listed by the user. While we could require them to provide a list of usable targets, it is not necessary and would therefore be quite useless.
In theory (never tested it tbh) MLonMCU should provide an extension interface to integrate new Backends, Features etc. at runtime via Python. Those features dependencies would not be handled by MLonMCU, so there is not need to add them to supported:/installed:. (We could fore a user to add them there as well but this would be a step more work, e.g. either adding it to the environment.yml file beforehand or executing something like context.environment.backend.supported.append("..."))

Philipp van Kempen · Answer 10 · Wed Mar 09 2022 19:27:46 GMT+0800 (China Standard Time)

For me the following scheme would be completely fine:

backend:
  supported: &supported_backends
    - tflmi
    - tvmaot
    - tvmrt
    - tvmcg
  use: *supported_backends

Do you habe anything special against it?

Rafael Stahl · Answer 11 · Thu Mar 10 2022 17:23:51 GMT+0800 (China Standard Time)

Thanks for elaborating. The naming "install" instead of "supported" would make more sense to me. I don't see a problem when there is nothing to install. The user generally does not know this, so they will list it anyways if they intend to use it and they did not install it themselves.

I don't like your proposed syntax, because it is a lesser known yaml feature and would in typical cases be more verbose and cause duplication. I'd typically only set one backend/frontend/target as default and set some subset of features as default.

To avoid the strange case of

install:
  - abc
  - xyz: default_noinstall

we could instead allow:

install:
  - abc
default: # or more explicit: "default_preinstalled"
  - xyz

Philipp van Kempen · Answer 12 · Thu Mar 10 2022 20:48:44 GMT+0800 (China Standard Time)

@rafzi Unfortunately I do not really understand the usage of default_preinstalled. I do not get how we would use it in-practice...

I'd typically only set one backend/frontend/target as default

It makes sense to have multiple default Platforms as they can be seen as a registry for targets, so if a target name is provided e.g. on the command line, the default platforms will be "ask" if the provide support for the given target name, which has the big advantage that the user does not have to define --platform mlif --platform espidf on the command line to use targets from different platforms in a single session. (The order of these platforms is important because the same target_name might be supported by multiple platforms)
Frameworks would ideally also work like this. Each framework (or "Backend registry") provides a number of backends which will be used to resolve the backends for a given backend name. (However it is unlikely that there would be an overlap in backend names)

Philipp van Kempen · Answer 13 · Thu Mar 10 2022 20:54:07 GMT+0800 (China Standard Time)

What we could for example do is to allow grouping e.g backens when listing them in supported::

backend:
  supported:
    tflm: &tflm_backends
      - tflmi
    tvm: &tvm_backends
      - tvmaot
      - tvmrt
      - tvmcg
  default: *tvm_backends  # Use all tvm backens
  # or use only a single backend
  default:
    - tvmaot

However, I agree that the verbosity would then increase again if we would like to add another default value as we can not merge the groups intuitively.

Philipp van Kempen · Answer 14 · Thu Nov 10 2022 16:00:08 GMT+0800 (China Standard Time)

I will start implementing this myself soon to allow a followup for #51 getting rid of all hardcoded component names.