FEATURE: Add support for jinja2 templating on SAT file
Masber opened this issue
The SAT file used to deploy clusters is currently static. We would like to add support for jinja2 templating so that its values can come from a separate variables file.
An example would be something like:
manta apply cluster -f <SAT file> --session-vars <session vars file>
With the SAT file being:
# (C) Copyright 2022-2023 Hewlett Packard Enterprise Development LP
---
schema_version: 1.0.2
configurations:
- name: "{{default.note}}-compute-config-{{default.suffix}}"
  layers:
  # The gpu_customize_driver_playbook.yml playbook will install GPU driver and
  # SDK/toolkit software into the compute boot image if GPU content is available
  # in the expected Nexus repo targets. If GPU content has not been uploaded to
  # Nexus this play will be skipped automatically. If GPU content is available in
  # Nexus but a non-gpu image is wanted this layer can be commented out.
  #BEGIN_GPU_SUPPORT
  - name: uss-gpu-customize-driver-playbook-{{uss.working_branch}}
    playbook: gpu_customize_driver_playbook.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
  #END_GPU_SUPPORT
  - name: shs-{{default.network_type}}_install-{{slingshot_host_software.working_branch}}
    playbook: shs_{{default.network_type}}_install.yml
    product:
      name: slingshot-host-software
      version: "{{slingshot_host_software.version}}"
      branch: "{{slingshot_host_software.working_branch}}"
    special_parameters:
      ims_require_dkms: true
  - name: cscs-interfaces
    playbook: cscs-interfaces.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.07.0
  - name: cos-compute-{{uss.working_branch}}
    playbook: cos-compute.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
  # The gpu_customize_net_playbook.yml playbook installs GPU network-dependent
  # software and any additional GPU packages needed. The playbook will run by
  # default if GPU content is available in Nexus, and will be skipped if not. If
  # a non-gpu compute-only image is required this layer can be commented out.
  #BEGIN_GPU_SUPPORT
  - name: uss-gpu-customize-net-playbook-{{uss.working_branch}}
    playbook: gpu_customize_net_playbook.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
  #END_GPU_SUPPORT
  - name: csm-packages-{{csm.version}}
    playbook: csm_packages.yml
    product:
      name: csm
      version: "{{csm.version}}"
  - name: csm-diags-compute-{{csm_diags.version}}
    playbook: csm-diags-compute.yml
    product:
      name: csm-diags
      version: "{{csm_diags.version}}"
  - name: sma-ldms-compute-{{sma.version}}
    playbook: sma-ldms-compute.yml
    product:
      name: sma
      version: "{{sma.version}}"
  # - name: cpe-pe_deploy-{{cpe.working_branch}}
  #   playbook: pe_deploy.yml
  #   product:
  #     name: cpe
  #     version: "{{cpe.version}}"
  #     branch: "cscs-23.07.0"
  ##BEGIN_SLURM_SUPPORT
  # - name: slurm-site-{{slurm.working_branch}}
  #   playbook: site.yml
  #   product:
  #     name: slurm
  #     version: "{{slurm.version}}"
  #     branch: "{{slurm.working_branch}}"
  ##END_SLURM_SUPPORT
  - name: cscs
    playbook: site.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.07.0
  - name: nomad
    playbook: site-client.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/nomad_orchestrator.git
      branch: main
  - name: cos-compute-last-{{uss.working_branch}}
    playbook: cos-compute-last.yml
    product:
      name: uss
      version: "{{uss.version}}"
      branch: "{{uss.working_branch}}"
    special_parameters:
      ims_require_dkms: true
images:
# Uncomment the lines below if ARM images are needed.
#BEGIN_AARCH64_SUPPORT
- name: "{{default.note}}-compute-{{default.suffix}}"
  ref_name: compute_image.aarch64
  base:
    ims:
      name: "gracehopper-uss-1.0.0-58-csm-1.5.aarch64-1"
      type: image
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  configuration_group_names:
  - Compute
  - prealps
  - santis
#END_AARCH64_SUPPORT
session_templates:
# Uncomment the lines below if ARM session templates are needed.
#BEGIN_AARCH64_SUPPORT
- name: "{{default.note}}-compute-template-{{default.suffix}}"
  image:
    image_ref: compute_image.aarch64
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  bos_parameters:
    boot_sets:
      compute:
        arch: ARM
        kernel_parameters: ip=dhcp quiet ksocklnd.skip_mr_route_setup=1 cxi_core.disable_default_svc=0 spire_join_token=${SPIRE_JOIN_TOKEN}
        node_roles_groups:
        - Compute
        - prealps
        - santis
        rootfs_provider_passthrough: "dvs:api-gw-service-nmn.local:300:hsn0,nmn0:0"
- name: "{{default.note}}-compute-template-{{default.suffix}}-ramdisk"
  image:
    image_ref: compute_image.aarch64
  configuration: "{{default.note}}-compute-config-{{default.suffix}}"
  bos_parameters:
    boot_sets:
      compute:
        arch: ARM
        kernel_parameters: ip=dhcp quiet ksocklnd.skip_mr_route_setup=1 cxi_core.disable_default_svc=0 spire_join_token=${SPIRE_JOIN_TOKEN}
        node_roles_groups:
        - Compute
        - prealps
        - santis
        rootfs_provider_passthrough: "dvs:api-gw-service-nmn.local:300:hsn0,nmn0:1"
#END_AARCH64_SUPPORT
And the session vars file being:
---
base_image: "gracehopper-base-cscs-uss-1.0.0-58-csm-1.5.aarch64-shs-2.1.1-64-cos-3.0-aarch64-compute-image-20"
default:
  network_type: cassini
  note: 'santis'
  suffix: 23.11.0-beta.5-9
  wlm: slurm
  working_branch: "cscs-23.07.0"
slingshot:
  version: 2.1.1-894
slingshot-host-software:
  version: 2.1.1-64-cos-3.0-aarch64
  working_branch: cscs-23.07.0
sma:
  version: 1.9.5
uan:
  version: 2.7.1
  working_branch: cscs-23.07.0
uss:
  version: 1.0.0-58-csm-1.5
  working_branch: cscs-23.07.0-no-nvhpc
@miguelgila it is not clear to me how the session vars file is generated or where it comes from. Is the schema of this file fixed?
The template has the field csm_diags.version, which does not exist in the session vars file.
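A mismatch like this can be found mechanically before deploying. The sketch below is only an illustration in plain Python: it is not manta's implementation and not jinja2 itself, it handles only simple `{{ a.b }}` references (no filters, loops or expressions), and the names `PLACEHOLDER`, `resolve` and `missing_vars` are hypothetical. The sample data is a cut-down copy of the template and vars file above.

```python
import re

# Matches simple "{{ a.b.c }}" references only; anything richer that
# jinja2 supports is deliberately out of scope for this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(path, values):
    """Walk a dotted path through nested dicts; raise KeyError if absent."""
    node = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(path)
        node = node[key]
    return node


def missing_vars(template_text, values):
    """Return the sorted list of placeholders that have no value."""
    missing = set()
    for match in PLACEHOLDER.finditer(template_text):
        try:
            resolve(match.group(1), values)
        except KeyError:
            missing.add(match.group(1))
    return sorted(missing)


# Cut-down excerpts of the SAT template and the session vars file.
template = """
- name: csm-diags-compute-{{csm_diags.version}}
  version: "{{slingshot_host_software.version}}"
  branch: "{{uss.working_branch}}"
"""
values = {
    "slingshot-host-software": {"version": "2.1.1-64-cos-3.0-aarch64"},
    "uss": {"version": "1.0.0-58-csm-1.5",
            "working_branch": "cscs-23.07.0-no-nvhpc"},
}

print(missing_vars(template, values))
# ['csm_diags.version', 'slingshot_host_software.version']
```

Besides csm_diags.version, the check also reports slingshot_host_software.version: the template spells the key with underscores while the vars file spells it slingshot-host-software. Because the scan works on raw text, placeholders inside YAML comments (such as the commented-out cpe and slurm layers) would be reported as well.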
This particular vars file seems to have been taken from another place in CSM; it lists all the components of what I think is one of their recipes. We can simplify or clean it as much as we want; the only vars needed are the ones referenced in the SAT YAML file.
Please note that not all the fields in the SAT file are templatable, for example the IMS recipe or image name. This is sub-optimal, as one would want to keep the SAT file clean and use vars everywhere. Maybe this is something we can do in manta?
@Masber this is a more realistic variables file:
---
base_image: "gracehopper-base-cscs-uss-1.0.0-58-csm-1.5.aarch64-shs-2.1.1-64-cos-3.0-aarch64-compute-image-20"
default:
  network_type: cassini
  note: 'santis'
  suffix: 23.11.0-beta.5-9
  wlm: slurm
  working_branch: "cscs-23.07.0"
slingshot:
  version: 2.1.1-894
slingshot-host-software:
  version: 2.1.1-64-cos-3.0-aarch64
  working_branch: cscs-23.07.0
sma:
  version: 1.9.5
uan:
  version: 2.7.1
  working_branch: cscs-23.07.0
uss:
  version: 1.0.0-58-csm-1.5
  working_branch: cscs-23.07.0-no-nvhpc
As you can see, some of those fields have been copied from the same location as the previous one, but others, like base_image, are completely arbitrary and created by us.
Example of a SAT template file:
❯ cat sat-file/sat_file-zinal-cta-client-template.yaml
configurations:
- name: "{{ config.name }}-{{ config.version }}"
  layers:
  - name: ss11
    playbook: shs_cassini_install.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/slingshot-host-software-config-management.git
      branch: integration
  - name: cos
    playbook: site.yml
    product:
      name: cos
      version: 2.3.101
      branch: integration
  - name: cscs
    playbook: site.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.06.0
  - name: nomad-orchestrator
    playbook: site-client.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/nomad_orchestrator.git
      branch: main
images:
- name: zinal-nomad-{{ image.version }}
  ims:
    is_recipe: false
    id: 4bf91021-8d99-4adf-945f-46de2ff50a3d
  configuration: "{{ config.name }}-{{ config.version }}"
  configuration_group_names:
  - Compute
  - "{{ hsm.group_name }}"
session_templates:
- name: "{{ bos_st.name }}"
  image: zinal-image-v0.5
  configuration: "{{ config.name }}-{{ config.version }}"
  bos_parameters:
    boot_sets:
      compute:
        kernel_parameters: ip=dhcp quiet spire_join_token=${SPIRE_JOIN_TOKEN}
        node_groups:
        - "{{ hsm.group_name }}"
And the values file:
❯ cat sat-file/sat_file-zinal-cta-client-values.yaml
---
hsm:
  group_name: "zinal_cta"
config:
  name: "test-config"
  version: "v1.0.0"
image:
  version: "v1.0.5"
bos_st:
  name: "deploy-cluster-action"
  version: "v1.0"
And the resulting rendered file:
manta a cluster -f sat-file/sat_file-zinal-cta-client-template.yaml -V sat-file/sat_file-zinal-cta-client-values.yaml
DEBUG SAT file rendered:
configurations:
- name: "test-config-v1.0.0"
  layers:
  - name: ss11
    playbook: shs_cassini_install.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/slingshot-host-software-config-management.git
      branch: integration
  - name: cos
    playbook: site.yml
    product:
      name: cos
      version: 2.3.101
      branch: integration
  - name: cscs
    playbook: site.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/cscs-config-management.git
      branch: cscs-23.06.0
  - name: nomad-orchestrator
    playbook: site-client.yml
    git:
      url: https://api-gw-service-nmn.local/vcs/cray/nomad_orchestrator.git
      branch: main
images:
- name: zinal-nomad-v1.0.5
  ims:
    is_recipe: false
    id: 4bf91021-8d99-4adf-945f-46de2ff50a3d
  configuration: "test-config-v1.0.0"
  configuration_group_names:
  - Compute
  - "zinal_cta"
session_templates:
- name: "deploy-cluster-action"
  image: zinal-image-v0.5
  configuration: "test-config-v1.0.0"
  bos_parameters:
    boot_sets:
      compute:
        kernel_parameters: ip=dhcp quiet spire_join_token=${SPIRE_JOIN_TOKEN}
        node_groups:
        - "zinal_cta"
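The substitution behind the rendered output above can be sketched in a few lines. This is a minimal stand-in written with plain regular expressions, assuming only simple `{{ a.b }}` references; it is not jinja2 and not how manta is implemented, and the function name `render` is hypothetical. It uses an excerpt of the zinal template and values file.

```python
import re

# Matches simple "{{ a.b.c }}" references only (an assumption of this sketch).
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template_text, values):
    """Replace each placeholder with the value found at its dotted path."""
    def substitute(match):
        node = values
        for key in match.group(1).split("."):
            node = node[key]  # KeyError: the values file lacks this variable
        return str(node)
    return PLACEHOLDER.sub(substitute, template_text)


# Excerpt of the template and values file shown earlier.
template = (
    'configuration: "{{ config.name }}-{{ config.version }}"\n'
    "kernel_parameters: ip=dhcp quiet spire_join_token=${SPIRE_JOIN_TOKEN}\n"
    "node_groups:\n"
    '- "{{ hsm.group_name }}"\n'
)
values = {
    "hsm": {"group_name": "zinal_cta"},
    "config": {"name": "test-config", "version": "v1.0.0"},
}

rendered = render(template, values)
print(rendered)
```

Note that `${SPIRE_JOIN_TOKEN}` passes through unchanged, as it does in the rendered file above: it uses a single brace after `$`, so it is not a `{{ ... }}` placeholder.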
Implemented in version v1.22.9.