eth-cscs / stackinator

Home Page: https://eth-cscs.github.io/stackinator/

external packages in packages.yaml recipe are not added to the generated spack.yaml

dganellari opened this issue

I am trying to create a recipe for the AMD stack on Hohgant. A working version of the generated spack.yaml would be:

spack:                                                                               
  include:
  - compilers.yaml
  - config.yaml
  view: false
  concretizer:
    unify: true
    reuse: false
  specs:
  - kokkos+rocm std=17 amdgpu_target=gfx90a ^hip@5.2.4 ^llvm-amdgpu@5.2.4
  - kokkos-kernels
  - cray-mpich-binary+rocm ^hip@5.2.4
  - rocprim@5.2.4 ^hip@5.2.4
  - fftw +mpi
  - hdf5 +mpi
  - openblas
  - boost
  packages:
    all:
      variants: std=17 amdgpu_target=gfx90a amdgpu_target_sram_ecc=gfx90a target=zen3
      compiler: [gcc@11]
    mpi:
      require: cray-mpich-binary
    hip:
      buildable: false
      externals:
      - spec: hip@5.2.4
        prefix: /opt/rocm
    rocm-cmake:
      buildable: false
      externals:
      - spec: rocm-cmake@5.2.4
        prefix: /opt/rocm/
    rocminfo:
      buildable: false                                                               
      externals:                                                                     
      - spec: rocminfo@5.2.4                                                         
        prefix: /opt/rocm/                                                           
    rocprim:                                                                         
      buildable: false                                                               
      externals:                                                                     
      - spec: rocprim@5.2.4                                                          
        prefix: /opt/rocm/rocprim                                                    
    llvm-amdgpu:                                                                     
      buildable: false                                                               
      externals:                                                                     
      - spec: llvm-amdgpu@5.2.4                                                      
        prefix: /opt/rocm                                                            
    hsa-rocr-dev:                                                                    
      buildable: false                                                               
      externals:                                                                     
      - spec: hsa-rocr-dev@5.2.4                                                     
        prefix: /opt/rocm                                                            

After manually modifying the generated packages/gcc-env/spack.yaml file to match the above, I am able to build a working AMD stack and create the squashfs file. The rest follows the recipe in test/base-amdgpu.

However, I was not able to modify the recipe so that it generates the content shown above. The configuration relies on external packages, and I tried placing them at several positions in the recipe's packages.yaml file, but they were discarded every time. The file I ended up with is the following:

packages:
    gcc-env:
      compiler:
          toolchain: gcc
          spec: gcc@11
      unify: true
      specs:
      - kokkos+rocm std=17 amdgpu_target=gfx90a ^hip@5.2.4 ^llvm-amdgpu@5.2.4
      - kokkos-kernels
      - cray-mpich-binary+rocm ^hip@5.2.4
      - fftw +mpi
      - hdf5 +mpi
      - openblas
      - boost
      mpi:
        spec: cray-mpich-binary
        gpu: rocm
      hip:
        buildable: false
        externals:
        - spec: hip@5.2.4
          prefix: /opt/rocm
      rocm-cmake:
        buildable: false
        externals:
        - spec: rocm-cmake@5.2.4
          prefix: /opt/rocm/
      rocminfo:
        buildable: false
        externals:
        - spec: rocminfo@5.2.4
          prefix: /opt/rocm/
      rocprim:
        buildable: false
        externals:                                                           
        - spec: rocprim@5.2.4                                                
          prefix: /opt/rocm/rocprim                                          
      llvm-amdgpu:                                                           
        buildable: false                                                     
        externals:                                                           
        - spec: llvm-amdgpu@5.2.4                                            
          prefix: /opt/rocm                                                  
      hsa-rocr-dev:                                                          
        buildable: false                                                     
        externals:                                                           
        - spec: hsa-rocr-dev@5.2.4                                           
          prefix: /opt/rocm                                                  
    tools:                                                                   
      compiler:                                                              
          toolchain: gcc                                                     
          spec: gcc@11                                                       
      unify: true                                                            
      specs:                                                                 
      - cmake                                                                
      - python@3.10                                                          
      - py-numpy                                                             

All the HIP-related external packages were discarded, along with the +rocm variant on the cray-mpich-binary package, even though gpu: rocm was set. The same happens with gpu: cuda.

Where is the correct place to add these external packages so that they end up in the generated spack.yaml?

The packages.yaml file in the recipe is not the same as the packages.yaml that is provided to Spack.

The following entries in your packages.yaml will be ignored:

      hip:
        buildable: false
        externals:
        - spec: hip@5.2.4
          prefix: /opt/rocm
      rocm-cmake:
        buildable: false
        externals:
        - spec: rocm-cmake@5.2.4
          prefix: /opt/rocm/
      rocminfo:
        buildable: false
        externals:
        - spec: rocminfo@5.2.4
          prefix: /opt/rocm/
      rocprim:
        buildable: false
        externals:                                                           
        - spec: rocprim@5.2.4                                                
          prefix: /opt/rocm/rocprim                                          
      llvm-amdgpu:                                                           
        buildable: false                                                     
        externals:                                                           
        - spec: llvm-amdgpu@5.2.4                                            
          prefix: /opt/rocm                                                  
      hsa-rocr-dev:                                                          
        buildable: false                                                     
        externals:                                                           
        - spec: hsa-rocr-dev@5.2.4                                           
          prefix: /opt/rocm 

A fix that better checks the YAML files for valid input is in #17.
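To illustrate the kind of input check that would have flagged the problem in this issue, here is a minimal Python sketch. It is not the change made in #17. The function name `unknown_env_keys` and the set `ALLOWED_ENV_KEYS` are hypothetical, and the assumption that an environment accepts only the keys `compiler`, `unify`, `specs` and `mpi` is taken from the recipe quoted in this thread rather than from the tool's actual schema. The recipe is passed in as an already-parsed dictionary, so no YAML library is needed.

```python
# Keys seen on the recipe's environment entries in this thread. Treating these
# as the only accepted keys is an assumption for illustration, not the real schema.
ALLOWED_ENV_KEYS = {"compiler", "unify", "specs", "mpi"}


def unknown_env_keys(recipe):
    """Return {environment name: sorted list of unrecognised keys}."""
    problems = {}
    for env_name, env in recipe.get("packages", {}).items():
        extra = sorted(set(env) - ALLOWED_ENV_KEYS)
        if extra:
            problems[env_name] = extra
    return problems


# A trimmed, already-parsed version of the recipe from the question.
recipe = {
    "packages": {
        "gcc-env": {
            "compiler": {"toolchain": "gcc", "spec": "gcc@11"},
            "unify": True,
            "specs": ["kokkos-kernels", "openblas"],
            "mpi": {"spec": "cray-mpich-binary", "gpu": "rocm"},
            # An external-package entry placed where it is silently dropped.
            "hip": {
                "buildable": False,
                "externals": [{"spec": "hip@5.2.4", "prefix": "/opt/rocm"}],
            },
        },
        "tools": {
            "compiler": {"toolchain": "gcc", "spec": "gcc@11"},
            "unify": True,
            "specs": ["cmake", "python@3.10", "py-numpy"],
        },
    }
}

for env_name, keys in unknown_env_keys(recipe).items():
    print(f"{env_name}: unrecognised keys {keys}")
```

With a check along these lines, the misplaced `hip` entry would be reported as an error instead of being discarded without warning.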

The (slightly hacky) way to fix this at the moment is to edit the packages.py for hohgant.

That said, using the ROCM installed as part of the CPE isn't ideal, because we want to be able to use the Spack stacks on systems that don't have CPE installed. The following tool could be used to build a separate, standalone ROCM: https://github.com/PawseySC/rocm-from-source