Can't build rocky9 AMIs due to hardcoded rocky8 URL/9.4 lustre issues
gbts opened this issue
This line in the Rocky cookbook: https://github.com/aws/aws-parallelcluster-cookbook/blob/16a081940c41cb3bb42cb2c1ddf8de3c059af788/cookbooks/aws-parallelcluster-platform/resources/install_packages/install_packages_rocky8.rb#L36 looks like a fallback for older kernel versions whose kernel headers are not available in the main repo. It instead tries to fetch `kernel-devel` directly from the BaseOS repo in the Rocky vault. But since `kernel-devel` was moved to the AppStream repo in RHEL/Rocky 9, that URL just returns a 404.
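To make the repo mismatch concrete, here is a minimal sketch of how a fallback URL could pick the repo from the major version instead of hardcoding BaseOS. The function name `kernel_devel_vault_url` is hypothetical, and the vault URL layout is an assumption based on the description above rather than a copy of the cookbook code.

```python
def kernel_devel_vault_url(os_version: str, kernel_release: str, arch: str = "x86_64") -> str:
    """Build a Rocky vault URL for the kernel-devel RPM (hypothetical helper).

    os_version     -- full OS version, e.g. "9.3"
    kernel_release -- output of `uname -r`, e.g. "5.14.0-362.8.1.el9_3.x86_64"
    """
    major = int(os_version.split(".")[0])
    # Per the issue: kernel-devel lives in BaseOS on Rocky 8 and in AppStream on Rocky 9.
    repo = "BaseOS" if major < 9 else "AppStream"
    # Assumed vault layout; verify against the actual vault before relying on it.
    return (
        f"https://dl.rockylinux.org/vault/rocky/{os_version}/{repo}/{arch}/os/"
        f"Packages/k/kernel-devel-{kernel_release}.rpm"
    )


if __name__ == "__main__":
    print(kernel_devel_vault_url("8.9", "4.18.0-513.5.1.el8_9.x86_64"))
    print(kernel_devel_vault_url("9.3", "5.14.0-362.8.1.el9_3.x86_64"))
```

The kernel release strings in the usage lines are only illustrative inputs; whether a given RPM still exists in the vault has to be checked separately.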
Enabling `UpdateOsPackages` in the ImageBuilder template works around the 404, but the instance is then upgraded to 9.4, and the build fails further down when trying to build the kernel module for Lustre (presumably because there's no support for the 9.4 kernel yet). So currently all rocky9 builds seem to be failing.
Thank you for taking a look at this @enrico-usai - for the record, before the 9.4 drivers were released I tried running with a patched cookbook that had the correct 9.3 URL, but that didn't work very well. Setting `UpdateOsPackages=false` was not enough to keep Rocky from updating to 9.4, presumably because the 9.3 repos were empty by that time, so attempting to install any package upgraded the whole system. I had some luck version-locking the kernel packages and pointing yum to the 9.3 vault repos, but in the meantime the 9.4 drivers came out, so it didn't seem worth the effort anymore.
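For anyone who wants to try the pinning approach, the following is a rough sketch of what it could look like, not the exact steps used here. It renders a yum repo file pointing at the vault and lists the version-lock commands. The helper names, repo IDs, vault URL layout, and GPG key path are all assumptions.

```python
def vault_repo_file(os_version: str, arch: str = "x86_64") -> str:
    """Render a .repo file pointing BaseOS and AppStream at the Rocky vault (hypothetical helper)."""
    major = os_version.split(".")[0]
    stanzas = []
    for repo in ("BaseOS", "AppStream"):
        stanzas.append(
            f"[vault-{repo.lower()}]\n"
            f"name=Rocky Linux {os_version} - {repo} (vault)\n"
            # Assumed vault layout; check that the path exists for your version.
            f"baseurl=https://dl.rockylinux.org/vault/rocky/{os_version}/{repo}/{arch}/os/\n"
            "enabled=1\n"
            "gpgcheck=1\n"
            # Assumed location of the distribution key on a stock image.
            f"gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-{major}\n"
        )
    return "\n".join(stanzas)


def versionlock_commands() -> list[str]:
    """Shell commands that pin the currently installed kernel packages."""
    return [
        "dnf install -y python3-dnf-plugin-versionlock",
        "dnf versionlock add 'kernel*'",
    ]


if __name__ == "__main__":
    print(vault_repo_file("9.3"))
    print("\n".join(versionlock_commands()))
```

The default repos would also need to be disabled for the vault repos to take effect; that step is left out of the sketch.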
Anyway, the 9.4 builds work just fine now, so from my point of view this is resolved.
Thanks @gbts for sharing your approach.
We are tracking internally the need to make the cookbook code more robust and avoid this kind of issue in the future.