Traceback when there is no OpenStack data source
rjschwei opened this issue
Bug report
The ds-identify script guesses that we may be in an OpenStack environment on non-x86_64 architectures, in this case aarch64 [1]. This then triggers the enablement of cloud-init services and as such the execution of the Python code. When no data source is found by the OpenStack data source implementation, an exception trickles to the top, causing a traceback.
2024-04-05 14:34:22,076 - util.py[DEBUG]: No active metadata service found
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 158, in _get_data
results = util.log_time(
^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/cloudinit/util.py", line 2833, in log_time
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 212, in _crawl_metadata
raise sources.InvalidMetaDataException(
cloudinit.sources.InvalidMetaDataException: No active metadata service found
and
2024-04-05 14:34:22,107 - util.py[DEBUG]: failed stage init-local
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/cloudinit/cmd/main.py", line 385, in main_init
init.fetch(existing=existing)
File "/usr/lib/python3.11/site-packages/cloudinit/stages.py", line 466, in fetch
return self._get_data_source(existing=existing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/cloudinit/stages.py", line 357, in _get_data_source
(ds, dsname) = sources.find_source(
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/cloudinit/sources/__init__.py", line 1032, in find_source
raise DataSourceNotFoundException(msg)
cloudinit.sources.DataSourceNotFoundException: Did not find any data source, searched classes: (DataSourceOpenStackLocal)
The exception should be handled and no traceback should be generated.
[1] https://github.com/canonical/cloud-init/blob/main/tools/ds-identify#L1370
Steps to reproduce the problem
Run a VM on aarch64 with cloud-init default config and no config drive.
Environment details
- Cloud-init version: 23.3
- Operating System Distribution: SL Micro
- Cloud provider, platform or installer type: Not running in a cloud environment
cloud-init logs
I think the issue occurs here https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L159 but the call to log_time is inside a try-except block, and the except clause is supposed to handle InvalidMetaDataException, which is raised by _crawl_metadata with the message "No active metadata service found" that appears in the log. So I do not understand why we would still end up with the traceback.
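One possible explanation, sketched here as a guess rather than something confirmed against the cloud-init source: the first traceback is prefixed by a util.py[DEBUG] log line, which would be consistent with the except clause catching the exception and then logging it together with its traceback. The snippet below only uses the Python standard library; the names crawl_metadata, get_data and run_demo are hypothetical stand-ins, not cloud-init functions.

```python
import io
import logging


class InvalidMetaDataException(Exception):
    """Stand-in for cloudinit.sources.InvalidMetaDataException."""


def crawl_metadata():
    # Hypothetical stand-in for _crawl_metadata: always fails, as in the report.
    raise InvalidMetaDataException("No active metadata service found")


def get_data(log):
    try:
        crawl_metadata()
    except InvalidMetaDataException as e:
        # The exception is handled here, yet exc_info=True still writes
        # the full traceback to the log at DEBUG level.
        log.debug(str(e), exc_info=True)
        return False
    return True


def run_demo():
    # Capture log output in memory so it can be inspected.
    stream = io.StringIO()
    log = logging.getLogger("traceback_demo")
    log.setLevel(logging.DEBUG)
    log.propagate = False
    handler = logging.StreamHandler(stream)
    log.addHandler(handler)
    try:
        found = get_data(log)
    finally:
        log.removeHandler(handler)
    return found, stream.getvalue()


if __name__ == "__main__":
    found, output = run_demo()
    print(found)
    print(output)
```

If that is what happens, the first traceback is only debug logging of a handled exception, and the second one (DataSourceNotFoundException in init-local) is the one that actually propagates.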
Ugh, what a mess. This should probably not be the default behavior.
"This then triggers the enablement of cloud-init services and as such the execution of the Python code."
Auto-enabling on all non-x86 is really not good. This bug is an example of why this permissive optimism was a bad default.
The only other users of DS_MAYBE in cloud-init, AltCloud and Ec2, only occur in much more limited environments: after a positive match on a DMI value or when it is explicitly enabled by a configuration value, respectively.
The more that I think about this the more I think that it was a mistake to try to "just work" for openstack on other architectures without a positive signal.
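To make the tradeoff concrete, here is a rough Python model of the decision being discussed. It is a simplified sketch, not the ds-identify code (which is a shell script and has additional checks not modeled here); check_openstack, Result, X86_MACHINES and the permissive flag are all hypothetical names used for illustration.

```python
from enum import Enum


class Result(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    MAYBE = "maybe"


# Assumed set of machine names treated as x86 for this sketch.
X86_MACHINES = {"i386", "i486", "i586", "i686", "x86_64"}


def check_openstack(machine, dmi_match, permissive=True):
    """Simplified model of the OpenStack check discussed in this thread."""
    if dmi_match:
        # Positive signal from DMI data: OpenStack is detected.
        return Result.FOUND
    if machine in X86_MACHINES:
        # On x86, a missing DMI match means "not OpenStack".
        return Result.NOT_FOUND
    # On other architectures there is no positive signal. The permissive
    # default answers "maybe"; the stricter alternative answers "not found".
    return Result.MAYBE if permissive else Result.NOT_FOUND


if __name__ == "__main__":
    print(check_openstack("aarch64", dmi_match=False))
    print(check_openstack("aarch64", dmi_match=False, permissive=False))
```

With permissive=False, an aarch64 machine without a DMI match would no longer enable cloud-init on a guess, and users would opt in explicitly instead.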
I'm talking with some openstack folks in the meantime to try to get better openstack support for cloud-init on a few architectures, but I think we should consider making this non-default in an upcoming cloud-init release. Users that want to use cloud-init on non-x86 can always select openstack in cloud.cfg or in their kernel commandline. Breaking users on some arches just so that other users on those same arches don't have to set a configuration value seems like a poor tradeoff - especially when the tradeoff is caused by a shortcoming of the cloud. Perhaps we should just try to fix openstack instead of depending on broken hacks like this. I think we can get openstack to pass DMI data on a few more arches than it already does, or alternatively it could probably even set the datasource in the kernel commandline.
Related bug report: