canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization

Home Page:https://cloud-init.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

cloud-init-23.4 cannot read "- Azure" datasource_list format

zhaohuijuan opened this issue · comments

Bug report

In Image Builder images, the datasource_list in /etc/cloud/cloud.cfg.d/91-azure_datasource.cfg is:

datasource_list:
- Azure

which cannot be parsed by cloud-init.

2024-01-18 10:27:12,433 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.rhel.Distro'>
2024-01-18 10:27:12,441 - {}init{}.py[DEBUG]: Looking for data source in: [\{'datasource_list': None}, 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
2024-01-18 10:27:12,441 - util.py[WARNING]: failed stage init
2024-01-18 10:27:12,441 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 781, in status_wrapper
ret = functor(name, args)
File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 394, in main_init
init.fetch(existing=existing)
File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 493, in fetch
return self.get_data_source(existing=existing)
File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 367, in get_data_source
self.reporter,
File "/usr/lib/python3.6/site-packages/cloudinit/sources/
{
}init{}.py", line 1001, in find_source
ds_list = list_sources(cfg_list, ds_deps, pkg_list)
File "/usr/lib/python3.6/site-packages/cloudinit/sources/{}init{}.py", line 1047, in list_sources
ds_name = importer.match_case_insensitive_module_name(ds)
File "/usr/lib/python3.6/site-packages/cloudinit/importer.py", line 40, in match_case_insensitive_module_name
if "nocloud-net" == mod_name.lower():
AttributeError: 'dict' object has no attribute 'lower'

Steps to reproduce the problem

  1. Create a VM on Azure, change the /etc/cloud/cloud.cfg.d/91-azure_datasource.cfg to:

datasource:
Azure:
apply_network_config: false
datasource_list:
- Azure
2. Clean cloud-init and reboot VM

$sudo cloud-init clean
$sudo reboot

Expected results
cloud-init service can start successfully.

Actual results
cloud-init service fails to start(logs are in the description)
If change the

datasource_list:
- Azure
to:

datasource_list: [ Azure ]
Then cloud-init clean && reboot, the cloud-init service and start successfully.

Environment details

  • Cloud-init version: cloud-init-23.4
  • Operating System Distribution: RHEL
  • Cloud provider, platform or installer type: Azure

cloud-init logs

  1. $ sudo cat /etc/cloud/cloud.cfg.d/91-azure_datasource.cfg
    datasource_list:
    - Azure

  2. cloud-init.log

2024-01-18 10:27:12,433 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.rhel.Distro'>
2024-01-18 10:27:12,441 - {}init{}.py[DEBUG]: Looking for data source in: [\{'datasource_list': None}, 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
2024-01-18 10:27:12,441 - util.py[WARNING]: failed stage init
2024-01-18 10:27:12,441 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 781, in status_wrapper
ret = functor(name, args)
File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 394, in main_init
init.fetch(existing=existing)
File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 493, in fetch
return self.get_data_source(existing=existing)
File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 367, in get_data_source
self.reporter,
File "/usr/lib/python3.6/site-packages/cloudinit/sources/
{
}init{}.py", line 1001, in find_source
ds_list = list_sources(cfg_list, ds_deps, pkg_list)
File "/usr/lib/python3.6/site-packages/cloudinit/sources/{}init{}.py", line 1047, in list_sources
ds_name = importer.match_case_insensitive_module_name(ds)
File "/usr/lib/python3.6/site-packages/cloudinit/importer.py", line 40, in match_case_insensitive_module_name
if "nocloud-net" == mod_name.lower():
AttributeError: 'dict' object has no attribute 'lower'

Additional info:

  1. No such issue in cloud-init-23.1.1, just met this issue after rebase to 23.4, so could you please help to address which commit did this change between 23.1.1 and 23.4?
  2. RHEL was using ds-identify before 23.1.1
  3. I am afraid users who were using the format "- Azure" in config file will meet this issue after upgrade, so could you please help to check if there is a way to be compatible with this format? Thanks

No such issue in cloud-init-23.1.1

Could you please attach the complete log file of an instance that successfully booted with this configuration with 23.1.1? I don't think that this would have worked then either.

Per discussion in irc with Ani, I believe that Azure images would have still failed but in a different way.

Also mentioned during the irc conversation is that the configuration you provided is documented in the docs as not working, and has been documented in the configuration file as not working for 3 years. So even if you provide proof that this ever worked with ds-identify, I suspect we're still going to close this as not a bug, since this is - and was (in pre-23.1.1) expected and documented behavior.

No such issue in cloud-init-23.1.1

Could you please attach the complete log file of an instance that successfully booted with this configuration with 23.1.1? I don't think that this would have worked then either.

Per discussion in irc with Ani, I believe that Azure images would have still failed but in a different way.

cloud-init.log

Also mentioned during the irc conversation is that the configuration you provided is documented in the docs as not working, and has been documented in the configuration file as not working for 3 years. So even if you provide proof that this ever worked with ds-identify, I suspect we're still going to close this as not a bug, since this is - and was (in pre-23.1.1) expected and documented behavior.

I see, just would like to confirm which commit causes this change after 23.1.1, and is there any way to be compatible with this format?

meanwhile, could we do something like this so that there is no ugly crash?

diff --git a/cloudinit/sources/__init__.py b/cloudinit/sources/__init__.py
index c207b5ed..24d04f97 100644
--- a/cloudinit/sources/__init__.py
+++ b/cloudinit/sources/__init__.py
@@ -1044,6 +1044,14 @@ def list_sources(cfg_list, depends, pkg_list):
     )
 
     for ds in cfg_list:
+        if not isinstance(ds, str):
+            LOG.warning(
+                "Ignoring data source %s as its not a string."
+                " Did you specify datasource_list using the correct"
+                " syntax and formatting?",
+                ds,
+            )
+            continue
         ds_name = importer.match_case_insensitive_module_name(ds)
         m_locs, _looked_locs = importer.find_module(
             ds_name, pkg_list, ["get_datasource_list"]

I've reproduced the issue, this is a problem in ds-identify:

# cat /run/cloud-init/cloud.cfg 
datasource_list: [ datasource_list:, None ]

Just to be clear though, what I said before stands: the following configuration is not - and was not ever supported. It just previously failed by ignoring the configuration (and then ds-identify would previously correctly identify the Azure datasource).

$ sudo cat /etc/cloud/cloud.cfg.d/91-azure_datasource.cfg
datasource_list:
- Azure

This issue was introduced by #4327.

As a temporary workaround, I recommend not using an invalid configuration. We will work to get this fix into the latest upstream cloud-init release.

Thanks for reporting this @zhaohuijuan and @ani-sinha.

We're on the fence about whether this is actually a bug - on one hand, this configuration was never a valid configuration, but on the other hand it previously was ignored silently and didn't prevent users from using cloud-init - so long as ds-identify could figure out the actual datasource.

I've queued up a new upstream release since the sentiment in my PR to fix the issue is that we might want to fail loudly someday about providing invalid configurations like this, but the PR that introduced this change in behavior isn't the right way to do it since, as you noticed, the logs don't make it obvious that this is caused by a broken configuration and I discovered that it also introduced even more broken behavior related to YAML parsing being incorrect.

@holmanb Thanks for the debug and clarify the issue. The fix makes sense to us.

You reverted the patch that broke it, after discussion with @ani-sinha , we will backport the revert downstream as well while the config gets fixed.

Thanks @holmanb for identifying and reverting the change that introduced the behavior change and some additional issues.