coreos / ignition

First boot installer and configuration tool

Home Page: https://coreos.github.io/ignition/

CloudStack provider: conflicting fetch between HTTP and ConfigDrive userdata

mlsorensen opened this issue

Bug

When a user sets up a CloudStack network such that DHCP is provided by Virtual Router (so a VR exists on the network), but UserData is provided by ConfigDrive, Ignition's CloudStack provider accepts a 404 from an HTTP UserData request to the VR as empty UserData and ignores the ConfigDrive's userdata.

Operating System Version

3374.2.5

Ignition Version

2.14.0

Environment

CloudStack Network, VR for DHCP provider and ConfigDrive for UserData provider

Expected Behavior

Expected the CloudStack Ignition provider to treat a 404 or empty userdata from the VR as no userdata and continue on to try the config-2 labeled configdrive. I'm following up on this on the CloudStack side as well, but since the 404 also confuses the Ignition side, I figured it should probably be addressed here too.

Actual Behavior

HTTP userdata, whether empty or a 404, was accepted as the userdata for the system, and the configdrive was ignored.

Reproduction Steps

I think this can actually be reproduced outside of CloudStack if:

  1. you create a configdrive labeled config-2 that contains a userdata file at /cloudstack/userdata/user_data.txt and attach it to the VM, and
  2. your DHCP server for the VM also has an http service running on port 80. It doesn't need to host any userdata.

Other Information

Attaching screenshots of the console for both the empty userdata (GET result: OK) from HTTP and the 404 (GET result: Not Found). You can see that in both cases the response is parsed as valid but empty userdata (per the SHA cf83e...), which causes the real configdrive userdata to be ignored.
[Screenshot: ignition — empty HTTP userdata, GET result: OK]
[Screenshot: ignition-404 — HTTP userdata, GET result: Not Found]
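
For what it's worth, the cf83e... digest in those logs is just the SHA-512 of zero bytes, which is why the empty 200 response and the 404 both show up as the same "valid but empty" config. A quick illustration in Go (not Ignition code):

    package main

    import (
        "crypto/sha512"
        "fmt"
    )

    func main() {
        // SHA-512 of an empty byte slice; the output begins with cf83e135...,
        // matching the hash logged for both the empty and the 404 userdata.
        sum := sha512.Sum512([]byte{})
        fmt.Printf("%x\n", sum[:])
    }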

The real userdata is there on configdrive:
[Screenshot: configdrive — userdata present on the config-2 drive]

Once the HTTP server on the VR is shut down, Ignition logs a GET error and reads the data from the config-2 drive.
[Screenshot: working-configdrive — GET error from HTTP, userdata read from config-2]

Thanks for reporting this. The current code retries indefinitely until it either obtains a config, or positive confirmation of no config, from the config drive or metadata service. If we ignored a legitimate 404 from the metadata service, and the config drive never showed up, we'd end up blocking boot indefinitely. So we'll need a way to distinguish between the no-userdata-provided case and the try-the-configdrive-instead case. Do you know if the metadata service provides a way to do that?

Hi @bgilbert - What I can say is that with CloudStack, both the metadata HTTP server and the config drive are set up prior to system boot in a preparation stage; they aren't operated on in parallel with boot, and the config ISO is not hot-plugged. If we get a 404, or don't find a config drive, it isn't going to show up later.

As far as blocking indefinitely if no userdata was provided to the VM - I think maybe that is a risk regardless, as it's possible to create a VM on a network that does not provide userdata services at all. However, if a network's userdata provider is ConfigDrive, barring a bug in the VM orchestration there will always be a config drive. It will still contain a cloudstack/metadata directory with metadata files, but it will not contain a cloudstack/userdata directory. If a network's userdata provider is VirtualRouter, there will always be a fetchable userdata file, even if it is empty. Additionally there is metadata such as http://{router-ip}/latest/meta-data/instance-id regardless of whether or not userdata was provided.
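
For illustration only (this is not Ignition code; the function name and the 10.1.1.1 address are placeholders): probing that always-present metadata path is enough to tell a real CloudStack metadata service apart from some unrelated HTTP server listening on port 80 of the DHCP address:

    package main

    import (
        "fmt"
        "net/http"
    )

    // metadataServiceValid reports whether the server at routerIP answers
    // the always-present CloudStack metadata path with 200 OK.
    func metadataServiceValid(routerIP string) (bool, error) {
        resp, err := http.Get("http://" + routerIP + "/latest/meta-data/instance-id")
        if err != nil {
            return false, err
        }
        defer resp.Body.Close()
        return resp.StatusCode == http.StatusOK, nil
    }

    func main() {
        ok, err := metadataServiceValid("10.1.1.1") // placeholder VR address
        fmt.Println("metadata service present:", ok, "err:", err)
    }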

I'm willing to help develop this and test it out; however, I'm having trouble finding a developer guide that will hold my hand enough to get going. I guess if I check out the source code into a VM and build/install it locally there, I can set up userdata scenarios and then perhaps trigger Ignition somehow.

Thanks for the help.

It seems like the existing detection logic for the config drive should be fine, then: if we find a volume with the correct label, we mount it, and it either does or doesn't contain userdata. The problem is with the metadata service, where we need to distinguish between an existent metadata service and a random HTTP server running on the VR. And I think you have the right idea: we should check for the existence of some appropriate cloudstack/metadata item, and if missing, assume the metadata service is invalid rather than treating the cloudstack/userdata 404 as canonical.
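
For concreteness, here is a rough sketch of that decision flow. The helper names (metadataServiceValid, fetchUserdataConfigDrive, fetchUserdataHTTP) are hypothetical stubs, not the actual provider functions, and the real code is organized differently; this only illustrates the proposed ordering:

    package main

    import (
        "fmt"
        "time"
    )

    // Hypothetical stubs standing in for the real provider logic.
    func metadataServiceValid(routerIP string) bool         { return false }
    func fetchUserdataConfigDrive() ([]byte, bool, error)   { return nil, false, nil }
    func fetchUserdataHTTP(routerIP string) ([]byte, error) { return nil, nil }

    func fetchUserdata(routerIP string) ([]byte, error) {
        for {
            // A config-2 volume, if present, is authoritative: it either
            // carries cloudstack/userdata or it definitively does not.
            if data, found, err := fetchUserdataConfigDrive(); err != nil {
                return nil, err
            } else if found {
                return data, nil
            }

            // Only accept the HTTP answer (including a 404 meaning "no
            // userdata") if the server is a real metadata service.
            if metadataServiceValid(routerIP) {
                return fetchUserdataHTTP(routerIP)
            }

            // Otherwise keep retrying: device enumeration may be slow,
            // and misprovisioning on a guess must be avoided.
            time.Sleep(5 * time.Second)
        }
    }

    func main() {
        // With the stubs above this loops forever, mirroring the
        // retry-until-resolved behavior described in this thread.
        data, err := fetchUserdata("10.1.1.1") // placeholder VR address
        fmt.Println(len(data), err)
    }

The config-drive and HTTP fetch paths already exist in the provider; the new piece would just be the metadata-service validity check.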

The CloudStack provider isn't actively maintained (no test environment) so if you're able to implement this yourself, we'd happily accept a PR. We don't really have a developer guide, but basically:

  • The relevant function is here.
  • ./build builds Ignition and ./test runs unit tests (of which there are none for cloud providers).
  • In this case, you don't need to build a new OS image with your modified Ignition, since you're just testing config fetch. You should be able to just sftp a new Ignition binary to a CloudStack instance (running the same distro version as your build machine). Then:
    sudo rm -f fetched.ign && \
    sudo ./ignition -config-cache fetched.ign -log-to-stdout -platform cloudstack -stage fetch
  • Ask here if you have any questions!

For completeness, re the other parts of your comment:

The config ISO may not be hotplugged, but in general we can't/don't assume the kernel will finish enumerating storage devices in any particular amount of time. Enumeration can be slow on large/heavily loaded systems, so Ignition generally keeps retrying, rather than using a timeout and risking misprovisioning if the timeout is too aggressive.

If the VM has neither a metadata service nor a config drive, I'd say blocking indefinitely is reasonable behavior. Ignition requires that some metadata service exists; the instance isn't going to be useful without one.

In the HyperShift KubeVirt provider we are hitting a similar issue: there we use the Ignition openstack provider on Azure, and the fetch from the metadata server returns a 404, so config drives are not read.

Discussed offline with @qinqon. AFAICT the solution for #1574 (comment) is that HyperShift should use the kubevirt provider in KubeVirt instead of the openstack provider.

We have tested an image with platform.id=kubevirt and it is working fine, but we need to wait for coreos/fedora-coreos-tracker#1126 to consume official artifacts.

Thanks for the additional info, @bgilbert @qinqon, I'll take a look. Since it was mentioned that there is no active maintenance on this part and no environment or existing tests, I assume there is also no test code I should be adding for such a change?

Correct, there isn't. For providers that aren't tested via OS-level end-to-end tests, we're entirely dependent on manual testing.