fedora-iot / greenboot

Generic Health Checking Framework for systemd

RFE: make `01_update_platforms_check.sh` more intelligent/robust

miabbott opened this issue · comments

An ostree remote config may have a url= parameter and a contenturl= parameter included in the config. When the contenturl= parameter is present, the ostree client will fetch content from that resource, but will fetch metadata from the resource specified by the url= parameter.

Currently, the Fedora IoT ostree infrastructure is configured in a way that doing curl -L https://ostree.fedoraproject.org/iot (as specified in the url= parameter) returns an HTTP 403. But curl -L https://ostree.fedoraproject.org/iot/config returns HTTP 200.

Along similar lines, curl -L https://ostree.fedoraproject.org/iot/mirrorlist returns HTTP 200. And substituting the CloudFront hostname from the mirror list: curl -L https://d2ju0wfl996cmc.cloudfront.net/config also returns HTTP 200.

The intent is to make the script more intelligent, so that it tests for actual content availability depending on how the ostree remote config is populated. If there is a contenturl= parameter, the script should fetch the config asset from both the url= and contenturl= locations, so that both are validated. In the absence of the contenturl= parameter, the script should only check the url= parameter.
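
A minimal sketch of that logic, assuming the remote config is a plain key=value file like the ones under /etc/ostree/remotes.d/. The helper name remote_check_urls is hypothetical and not part of greenboot. For a contenturl= that uses the mirrorlist= prefix, the sketch simply probes the mirror list itself, which is an assumption; resolving individual mirrors is left out.

```sh
# Hypothetical helper: print the URLs worth probing for one remote config file.
# The function body runs in a subshell so its variables stay local.
remote_check_urls() (
    conf_file=$1
    while IFS= read -r line || [ -n "$line" ]; do
        # Split on the first "=" only; contenturl values may contain "=".
        key=${line%%=*}
        value=${line#*=}
        case "$key" in
            url)
                # Strip one trailing slash so the join never yields "//".
                printf '%s/config\n' "${value%/}"
                ;;
            contenturl)
                case "$value" in
                    mirrorlist=*)
                        # Assumption: checking the mirror list is enough here.
                        printf '%s\n' "${value#mirrorlist=}"
                        ;;
                    *)
                        printf '%s/config\n' "${value%/}"
                        ;;
                esac
                ;;
        esac
    done < "$conf_file"
)
```

When contenturl= is absent, only the url= line produces output, so the caller ends up checking a single URL.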

See also: https://issues.redhat.com/browse/THEEDGE-3108

See also #71

Another complexity to be aware of: some Red Hat Edge systems will have the update URLs protected by entitlement certificates, so bare curl'ing of the URLs will always fail.

Any curl operations for those kinds of URLs will need to make use of curl --cacert <RH CA cert> --cert <client entitlement cert> --key <client entitlement key> ...

Or we come up with something else?
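
One possible shape for the certificate case, as a sketch only: pass the three files in, and fall back to a bare request when any of them is not readable. The function name probe_with_optional_certs is hypothetical, and where the certificate files live on a given system is left to the caller.

```sh
# Hypothetical helper: send a HEAD request, adding entitlement certs if present.
# Arguments: URL, CA cert path, client cert path, client key path.
probe_with_optional_certs() (
    url=$1
    ca_cert=$2
    client_cert=$3
    client_key=$4
    if [ -r "$ca_cert" ] && [ -r "$client_cert" ] && [ -r "$client_key" ]; then
        # All three files are readable, so authenticate with them.
        curl --silent --head \
            --cacert "$ca_cert" --cert "$client_cert" --key "$client_key" \
            "$url"
    else
        # No usable entitlement material: plain, unauthenticated request.
        curl --silent --head "$url"
    fi
)
```

The fallback keeps the check usable on systems without entitlement certificates, such as a stock Fedora IoT install.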

I think all of these are valid, easy additions, and since the rewrite will support bare bash scripts like this, I think it's worth doing.

Note to self: port this to Jira as we don't have the jira label in this repo

The end point should still be reachable even if auth fails. The idea of the test was to ensure the network stack was working, routing, DNS, HTTP etc.

We need to define "reachable" in this case.

If we curl the endpoint and we get a TCP connection established, does that meet the qualifications of reachable?

Or do we want to say that reachable == HTTP 200?

The original version of the script (e1695a6) seems to imply that we care more about a successful HTTP code than a successful TCP connection.

But in the testing of the endpoints provided by the Fleet Management system, we get HTTP 401 if no auth is provided.

$ curl -I https://cert.console.redhat.com/api/edge/v1/storage/update-repos/962
HTTP/1.1 401 Unauthorized
Server: openresty
Content-Type: text/plain
x-rh-insights-request-id: 71cdea3450c54aa7b2523cc67f0c9b32
x-content-type-options: nosniff
Content-Length: 0
Date: Thu, 23 Mar 2023 20:29:09 GMT
Connection: keep-alive
Set-Cookie: b3e2e456866f84f3604b36899c8be8b3=aead60ca13504b9091f820163e933ba7; path=/; HttpOnly; Secure; SameSite=None
x-rh-edge-request-id: 3706fcc6
x-rh-edge-reference-id: 0.5e4e4e68.1679603349.3706fcc6
x-rh-edge-cache-status: Miss from child, Miss from parent
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=31536000; includeSubDomains

And even if auth is provided, we get HTTP 405, because the request method is not allowed (curl -I sends a HEAD request rather than a GET):

$ sudo curl -k --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem --cacert /etc/rhsm/ca/redhat-uep.pem -I https://cert.console.redhat.com/api/edge/v1/storage/update-repos/962
HTTP/1.1 405 Method Not Allowed
Server: openresty
x-rh-insights-request-id: 9ca74eaba2584447bd6aef473fab2250
x-rh-insights-request-id: 9ca74eaba2584447bd6aef473fab2250
x-content-type-options: nosniff
Cache-Control: private
Content-Length: 0
Date: Thu, 23 Mar 2023 20:34:39 GMT
Connection: keep-alive
Set-Cookie: b3e2e456866f84f3604b36899c8be8b3=aead60ca13504b9091f820163e933ba7; path=/; HttpOnly; Secure; SameSite=None
x-rh-edge-request-id: 371dc2c8
x-rh-edge-reference-id: 0.5e4e4e68.1679603679.371dc2c8
x-rh-edge-cache-status: Miss from child, Miss from parent
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=31536000; includeSubDomains

So in both of those examples, the endpoint is "reachable" in a TCP sense, but unreachable given the current mechanism that we test with.

'"Reachable" in a TCP sense' is probably enough to not roll back: we know routing and DNS are working, so the system would accept remote connections like ssh for further debugging, although a 401/405 could also stop us from receiving a further update. It's likely hard to test any further without adding some form of ostree command to check whether an update is available, and that may (or may not) be going further than we need for a basic check.
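
To make the two meanings of "reachable" concrete, here is a rough sketch that sorts a status code into three buckets. The function name classify_http_code and the bucket labels are made up for illustration. It assumes the code comes from curl's --write-out '%{http_code}', which, as far as I know, prints 000 when no HTTP response was received at all.

```sh
# Hypothetical helper: map an HTTP status code to a reachability verdict.
#   unreachable - no HTTP response (DNS, routing, TCP or TLS failed)
#   ok          - 2xx or 3xx, the stricter "HTTP success" reading
#   reachable   - the server answered, but with an error such as 401 or 405
classify_http_code() {
    case "$1" in
        000)         echo unreachable ;;
        2??|3??)     echo ok ;;
        [1-5][0-9][0-9]) echo reachable ;;
        *)           echo unknown ;;
    esac
}

# Possible use (not run here, needs network access):
#   code=$(curl --silent --head --output /dev/null --write-out '%{http_code}' "$url")
#   classify_http_code "$code"
```

With this split, a health check could treat only "unreachable" as a reason to fail, and merely log a warning for "reachable".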

You could just run rpm-ostree refresh-md to check whether the metadata can be refreshed. That way it only checks the currently used mirror. It might add a few seconds, though.

That 01_update_platforms_check.sh script fails on a fresh Fedora IoT 38 install with:

grep: /etc/ostree/remotes.d/*: No such file or directory
No update platforms found, this can be a mistake

There are no files in /etc/ostree/remotes.d/. Can this script be re-written to use ostree remote list and ostree remote show-url <name> instead? The output of that for me is:

$ ostree remote list
fedora-iot

$ ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot
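
A sketch of what that could look like, using only the two ostree subcommands shown above. The function name list_remote_urls is hypothetical, and the "name URL" output format is just one convenient choice.

```sh
# Hypothetical helper: print "<remote-name> <url>" for every configured remote,
# without reading /etc/ostree/remotes.d/ directly.
list_remote_urls() {
    ostree remote list | while IFS= read -r remote; do
        # Skip blank lines defensively.
        [ -n "$remote" ] || continue
        # Skip remotes whose URL cannot be resolved instead of aborting.
        url=$(ostree remote show-url "$remote" </dev/null) || continue
        printf '%s %s\n' "$remote" "$url"
    done
}
```

Because ostree resolves the remotes itself, this should not depend on which file, if any, defines them.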

I'm running into this as well; I believe my first fedora-iot installation was either f34 or f35 so this configuration might be out of date. I have the following file lingering around:

# cat /etc/ostree/remotes.d/fedora-iot.conf
[remote "fedora-iot"]
url=https://ostree.fedoraproject.org/iot/
gpg-verify=true
gpgkeypath=/etc/pki/rpm-gpg/
contenturl=mirrorlist=https://ostree.fedoraproject.org/iot/mirrorlist

I've tried that, with no luck (or discernible change).

[root@k4 wanted.d]# rpm-ostree refresh-md
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora
Updating metadata for 'updates'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2023-03-14T10:56:46Z solvables: 4
rpm-md repo 'updates'; generated: 2023-09-20T01:22:56Z solvables: 20815
rpm-md repo 'fedora' (cached); generated: 2023-04-13T20:37:10Z solvables: 69222

[root@k4 wanted.d]# ls /etc/ostree/remotes.d/fedora-iot.conf
/etc/ostree/remotes.d/fedora-iot.conf

[root@k4 wanted.d]# cat /etc/ostree/remotes.d/fedora-iot.conf
[remote "fedora-iot"]
url=https://ostree.fedoraproject.org/iot/
gpg-verify=true
gpgkeypath=/etc/pki/rpm-gpg/
contenturl=mirrorlist=https://ostree.fedoraproject.org/iot/mirrorlist

I'm a bit confused by what the url value in the remote entry is actually doing. If it's returning a 403 but updates are still working, why are we attempting to query a forbidden URL? Is there any harm in changing the url to test the reachability of the mirror list?


Edit - I traced the source of the failure to the trailing slash in the url (see this comment); this can be bypassed by dropping that slash.

FWIW my experimentation indicated that /etc/ostree/remotes.d/fedora-iot.conf is responsible for setting the output of ostree remote show-url fedora-iot:

[root@k4 ~]# ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot/

[root@k4 ~]# vi /etc/ostree/remotes.d/fedora-iot.conf # remove trailing slash

[root@k4 ~]# ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot

[root@k4 ~]# mv /etc/ostree/remotes.d/fedora-iot.conf /root
[root@k4 ~]# ostree remote show-url fedora-iot
error: Remote "fedora-iot" not found

Yeah, that's odd. I did a recursive grep over the entire file system and could not find any file with the content https://ostree.fedoraproject.org/iot, yet ostree remote show-url fedora-iot shows exactly that on my machine. I'm guessing this remote is somehow baked into ostree with f38? Maybe only on aarch64 using that raw variant from https://fedoraproject.org/iot/download/?

Anyway, I've opened #116 to use ostree remote list to fetch the list of remotes which should work on all versions no matter where they get their information from.

As @adrienthebo mentioned, using https://ostree.fedoraproject.org/iot (without the slash) works fine, because the script already checks for 3xx responses.
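
If the script wanted to sidestep the trailing-slash difference itself, one possible workaround is to normalize the URL before probing it. This is only a sketch: the helper name strip_trailing_slashes is hypothetical, and it assumes the server behaves as described in this thread.

```sh
# Hypothetical helper: remove any trailing slashes from a URL.
# The function body runs in a subshell so "url" stays local.
strip_trailing_slashes() (
    url=$1
    # ${url%/} drops a single trailing slash; loop until none are left.
    while [ "${url%/}" != "$url" ]; do
        url=${url%/}
    done
    printf '%s\n' "$url"
)
```

A URL without a trailing slash passes through unchanged, so the helper is safe to apply to every remote.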