RFE: make `01_update_platforms_check.sh` more intelligent/robust
miabbott opened this issue · comments
An ostree remote config may include both a `url=` parameter and a `contenturl=` parameter. When `contenturl=` is present, the ostree client fetches content from that resource but fetches metadata from the resource specified by `url=`.
Currently, the Fedora IoT ostree infrastructure is configured so that `curl -L https://ostree.fedoraproject.org/iot` (the value of the `url=` parameter) returns an HTTP 403, but `curl -L https://ostree.fedoraproject.org/iot/config` returns HTTP 200.
Along similar lines, `curl -L https://ostree.fedoraproject.org/iot/mirrorlist` returns HTTP 200, and substituting the CloudFront hostname from the mirror list, `curl -L https://d2ju0wfl996cmc.cloudfront.net/config` also returns HTTP 200.
The intent is to make the script test for actual content availability based on how the ostree remote config is populated. If there is a `contenturl=` parameter, the script should check that the `config` asset can be fetched from both the `url=` and `contenturl=` locations, validating both more completely. In the absence of `contenturl=`, the script should check only the `url=` parameter.
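A rough sketch of how the URL selection described above could look. `get_remote_value` and `urls_to_check` are hypothetical helper names, not functions from the existing script, and probing the mirror list itself when `contenturl=` has the `mirrorlist=` form is an assumption on my part:

```shell
#!/bin/bash
# Sketch only: derive the URLs to probe from an ostree remote config file.
# get_remote_value and urls_to_check are hypothetical helper names.

# Print the value of KEY from an ostree remote config FILE (first match only).
get_remote_value() {
    local file=$1 key=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Print one URL per line: "<url>/config" always, plus an entry for
# contenturl= only when that parameter is present.
urls_to_check() {
    local file=$1 url contenturl
    url=$(get_remote_value "$file" url)
    contenturl=$(get_remote_value "$file" contenturl)
    [ -n "$url" ] && printf '%s/config\n' "${url%/}"
    case $contenturl in
        '') ;;  # no contenturl=: only url= is checked
        mirrorlist=*)
            # Assumption: for a mirror list, probe the list itself.
            printf '%s\n' "${contenturl#mirrorlist=}" ;;
        *) printf '%s/config\n' "${contenturl%/}" ;;
    esac
    return 0
}
```

Each printed URL could then be handed to whatever reachability check the script ends up using, e.g. `urls_to_check /etc/ostree/remotes.d/fedora-iot.conf`.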
See also #71
Another complexity to be aware of: some Red Hat Edge systems will have the update URLs protected by entitlement certificates, so bare `curl`ing of the URLs will always fail.
Any `curl` operations for those kinds of URLs will need to use `curl --cacert <RH CA cert> --cert <client entitlement cert> --key <client entitlement key> ...`
Or do we come up with something else?
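As a sketch of the certificate option, something like the following could add the extra `curl` options only when the entitlement files exist. `entitlement_curl_args` is a hypothetical helper name, and the default paths are simply the ones used in the `curl` example later in this thread; other systems may keep them elsewhere:

```shell
#!/bin/bash
# Sketch only: build the extra curl options for entitlement-protected URLs.
# entitlement_curl_args is a hypothetical helper name.

# Print the curl options, one per line, when all three files are readable;
# print nothing otherwise so the caller falls back to a plain request.
entitlement_curl_args() {
    local ca=${1:-/etc/rhsm/ca/redhat-uep.pem}
    local cert=${2:-/etc/pki/consumer/cert.pem}
    local key=${3:-/etc/pki/consumer/key.pem}
    if [ -r "$ca" ] && [ -r "$cert" ] && [ -r "$key" ]; then
        printf '%s\n' --cacert "$ca" --cert "$cert" --key "$key"
    fi
}

# Usage (not run here; relies on word splitting, so paths must not
# contain whitespace):
#   set -- $(entitlement_curl_args)
#   curl --silent --head "$@" "$url"
```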
I think all of these are valid, easy additions, and since the rewrite will support bare bash scripts like this, I think it's worth doing.
Note to self: port this to Jira as we don't have the jira label in this repo
The end point should still be reachable even if auth fails. The idea of the test was to ensure the network stack was working, routing, DNS, HTTP etc.
We need to define "reachable" in this case.
If we `curl` the endpoint and a TCP connection is established, does that qualify as reachable?
Or do we want to say that reachable == HTTP 200?
The original version of the script (e1695a6) seems to imply that we care more about a successful HTTP code than a successful TCP connection.
But in the testing of the endpoints provided by the Fleet Management system, we get HTTP 401 if no auth is provided.
$ curl -I https://cert.console.redhat.com/api/edge/v1/storage/update-repos/962
HTTP/1.1 401 Unauthorized
Server: openresty
Content-Type: text/plain
x-rh-insights-request-id: 71cdea3450c54aa7b2523cc67f0c9b32
x-content-type-options: nosniff
Content-Length: 0
Date: Thu, 23 Mar 2023 20:29:09 GMT
Connection: keep-alive
Set-Cookie: b3e2e456866f84f3604b36899c8be8b3=aead60ca13504b9091f820163e933ba7; path=/; HttpOnly; Secure; SameSite=None
x-rh-edge-request-id: 3706fcc6
x-rh-edge-reference-id: 0.5e4e4e68.1679603349.3706fcc6
x-rh-edge-cache-status: Miss from child, Miss from parent
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=31536000; includeSubDomains
And even if auth is provided, we get HTTP 405 because the request method is not supported (note that `curl -I` sends a HEAD request, not a GET):
$ sudo curl -k --cert /etc/pki/consumer/cert.pem --key /etc/pki/consumer/key.pem --cacert /etc/rhsm/ca/redhat-uep.pem -I https://cert.console.redhat.com/api/edge/v1/storage/update-repos/962
HTTP/1.1 405 Method Not Allowed
Server: openresty
x-rh-insights-request-id: 9ca74eaba2584447bd6aef473fab2250
x-rh-insights-request-id: 9ca74eaba2584447bd6aef473fab2250
x-content-type-options: nosniff
Cache-Control: private
Content-Length: 0
Date: Thu, 23 Mar 2023 20:34:39 GMT
Connection: keep-alive
Set-Cookie: b3e2e456866f84f3604b36899c8be8b3=aead60ca13504b9091f820163e933ba7; path=/; HttpOnly; Secure; SameSite=None
x-rh-edge-request-id: 371dc2c8
x-rh-edge-reference-id: 0.5e4e4e68.1679603679.371dc2c8
x-rh-edge-cache-status: Miss from child, Miss from parent
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=31536000; includeSubDomains
So in both of those examples, the endpoint is "reachable" in a TCP sense, but unreachable given the current mechanism that we test with.
'"Reachable" in a TCP sense' is probably enough to not roll back: we know routing and DNS are working, so the system would accept remote connections such as ssh for further debugging. That said, a 401/405 could also stop us from receiving a further update. It's likely hard to test any further without adding some form of ostree command to check whether an update is available, and that may (or may not) be going further than we need for a basic check.
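To make the "reachable in a TCP sense" reading concrete, here is a minimal sketch. It assumes that any HTTP response at all, including 401/403/405, counts as reachable, which is only one of the definitions being discussed here. `classify_http_code` and `probe_url` are hypothetical names; `curl` reports `000` for `%{http_code}` when no HTTP response was received:

```shell
#!/bin/bash
# Sketch only: treat any HTTP response as proof that DNS, routing, TLS
# and HTTP are working. Function names are hypothetical.

# Map a curl %{http_code} value to "reachable" or "unreachable".
classify_http_code() {
    case $1 in
        000|'') echo unreachable ;;            # no HTTP response received
        [1-5][0-9][0-9]) echo reachable ;;     # any status, even 401/403/405
        *) echo unreachable ;;                 # anything unexpected
    esac
}

# Probe URL and print its classification (performs a network request).
probe_url() {
    local code
    code=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$1")
    classify_http_code "$code"
}
```

If we decide that reachable == HTTP 200 (or 2xx/3xx) instead, only the `case` patterns would need to change.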
You could just run `rpm-ostree refresh-md` to check whether the metadata can be refreshed. That way it only checks the currently used mirror. It might add a few seconds, though.
That `01_update_platforms_check.sh` script fails on a fresh Fedora IoT 38 install with:
grep: /etc/ostree/remotes.d/*: No such file or directory
No update platforms found, this can be a mistake
There are no files in `/etc/ostree/remotes.d/`. Can this script be rewritten to use `ostree remote list` and `ostree remote show-url <name>` instead? The output of those for me is:
$ ostree remote list
fedora-iot
$ ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot
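For illustration, a sketch of enumerating remotes through the CLI rather than through `/etc/ostree/remotes.d/`. `list_remote_urls` is a hypothetical name, and the `OSTREE` variable is an invented override that only exists to make the function testable without ostree installed:

```shell
#!/bin/bash
# Sketch only: list "<remote> <url>" pairs via the ostree CLI.
# list_remote_urls and the OSTREE override are hypothetical.

list_remote_urls() {
    local ostree_cmd=${OSTREE:-ostree}
    "$ostree_cmd" remote list | while IFS= read -r remote; do
        [ -n "$remote" ] || continue
        # Skip remotes whose URL cannot be resolved.
        url=$("$ostree_cmd" remote show-url "$remote") || continue
        printf '%s %s\n' "$remote" "$url"
    done
}
```

On the machine above this would presumably print a single `fedora-iot https://ostree.fedoraproject.org/iot` line, regardless of where ostree stores the remote definition.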
I'm running into this as well; I believe my first fedora-iot installation was either f34 or f35 so this configuration might be out of date. I have the following file lingering around:
# cat /etc/ostree/remotes.d/fedora-iot.conf
[remote "fedora-iot"]
url=https://ostree.fedoraproject.org/iot/
gpg-verify=true
gpgkeypath=/etc/pki/rpm-gpg/
contenturl=mirrorlist=https://ostree.fedoraproject.org/iot/mirrorlist
I've tried that, with no luck (or discernible change).
[root@k4 wanted.d]# rpm-ostree refresh-md
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora
Updating metadata for 'updates'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2023-03-14T10:56:46Z solvables: 4
rpm-md repo 'updates'; generated: 2023-09-20T01:22:56Z solvables: 20815
rpm-md repo 'fedora' (cached); generated: 2023-04-13T20:37:10Z solvables: 69222
[root@k4 wanted.d]# ls /etc/ostree/remotes.d/fedora-iot.conf
/etc/ostree/remotes.d/fedora-iot.conf
[root@k4 wanted.d]# cat /etc/ostree/remotes.d/fedora-iot.conf
[remote "fedora-iot"]
url=https://ostree.fedoraproject.org/iot/
gpg-verify=true
gpgkeypath=/etc/pki/rpm-gpg/
contenturl=mirrorlist=https://ostree.fedoraproject.org/iot/mirrorlist
I'm a bit confused about what the `url` value in the remote entry is actually doing. If it's returning a 403 but updates are still working, why are we attempting to query a forbidden URL? Is there any harm in changing the URL to test the reachability of the mirror list?
Edit - I traced the source of the failure to the trailing slash in the url (see this comment); this can be bypassed by dropping that slash.
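Rather than editing the config by hand, the script could normalise the URL before probing it. A minimal sketch, where `strip_trailing_slashes` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: drop any trailing slashes from a remote URL before probing it.
# strip_trailing_slashes is a hypothetical helper name.

strip_trailing_slashes() {
    local url=$1
    # Remove one trailing "/" per iteration until none are left.
    while [ "${url%/}" != "$url" ]; do
        url=${url%/}
    done
    printf '%s\n' "$url"
}
```

This assumes the slash-less form is always the one worth testing, which matches what I saw for the Fedora IoT remote but may not hold for every server.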
FWIW my experimentation indicated that `/etc/ostree/remotes.d/fedora-iot.conf` is responsible for setting the output of `ostree remote show-url fedora-iot`:
[root@k4 ~]# ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot/
[root@k4 ~]# vi /etc/ostree/remotes.d/fedora-iot.conf # remove trailing slash
[root@k4 ~]# ostree remote show-url fedora-iot
https://ostree.fedoraproject.org/iot
[root@k4 ~]# mv /etc/ostree/remotes.d/fedora-iot.conf /root
[root@k4 ~]# ostree remote show-url fedora-iot
error: Remote "fedora-iot" not found
Yeah, that's odd. I did a recursive grep over the entire file system and could not find any file containing https://ostree.fedoraproject.org/iot, yet `ostree remote show-url fedora-iot` shows exactly that on my machine. I'm guessing this remote is somehow baked into ostree with f38? Maybe only on aarch64 using the raw variant from https://fedoraproject.org/iot/download/?
Anyway, I've opened #116 to use `ostree remote list` to fetch the list of remotes, which should work on all versions no matter where they get their information from.
As @adrienthebo mentioned, using https://ostree.fedoraproject.org/iot (without the slash) works fine because the script already checks for 3xx responses