Host-Keys signed by a CA aren't recognized as known hosts
cerlestes opened this issue · comments
Abstract
It seems like paramiko isn't correctly processing keys from known_hosts
that are annotated with @cert-authority
. As I'm unfamiliar with python and paramiko, I cannot try to reproduce it. The setup works fine using the OpenSSH tools, but fails when using paramiko.
Environment
I've encountered this bug in duplicity, a backup program that uses paramiko as an SSH agent.
Client has a /etc/ssh/ssh_known_hosts
file that contains a @cert-authority
line with the public key of the signing CA:
@cert-authority *.mydomain ssh-rsa AAAAB... hostkey_ca
Server has *-cert.pub
files for all of its host keys, e.g. /etc/ssh/ssh_host_rsa_key-cert.pub
, which are configured in its sshd_config
like so: HostCertificate /etc/ssh/ssh_host_rsa_key-cert.pub
. These certificates were created using the private key of the aforementioned CA.
Situation
Connecting via OpenSSH's program suite works fine:
user@client:~# sftp backups.mydomain
Connected to backups.mydomain.
sftp> exit
Connecting via paramiko (through duplicity) does not:
user@client:~# duplicity Backups sftp://backups.mydomain/Backups
The authenticity of host '[backups.mydomain]:1234' can't be established.
SSH-RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? no
The fingerprint displayed is of the signed public key.
Paramiko only manages to connect after adding the domain/ip and public host key of the server to the known_hosts
file in the client, when it should not need this due to the given CA key and certificate.
Thanks for the report! I don't have personal experience using a CA system with SSH keys so not sure offhand how well Paramiko supports it; could be a bug, could be outright lack of support.
I'm +1 on getting it working but would need someone both with CA experience & comfort w/ the codebase to at least get us pointed in the right direction.
+1 on this. This is very useful and client certificates seem to already work, but host certificates have a slightly different syntax:
https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.certkeys
Tying this to #531 even though the two probably don't technically overlap (but they kinda do conceptually).
I am very interested in supporting host keys with cert-authority. Has there been any progress on this? If not I will attempt an implementation.
Still not implemented as near as I can tell from the documentation. Using load_system_host_keys on a file containing entries that start with @cert-authority
fails to load these lines. As host certificates are fundamental to my SSH-at-scale use I can't use Paramiko unless I ignore missing host keys or figure out this implementation.
@kazzmir did you ever attempt to implement?
No Implementation until 2021?
Hi! I just countered this issue upon trying to use Paramiko. Is there any update on it?
Flagging this to include/consider in the key/auth rewrite of #387.
That rewrite is waiting for bitprophet's drop of Python 2 and older 3.x from all of his packages. Not sure how long that will take; the key work will only happen after that modernization is done.
Just ran into this, also, and poked through hostkeys.py
to see where the issue is.
# hostkeys.py
# 339 - 347
fields = re.split(" |\t", line)
if len(fields) < 3:
# Bad number of fields
msg = "Not enough fields found in known_hosts in line {} ({!r})"
log.info(msg.format(lineno, line))
return None
fields = fields[:3] # <--- right there!
names, key_type, key = fields
A signed host key's 0th field is @cert-authority
, leading to a classic off-by-one error on the next two fields. key
becomes the key type (eg. ssh-rsa
), which is not a valid base64 string:
In [19]: print(fields[2])
ssh-rsa
In [20]: base64.decodebytes(bytes(fields[2], "utf8"))
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-20-be56aa59c4a7> in <module>
----> 1 base64.decodebytes(bytes(fields[2], "utf8"))
/usr/lib64/python3.10/base64.py in decodebytes(s)
560 """Decode a bytestring of base-64 data into a bytes object."""
561 _input_type_check(s)
--> 562 return binascii.a2b_base64(s)
563
564
Error: Incorrect padding
A naive solution might be to throw some more re
methods at it. Maybe re.match
with a pattern that gets just the key, like r'\S+=='
--which would technically work as long as the key comment doesn't have ==
in it anywhere 😝 -- or do more but simpler matches on each field after the line is split to find all the fields that can't be the key. Kinda just spitballin' here.
A related ergonomics issue: in my case, this signed key happens to be the first one in known_hosts
and it's not even the key for the host I'm trying to connect to. That the HostKeyEntry.from_line
method crashes out the entire program in this case is pretty annoying--I have probably 2,000 keys in my known_hosts
file and none of them were checked. My recommendation would be to catch the exception in HostKeys.load
, spit out a warning to a log and stderr
, and carry on.
Thanks very much for the investigation and detail on this, @MajorDallas!
If I read your comment correctly, and to summarize, you have a twofold recommendation?
- Improve line matching to account for this different syntax for host-signed keys.
- Improve the host line reading code to gracefully warn and skip on line-parse failures.
Seems to me that these ought to be two different PRs.
Separately:
- Is there a specification for this syntax that you could link to?
- Could you provide an example of a
known_hosts
line with and without host-signing syntax, for reference?
A twofold recommendation, yes. However, for the first recommendation, rather than
Improve line matching to account for this different syntax for host-signed keys
I would say:
Improve line matching to correctly parse the
known_hosts
file format.
I know that's a quibble, but this issue will affect more than signed host keys: any host key that has been revoked, somehow specifies a host pattern* with a space in it (not sure that's even allowed, tbh), or comments will all break under the current method of splitting on space and indexing the resultant list without checking what's in the 0th field. I haven't checked, but if the same or a similar algorithm is also used for parsing the authorized_keys
file then there are even more optional fields to trip it up--but that's something to discuss in #387.
- Is there a specification for this syntax that you could link to?
- Could you provide an example of a known_hosts line with and without host-signing syntax, for reference?
The only spec I could find is right out of man 8 sshd
, under SSH_KNOWN_HOSTS FILE FORMAT
:
Each line in these files contains the following fields: marker (optional), hostnames, keytype,
base64-encoded key, comment. The fields are separated by spaces.
The marker is optional, but if it is present then it must be one of “@cert-authority”, to indicate
that the line contains a certification authority (CA) key, or “@revoked”, to indicate that the key
contained on the line is revoked and must not ever be accepted. Only one marker should be
used on a key line.
...
An example ssh_known_hosts file:
# Comments allowed at start of line
closenet,...,192.0.2.53 1024 37 159...93 closenet.example.net
cvs.example.net,192.0.2.10 ssh-rsa AAAA1234.....=
# A hashed hostname
|1|JfKTdBh7rNbXkVAQCRp4OQoPfmI=|USECr3SWf1JUPsms5AqfD5QfxkM= ssh-rsa AAAA1234.....=
# A revoked key
@revoked * ssh-rsa AAAAB5W...
# A CA key, accepted for any host in *.mydomain.com or *.mydomain.org
@cert-authority *.mydomain.org,*.mydomain.com ssh-rsa AAAAB5W...
Somewhat interesting (to me): as far as I can tell, libssh doesn't check for this field, either (I am not C-literate, though, and could have just missed it). I think the best possible reference is OpenSSH's own code. It looks like their line parsing logic starts in hostfile.c:760, with the function check_markers called on line 803 after ignoring leading whitespace and skipping comments, but before doing anything else at all.
* On the topic of host patterns: I didn't read real far into how the host field is handled. I assume it's parsed at least enough to recognize a non-standard port (the syntax requires the pattern be enclosed in square brackets), but what about eg. wildcards? OpenSSH specifies support for *
and ?
globs plus !
for negation in man 8 sshd
. The code that parses those bits starts in match_hostname and involves several other functions.
(edit: use code block instead of quote block, since apparently "revoked" is a github user and I accidentally pinged them by quoting the man page 😅 )
I threw this together in about 20 minutes and tested with a cert-authority with comments example and a marker-free, comment-free example:
from enum import Enum
from typing import Optional
class Markers(str, Enum):
CERT_AUTHORITY = "@cert-authority"
REVOKED = "@revoked"
class HostKeyParser:
marker: Optional[Markers]
hosts: str
keytype: str
key: str
comment: str
_original_line: str
__slots__ = (
"marker",
"hosts",
"keytype",
"key",
"comment",
"_original_line",
)
def __init__(self, hk_line: str):
(
self.marker,
self.hosts,
self.keytype,
self.key,
self.comment,
) = self.parse_line(hk_line)
self._original_line = hk_line
@staticmethod
def parse_line(hk_line: str) -> tuple[Optional[Markers], str, str, str, str]:
fields = hk_line.split(" ")
try:
marker = Markers(fields[0])
hosts, keytype, key, *comment = fields[1:]
except ValueError:
marker = None
hosts, keytype, key, *comment = fields
return marker, hosts, keytype, key, " ".join(comment)
Fundamentally it's still just splitting on space, but I think my concerns about spurious spaces in fields besides comment
were unfounded. Maybe this class set-up is overkill, too--I mean, the actual parsing logic is a static method that could work just as well being a plain function. The only trade-off might be an object where each field can be referenced by name, but a function can always return a NamedTuple
or a dataclass
that's even simpler than this one 🤷
It works, though!
Edit:
Yeah, I think I like it better this way:
@dataclass(frozen=True, slots=True)
class HostKey:
marker: Optional[Markers]
hosts: str
keytype: str
key: str
comment: str
_original_line: str
def parse_line(hk_line: str) -> HostKey:
fields = hk_line.split(" ")
try:
marker = Markers(fields[0])
hosts, keytype, key, *comment = fields[1:]
except ValueError:
marker = None
hosts, keytype, key, *comment = fields
return HostKey(marker, hosts, keytype, key, " ".join(comment), hk_line)
Agreed, @MajorDallas, the simplicity of the bare function is appealing.
This implementation of the parsing logic makes sense to me and looks good -- it has a number of nice features.
Looking at what's in hostkeys.py
, though, this is rather different than what's there.
Unless there's a reason to re-implement the container class, probably best to fix this with as minimal modification as possible to the existing code? This logic seems straightforward enough that adapting it shouldn't be too difficult?
You're right. I admit I didn't refer to the existing code, I just wanted to prototype the logic. I'd need to spend some time to figure out how to integrate it, but someone already familiar with that module could certainly do it faster.
One thing my prototype doesn't consider is host patterns. I kind of just assumed that there's logic elsewhere that implements the glob and negation operators and that the parser doesn't need to care about that.
Nod, no worries - a POC like this should make it easier to implement: makes it a port, rather than a from-scratch task.
Feel free to take a stab yourself, or leave it here for someone else to pick up. Your call!
I just noticed something from the sample hostkeys above:
closenet,...,192.0.2.53 1024 37 159...93 closenet.example.net
The second field, the key type, here is 1024
rather than eg. ssh-rsa
. I figured each key type has some int
associated with it, and sure enough the OpenSSH code checks against an enum called sshkey_types
. Now, I'm not a C programmer... But this enum only has 14 members and they don't seem to be using any non-default values for the enum members, meaning the last member's int
value should be 13 if I'm correctly understanding how C enums work. None of the sshkey_impl
structs have a field with 1024 as a value, either, so I don't know what key type that might be. Furthermore, afaict they use the enum to help with looking up structs which have a name
field that they do a memcmp
against--basically a string match, not an integer match, I think.
Field 3 is 37
. Pretty sure that's not a key. The ellipsis in field 4 suggests that's the key, but according to the spec fields >= 4 must be a comment. (>= 5 if there's a marker in field 1)
This very much looks like an invalid hostkey entry to me. I would not expect the man-page authors to put in an invalid example without somehow indicating that it's invalid, though 🤔 . I've never seen a hostkey entry like this in the wild, but it's not like I spend a lot of time reading random known_hosts
files. Probably okay to ignore it?
Well, I took a stab at it 'cause why not 🤷
I wasn't sure whether to add an attribute to HostKeyEntry
for the marker, if such an attribute would need to be consumed somewhere or how, so I left that off and just implemented the parser logic inside HostKeyEntry.from_line
. Based on the return of None
if the key type is unknown, it returns None
if the marker is @revoked
, but that can be adjusted.
I added sample cert-authority and revoked keys to the sample data in test_hostkeys.py
, as well, adjusted a couple of assert
s accordingly, and have passing tests 😃
Just need to discuss what, if anything, to do with the markers once we have them.
Cool - thanks!
I'd say rebase this onto a branch in your fork and create a PR--further discussion of the details of the handling for each marker case and any changes to the data structures would happen there. I'd replicate what you put in this most recent comment, as the starting point for conversation.
(The rebase is important because it's bad practice to develop on main
and create a PR from there, because then it becomes very messy if you need to update your fork's main
to match the main
branch of paramiko/paramiko
. Apologies if I'm telling you something you already know, here.)
An oversight on my part 😅 I'll get a branch and PR set-up.
+1