No way to have SFTP connections `load_host_keys()` via `transport_params`

Question

Kache opened this issue 2 years ago · comments

Problem description

System host keys are already loaded (https://github.com/RaRe-Technologies/smart_open/blob/v5.2.1/smart_open/ssh.py#L91), but there is currently no way to call `load_host_keys()` to verify the host against a specific known_hosts file. Support could be added via `transport_params` at the point where the paramiko client is instantiated. For example, something like:

        ssh = _SSH[key] = paramiko.client.SSHClient()
        ssh.load_system_host_keys()
        if 'load_host_keys' in transport_params:
            ssh.load_host_keys(transport_params['load_host_keys'])
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
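
To make the idea above easier to try in isolation, here is a minimal sketch of the same logic as a standalone helper. The function name `apply_host_keys` and the `RecordingClient` stand-in are hypothetical and exist only for illustration; the `'load_host_keys'` key is the parameter proposed in this issue, not something smart_open currently accepts. The helper only assumes the client exposes `load_system_host_keys()` and `load_host_keys(filename)`, as `paramiko.client.SSHClient` does.

```python
import os


def apply_host_keys(client, transport_params):
    """Load host keys onto a paramiko-style client (hypothetical helper).

    `client` is any object exposing load_system_host_keys() and
    load_host_keys(filename), e.g. paramiko.client.SSHClient.
    'load_host_keys' is the transport parameter proposed in this issue.
    """
    client.load_system_host_keys()
    known_hosts = transport_params.get('load_host_keys')
    if known_hosts is not None:
        # Expand "~" so callers can pass e.g. "~/.ssh/known_hosts".
        client.load_host_keys(os.path.expanduser(known_hosts))
    return client


class RecordingClient:
    """Stand-in for paramiko.client.SSHClient, used only for this demo."""

    def __init__(self):
        self.calls = []

    def load_system_host_keys(self):
        self.calls.append(('load_system_host_keys',))

    def load_host_keys(self, filename):
        self.calls.append(('load_host_keys', filename))


client = apply_host_keys(RecordingClient(), {'load_host_keys': '/tmp/known_hosts'})
print(client.calls)
```

Note that with `AutoAddPolicy`, as in the snippet above, paramiko still accepts hosts that are absent from the loaded file; refusing unknown hosts would need a stricter policy such as `paramiko.RejectPolicy`.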

Steps/code to reproduce the problem

It is currently not possible to verify the host key against a local known_hosts file.

Versions

Darwin-20.6.0-x86_64-i386-64bit
Python 3.7.6 (default, Nov 24 2021, 00:59:23)
[Clang 13.0.0 (clang-1300.0.29.3)]
smart_open 5.2.1

Kevin C · Answer 1 · Wed Aug 17 2022 07:53:09 GMT+0800 (China Standard Time)

If you're open to accepting a PR, I'd be willing to create one based on the example above.

Michael Penkov · Answer 2 · Sun Aug 21 2022 20:55:16 GMT+0800 (China Standard Time)

if 'load_host_keys' in transport_params:
    ssh.load_host_keys(transport_params['load_host_keys'])

Can't the user do this prior to the smart_open.open call?

Kevin C · Answer 3 · Tue Aug 23 2022 04:36:27 GMT+0800 (China Standard Time)

Yes, but notice the first line in my example -- doing this from user code would mean accessing smart_open's private cache, `_SSH`, to "grab" the paramiko client instance, which is not good software engineering practice.

smart_open does not normally (and rightfully, IMO) expose the underlying paramiko client to the user.

Michael Penkov · Answer 4 · Tue Aug 23 2022 10:46:59 GMT+0800 (China Standard Time)

I wonder if there's a better way to do this without having smart_open know all these paramiko details. I don't want to handle more transport parameters than absolutely necessary.

How about:

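import paramiko
import smart_open

def ssh_client_init():  # user's code
    client = paramiko.client.SSHClient()
    # additional ssh config goes here, e.g. client.load_host_keys(...)
    return client

# 'ssh_client_init' is the proposed transport parameter, not an existing one
transport_params = {'ssh_client_init': ssh_client_init}
with smart_open.open(url, 'rb', transport_params=transport_params) as fin:
    ...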

I think it's better to pass a callable instead of the client itself because we can use the callable to create a new client whenever we get disconnected.

If there is no callable passed, then we can use the default client settings, e.g. what is currently being done.

Yet another way is to expose the underlying client. I'm not opposed to that idea, either. Hiding implementation details is a good thing in general, but here it's getting in the way of the user achieving what they want, so it isn't something we have to strictly stick to.