No way to have SFTP connections `load_host_keys()` via `transport_params`
Kache opened this issue · comments
Problem description
While currently system host keys are loaded: https://github.com/RaRe-Technologies/smart_open/blob/v5.2.1/smart_open/ssh.py#L91
There's currently no way to load_host_keys()
for verifying the host. It could be added via transport_params
at paramiko client instantiation. For example, something like:
ssh = _SSH[key] = paramiko.client.SSHClient()
ssh.load_system_host_keys()
if 'load_host_keys' in transport_params:
ssh.load_host_keys(transport_params['load_host_keys'])
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
Steps/code to reproduce the problem
Currently unable to verify host key using a local known_hosts
file.
Versions
Darwin-20.6.0-x86_64-i386-64bit
Python 3.7.6 (default, Nov 24 2021, 00:59:23)
[Clang 13.0.0 (clang-1300.0.29.3)]
smart_open 5.2.1
Checklist
Before you create the issue, please make sure you have:
- Described the problem clearly
- Provided a minimal reproducible example, including any required data
- Provided the version numbers of the relevant software
If you're open to accepting a PR, I'd be willing to create a PR based on the example above.
if 'load_host_keys' in transport_params:
ssh.load_host_keys(transport_params['load_host_keys'])
Can't the user do this prior to the smart_open.open call?
Yes, but notice the first line in my example -- that solution involves accessing smart_open
's private cache ,_SSH
, to "grab" the instance of the paramiko client, which is not a good software engineering practice.
smart_open
does not normally (and rightfully, IMO) expose the underlying paramiko client to the user
I wonder if there's a better way to do this without having smart_open know all these paramiko details. I don't want to handle more transport parameters than absolutely necessary.
How about:
def ssh_client_init(): # user's code
client = paramiko.client.SSHClient()
# additional ssh config goes here
return client
transport_params = {'ssh_client_init': ssh_client_init}
with smart_open(url, 'rb', transport_params=transport_params) as fin:
...
I think it's better to pass a callable instead of the client itself because we can use the callable to create a new client whenever we get disconnected.
If there is no callable passed, then we can use the default client settings, e.g. what is currently being done.
Yet another way is to expose the underlying client. I'm not opposed to that idea, either. Hiding implementation details is a good thing in general, but here it's getting in the way of the user achieving what they want, so it isn't something we have to strictly stick to.