rubencaro / sshex

Simple SSH helpers for Elixir. SSH is useful, but we all love SSHEx!

Streams are not clearing messages correctly in between calls

jfayad opened this issue

Given we run an SSH command using Streams (e.g. ping -c 5 www.google.com)
AND the command ended correctly with a {:status,status}
WHEN we try to run the command again inside a stream
THEN we are getting random responses...

After inspecting the response (the response = receive do line in SSHEx), I spotted the following pattern:

First call: I get data, then:
{:exit_status, 0, 0}

Second call: I get nothing, with:
{:eof, 0}
{:closed, 0}

Third call: I get some data, ending with:
{:exit_status, 0, 141}

Fourth call: I get nothing, with:
{:closed, 0}

Then it loops over the 3rd and 4th calls alternately.

If I call flush() between calls, everything works normally.

My theory is that leftover messages coming from ssh_cm are left in the mailbox and interfere with subsequent calls.

I'm not familiar enough with these internals to create a pull request, so for now I'll work around the issue by running each call inside a Task. But a fix that removes the need for such a workaround would be great.
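A minimal sketch of that Task workaround, assuming the connection is opened and the stream consumed inside the same function: a process's mailbox is discarded when it exits, so any leftover :ssh_cm messages die with the Task instead of reaching the caller. The module name IsolatedCall is hypothetical and not part of SSHEx.

```elixir
defmodule IsolatedCall do
  # Hypothetical helper (not part of SSHEx): runs `fun` in a short-lived
  # Task process. Messages that end up in the Task's mailbox (for example
  # leftover :ssh_cm messages) are thrown away when the Task exits, so
  # they never accumulate in the caller's mailbox.
  def run(fun, timeout \\ 30_000) when is_function(fun, 0) do
    fun
    |> Task.async()
    |> Task.await(timeout)
  end
end
```

For example, the whole connect-and-reduce sequence could be wrapped as IsolatedCall.run(fn -> aegir_commad(commands) end). The trade-off is one extra process per call; also, Task.async links the Task to the caller, so a crash inside fun propagates.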

Hi @jfayad,

I will find time to take a look at this, but the only way a stream would halt while leaving messages hanging is after emitting an '{:error,reason}'. So I guess you could flush while processing that case.

I will take a look at it anyway to see if there is anything broken.

Cheers

Hi @rubencaro

I am actually receiving {:status, status} on the first call; then the misbehaviour occurs.

Also, using flush might not be safe if other processes are sending messages to the current process, is it? My intuition is that the SSH-specific messages should be cleared in the after_fun, but I'm not really sure how to do that (selectively clearing messages from the mailbox); otherwise I would gladly have sent a pull request.
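Selective clearing can be done with a selective receive plus after 0, which matches only the messages for one connection and returns immediately once none are left. This is a hypothetical helper, not part of SSHEx. It assumes the connection reference is the second element of each {:ssh_cm, conn, msg} tuple (as in the flush output further down) and that it runs only after the command has finished, since draining mid-command would also discard data.

```elixir
defmodule SSHMailbox do
  # Hypothetical helper (not part of SSHEx): removes only the
  # {:ssh_cm, conn, _} messages for the given connection and returns
  # their payloads in arrival order. Every other message stays in the
  # mailbox. `after 0` makes receive return immediately when no matching
  # message is left, so this never blocks.
  def drain(conn), do: drain(conn, [])

  defp drain(conn, acc) do
    receive do
      {:ssh_cm, ^conn, msg} -> drain(conn, [msg | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end
```

Note that a message arriving after drain/1 returns is not caught, so this only helps once the channel has actually finished sending.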

Cheers

Here you go:

  defp ssh_connect() do
    # read the bridge configuration once instead of three times
    config = Application.fetch_env!(:ew_aegir_bridge, EwAegirBridge.Aegir)
    ip = config |> Keyword.get(:aegir_server_ip) |> to_charlist()
    username = config |> Keyword.get(:aegir_server_user) |> to_charlist()

    case Mix.env() do
      :dev ->
        SSHEx.connect(ip: ip, user: username, user_dir: Keyword.get(config, :user_dir))

      _ ->
        SSHEx.connect(ip: ip, user: username)
    end
  end

  defp aegir_commad(_commands) do
    # open an SSH connection to the terminal
    case ssh_connect() do
      {:ok, connection} ->
        ping = "ping -c 2 www.google.com; ping -c 3 www.google.fr;"
        IO.puts(ping)
        str = SSHEx.stream(connection, to_charlist(ping))
        IO.puts("Stream returned from SSHEx")

        Enum.reduce_while(str, {}, fn x, _acc ->
          IO.inspect(x)

          case x do
            {:stdout, _row} -> {:cont, {}}
            {:stderr, _row} -> {:cont, {}}
            # stop reducing as soon as the exit status arrives
            {:status, status} -> {:halt, {:ok, status}}
            {:error, reason} -> {:halt, {:error, reason}}
          end
        end)

      {:error, reason} ->
        {:error, reason}
    end
  end

Hi @jfayad ,

You are right. flush would be dangerous. Don't do it unless you know what you are doing.

I will try to walk you through the process so you can better understand how it works, and maybe we both get to the solution. Just thinking out loud.

SSHEx calls :ssh.exec internally (in the SSHEx source). That ensures a complete sequence of messages is sent to your inbox, in order. Section 10.8 of the Erlang FAQ states that, for messages sent from one process to another, the order of reception is guaranteed to be preserved.
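The guarantee is per sender/receiver pair. A small sketch to observe it (OrderDemo is an illustrative name): one spawned process sends numbered messages, and the receiver gets them back in sending order.

```elixir
defmodule OrderDemo do
  # Spawns one sender that sends {:seq, sender_pid, i} for i in 1..n to
  # the caller, then receives those messages one by one. Because they all
  # come from the same process, they arrive in the order they were sent.
  def run(n) when n >= 1 do
    parent = self()

    sender =
      spawn(fn ->
        Enum.each(1..n, fn i -> send(parent, {:seq, self(), i}) end)
      end)

    for _ <- 1..n do
      receive do
        {:seq, ^sender, i} -> i
      after
        1_000 -> exit(:timeout)
      end
    end
  end
end
```

OrderDemo.run(5) returns the numbers 1 through 5 in ascending order. Nothing is promised about interleaving between different senders.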

Within SSHEx that order is still preserved, since everything coming from :ssh_cm is received in turn. That leaves only the case of a timeout, which you should handle yourself; that is not what is happening here anyway.

Everything received from :ssh.exec is parsed and then mapped so that it can be served each time the stream requests an item. Non-data messages (like :eof) are discarded from the stream, so you will not get them, and the :closed message just halts the stream, so you will not see it either.
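To make that flow concrete, here is a simplified, hypothetical sketch of a mailbox-backed stream built with Stream.resource. It is not SSHEx's actual source; it only assumes the :ssh_connection channel message shapes ({:data, channel, type, data}, {:exit_status, channel, status}, {:eof, channel}, {:exit_signal, ...}, {:closed, channel}). Data and status become items, :eof and :exit_signal are dropped, and :closed halts the stream.

```elixir
defmodule MailboxStream do
  # Hypothetical sketch, NOT SSHEx's code: turns :ssh_cm channel messages
  # for one connection/channel into {:stdout, _}, {:stderr, _},
  # {:status, _} and {:error, :timeout} stream items.
  def stream(conn, channel, timeout \\ 5_000) do
    Stream.resource(
      fn -> :open end,
      fn
        :closed ->
          {:halt, :closed}

        state ->
          receive do
            {:ssh_cm, ^conn, {:data, ^channel, 0, data}} -> {[stdout: data], state}
            {:ssh_cm, ^conn, {:data, ^channel, 1, data}} -> {[stderr: data], state}
            {:ssh_cm, ^conn, {:exit_status, ^channel, status}} -> {[status: status], state}
            # non-data messages are dropped from the stream
            {:ssh_cm, ^conn, {:eof, ^channel}} -> {[], state}
            {:ssh_cm, ^conn, {:exit_signal, ^channel, _, _, _}} -> {[], state}
            # the channel is done: halt without emitting anything
            {:ssh_cm, ^conn, {:closed, ^channel}} -> {:halt, :closed}
          after
            # emit one error item, then halt on the next call
            timeout -> {[error: :timeout], :closed}
          end
      end,
      fn _state -> :ok end
    )
  end
end
```

When it is consumed to the end (for instance with Enum.to_list or Enum.map), the stream keeps receiving until {:closed, channel}, so nothing for that channel is left in the mailbox.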

This snippet works as expected:

defmodule A do

  def go(cmd) do
    {:ok, conn} = SSHEx.connect ip: "127.0.0.1"
    str = SSHEx.stream conn, cmd
    Enum.map(str, fn(x)->
      case x do
        {:stdout, row}    -> {:normal_output, row}
        {:stderr, row}    -> {:error_output, row}
        {:status, status} -> {:exit_code, status}
        {:error, reason}  -> {:error, reason}
      end
    end)
  end

end

Then run A.go "ping -c 3 8.8.8.8" (or A.go "ping -c 3 8.8.8.8.8.8" if you want to make it fail), and you should see every message of the complete sequence both times, with nothing left in your inbox. You can flush to confirm there is nothing left.

iex(5)> A.go "ping -c 3 8.8.8.8"
[normal_output: "PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=19.2 ms\n",
 normal_output: "64 bytes from 8.8.8.8: icmp_seq=3 ttl=60 time=16.3 ms\n",
 normal_output: "\n",
 normal_output: "--- 8.8.8.8 ping statistics ---\n3 packets transmitted, 2 received, 33% packet loss, time 2055ms\nrtt min/avg/max/mdev = 16.381/17.827/19.274/1.452 ms\n",
 exit_code: 0]
iex(6)> A.go "ping -c 3 8.8.8.8"
[normal_output: "PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=17.1 ms\n",
 normal_output: "64 bytes from 8.8.8.8: icmp_seq=2 ttl=60 time=16.0 ms\n",
 normal_output: "64 bytes from 8.8.8.8: icmp_seq=3 ttl=60 time=16.2 ms\n\n",
 normal_output: "--- 8.8.8.8 ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 2002ms\nrtt min/avg/max/mdev = 16.088/16.490/17.142/0.487 ms\n",
 exit_code: 0]

BUT this snippet reproduces what you saw:

defmodule A do

  def go(cmd) do
    {:ok, conn} = SSHEx.connect ip: "127.0.0.1"
    str = SSHEx.stream conn, cmd
    Enum.reduce_while(str, [], fn(x, acc)->
      case x do
        {:stdout, row}    -> {:cont, acc ++ [row]}
        {:stderr, row}    -> {:cont, acc ++ [row]}
        {:status, _status} -> {:halt, acc ++ ["stopped"]}
        {:error, reason}  -> {:halt, acc ++ [reason]}
      end
    end)
  end

end

You can run A.go "ping -c 3 8.8.8.8" and get everything OK, so it looks like all the messages were processed, but when you call it again you get an empty response. Like this:

iex(6)> A.go "ping -c 3 8.8.8.8"
["PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=16.4 ms\n",
 "64 bytes from 8.8.8.8: icmp_seq=2 ttl=60 time=18.7 ms\n",
 "64 bytes from 8.8.8.8: icmp_seq=3 ttl=60 time=16.0 ms\n\n",
 "--- 8.8.8.8 ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 2003ms\nrtt min/avg/max/mdev = 16.095/17.096/18.762/1.185 ms\n",
 "stopped"]
iex(7)> A.go "ping -c 3 8.8.8.8"
[]
iex(8)> flush
{:ssh_cm, #PID<0.199.0>, {:exit_signal, 0, 'PIPE', [], []}}
{:ssh_cm, #PID<0.199.0>, {:closed, 0}}
:ok
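The same effect can be reproduced without SSH. In this sketch (EarlyHaltDemo is hypothetical, and plain messages stand in for :ssh_cm traffic), the stream only halts when it receives :closed, which is sent after the status. Halting the reduction on {:status, _} therefore leaves :closed in the caller's mailbox, much like the flush output above.

```elixir
defmodule EarlyHaltDemo do
  # Hypothetical stand-in for SSHEx.stream: emits {:stdout, _} and
  # {:status, _} items from the mailbox and only halts once it has
  # received :closed (or after 1 second with nothing to receive).
  def stream do
    Stream.resource(
      fn -> :open end,
      fn state ->
        receive do
          {:stdout, _} = item -> {[item], state}
          {:status, _} = item -> {[item], state}
          :closed -> {:halt, state}
        after
          1_000 -> {:halt, state}
        end
      end,
      fn _state -> :ok end
    )
  end

  # Halts the reduction on {:status, _}, as in the reduce_while snippet
  # above, so the stream never gets to receive :closed.
  def halt_on_status do
    Enum.reduce_while(stream(), [], fn
      {:status, _} = item, acc -> {:halt, Enum.reverse([item | acc])}
      item, acc -> {:cont, [item | acc]}
    end)
  end
end
```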

If I inspect the line in SSHEx that is called to populate the Stream (invoked by reduce_while in this case), I see this:

iex(2)> A.go "ping -c 3 8.8.8.8"

{[stdout: "PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=16.4 ms\n"], 0}

{[stdout: "64 bytes from 8.8.8.8: icmp_seq=3 ttl=60 time=17.2 ms\n\n"], 0}

{[], 0}

["64 bytes from 8.8.8.8: icmp_seq=2 ttl=60 time=16.7 ms\n",
 "--- 8.8.8.8 ping statistics ---\n3 packets transmitted, 3 received, 0% packet loss, time 2003ms\nrtt min/avg/max/mdev = 16.435/16.823/17.299/0.358 ms\n",
 "stopped"]
iex(3)> flush
{:ssh_cm, #PID<0.181.0>, {:closed, 0}}
:ok

This means that, for some reason, reduce_while stops calling next on the Stream when it receives the response {[], 0}, which, according to the Elixir source, emits {:cont, acc}. So it should keep asking for the next element of the Stream.

I suggest you use Enum.map or Enum.each, and avoid Enum.reduce with SSHEx.stream for now. Also, if you have the time, it would be great to open an issue so that the Elixir team knows this problem exists. Add a reference to this thread so they have all the details.

I'm closing this, as it seems related to Enum.reduce_while internals.

Thanks.

Hey @rubencaro, thanks for the detailed explanation. I got swamped around the time you posted your answer, and then I was busy with other projects until now.

I'll create an issue on the Elixir project and see how it goes.

Hey @rubencaro, after further digging, it seems the reduce_while usage suggested in the docs is the culprit.

I suggest either editing the docs to provide an example that is fully compatible with streams, or changing the way streams return values to the reducer so that it aligns with what the docs suggest.

More details: elixir-lang/elixir#7050 (comment)

Hi @jfayad,

There's no 'reduce_while' in the docs. In the docs I propose 'each' or 'map', actually.

As I read the issue you opened on the Elixir repo, it seems there is a difference between the correct return value for the function passed to a 'map' and the one passed to a 'reduce'.

Our mistake was to assume both functions should return the same values throughout the process. It seems there's a small difference (a ':halt' for the 'map' functions, a ':cont' for the 'reduce' functions).
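A sketch of a reduce-style consumer along those lines: every clause returns {:cont, acc}, including the one for {:status, _}, so the stream keeps going until it halts on its own. StreamCollect and its result shape are assumptions for illustration. It works on any enumerable of SSHEx-style tuples, so it can be exercised with a plain list.

```elixir
defmodule StreamCollect do
  # Hypothetical consumer for SSHEx-style items. Every clause returns
  # {:cont, acc}, so reduce_while never stops the stream early; the
  # stream itself decides when to halt.
  def collect(stream) do
    stream
    |> Enum.reduce_while(%{out: [], err: [], status: nil, error: nil}, fn
      {:stdout, row}, acc -> {:cont, %{acc | out: [row | acc.out]}}
      {:stderr, row}, acc -> {:cont, %{acc | err: [row | acc.err]}}
      {:status, status}, acc -> {:cont, %{acc | status: status}}
      {:error, reason}, acc -> {:cont, %{acc | error: reason}}
    end)
    |> finish()
  end

  # rows were prepended while reducing, so reverse them back into order
  defp finish(%{error: nil} = acc) do
    {:ok, %{stdout: Enum.reverse(acc.out), stderr: Enum.reverse(acc.err), status: acc.status}}
  end

  defp finish(%{error: reason}), do: {:error, reason}
end
```

With SSHEx this would be used as conn |> SSHEx.stream('ping -c 3 8.8.8.8') |> StreamCollect.collect(). Since nothing halts early, plain Enum.reduce would work just as well; reduce_while is kept only to mirror the thread.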

I suggest you make a pull request adding a working example of a 'reduce' process, but keeping the 'each/map' example, as it is correct.

Thanks