apache / mina-sshd

Apache MINA sshd is a comprehensive Java library for client- and server-side SSH.

Home Page: https://mina.apache.org/sshd-project/

A broken link during key exchange may cause an infinite loop.

FerrariCalifornia opened this issue

Version

2.9.2

Bug description

When a custom thread is used to send NETCONF messages, the CPU stays continuously busy, which affects the code of other modules. I suspect this happens because the link was broken during key exchange. The debug log is as follows.

Actual behavior

When a custom thread is used to send NETCONF messages, the CPU stays continuously busy, which affects the code of other modules.

Expected behavior

The thread should not consume CPU in an infinite loop.

Relevant log output

2023-09-06 16:19:58.960 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.523 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   292  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Thread Thread[transferEpQueryReq-thread-8,5,main] awakens after KEX done
2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl   282  [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS


Other information

No response

That looks strange. I do have a nasty suspicion... please show the code of your transferEpQueryReq thread.

What exactly do you mean by "broken link"?

Is this issue still reproducible with 2.10.0?

commented

We did not try 2.10.0.

Here is the code I suspect is causing the problem:
sshd-core/src/main/java/org/apache/sshd/common/session/helpers/KeyExchangeMessageHandler.java

As we can see at the line "KexState state = session.kexState.get();", when the session is already closed but the key exchange has not completed, this code may loop forever.

At the time the debug log printed this line:

2023-09-06 16:19:59.524 [transferEpQueryReq-thread-8] DEBUG org.apache.sshd.server.session.ServerSessionImpl 282 [] writeOrEnqueue(ServerSessionImpl[itran@/200.200.60.22:47442])[SSH_MSG_CHANNEL_DATA]: Blocking thread Thread[transferEpQueryReq-thread-8,5,main] until KEX is over or timeout 10000 MILLISECONDS

I checked the session 200.200.60.22:47442, and it was already closed.

sshd-core/src/main/java/org/apache/sshd/common/session/helpers/KeyExchangeMessageHandler.java

/**
     * Writes an SSH packet. If no KEX is ongoing and there are no pending packets queued to be written after KEX, the
     * buffer is written directly. Otherwise, the write is enqueued or the calling thread is blocked until all pending
     * packets have been written, depending on the result of {@link #isBlockAllowed(int)}. If the calling thread holds
     * the monitor of the session's {@link AbstractSession#getFutureLock()}, it is never blocked and the write is
     * queued.
     * <p>
     * If {@code timeout <= 0} or {@code unit == null}, a time-out of "forever" is assumed. Note that a timeout applies
     * only if the calling thread is blocked.
     * </p>
     *
     * @param  cmd         SSH command from the buffer
     * @param  buffer      {@link Buffer} containing the packet to write
     * @param  timeout     number of {@link TimeUnit}s to wait at most if the calling thread is blocked
     * @param  unit        {@link TimeUnit} of {@code timeout}
     * @return             an {@link IoWriteFuture} that will be fulfilled once the packet has indeed been written.
     * @throws IOException if an error occurs
     */
    protected IoWriteFuture writeOrEnqueue(int cmd, Buffer buffer, long timeout, TimeUnit unit) throws IOException {
        boolean holdsFutureLock = Thread.holdsLock(session.getFutureLock());
        for (;;) {
            DefaultKeyExchangeFuture block = null;
            // We must decide _and_ write the packet while holding the lock. If we'd write the packet outside this
            // lock, there is no guarantee that a concurrently running KEX_INIT received from the peer doesn't change
            // the state to RUN and grabs the encodeLock before the thread executing this write operation. If this
            // happened, we might send a high-level messages after our KEX_INIT, which is not allowed by RFC 4253.
            //
            // Use the readLock here to give KEX state updates and the flushing thread priority.
            lock.readLock().lock();
            try {
                if (shutDown) {
                    throw new SshException("Write attempt on closing session: " + SshConstants.getCommandMessageName(cmd));
                }
                KexState state = session.kexState.get();
                boolean kexDone = KexState.DONE.equals(state) || KexState.KEYS.equals(state);
                if (kexDone && kexFlushed) {
                    // Not in KEX, no pending packets: out it goes.
                    return session.doWritePacket(buffer);
                } else if (!holdsFutureLock && isBlockAllowed(cmd)) {
                    // KEX done, but still flushing: block until flushing is done, if we may block.
                    //
                    // The future lock is a _very_ global lock used for synchronization in many futures, and in
                    // particular in the key exchange related futures; and it is accessible by client code. If we
                    // block a thread holding that monitor, none of the futures that use that lock can ever be
                    // fulfilled, including the future this thread would wait upon.
                    //
                    // It would seem that calling writePacket() while holding *any* (session global) Apache MINA
                    // sshd lock in client code would be extremely bad practice. But note that the deprecated
                    // ClientUserAuthServiceOld does exactly that. While that deprecated service doesn't send
                    // channel data, there might be client code that does similar things. But this is also the
                    // reason why we must be careful to never synchronize on the futureLock while holding the
                    // kexLock: if that happened while code concurrently running called writePacket() while holding
                    // the futureLock, we might get a deadlock due to lock inversion.
                    //
                    // Blocking here will prevent data-pumping application threads from overrunning the flushing
                    // thread and ensures that the flushing thread does indeed terminate.
                    //
                    // Note that we block only for channel data.
                    block = kexFlushedFuture;
                } else {
                    // Still in KEX or still flushing and we cannot block the thread. Enqueue the packet; it will
                    // get written by the flushing thread at the end of KEX. Note that theoretically threads may
                    // queue arbitrarily many packets during KEX. However, such a scenario is mostly limited to
                    // "data pumping" threads that typically will block during KEX waiting until window space is
                    // available on the channel again, which can happen only at the end of KEX.
                    // (SSH_CHANNEL_WINDOW_ADJUST is not a low-level message and will not be sent during KEX.)
                    //
                    // If so many packets are queued that flushing them triggers another KEX flushing stops
                    // and will be resumed at the end of the new KEX.
                    if (kexDone && log.isDebugEnabled()) {
                        log.debug("writeOrEnqueue({})[{}]: Queuing packet while flushing", session,
                                SshConstants.getCommandMessageName(cmd));
                    }
                    return enqueuePendingPacket(cmd, buffer);
                }
            } finally {
                lock.readLock().unlock();
            }
            if (block != null) {
                if (timeout <= 0 || unit == null) {
                    if (log.isDebugEnabled()) {
                        log.debug("writeOrEnqueue({})[{}]: Blocking thread {} until KEX is over", session,
                                SshConstants.getCommandMessageName(cmd), Thread.currentThread());
                    }
                    block.await();
                } else {
                    if (log.isDebugEnabled()) {
                        log.debug("writeOrEnqueue({})[{}]: Blocking thread {} until KEX is over or timeout {} {}", session,
                                SshConstants.getCommandMessageName(cmd), Thread.currentThread(), timeout, unit);
                    }
                    block.await(timeout, unit);
                }
                if (log.isDebugEnabled()) {
                    log.debug("writeOrEnqueue({})[{}]: Thread {} awakens after KEX done", session,
                            SshConstants.getCommandMessageName(cmd), Thread.currentThread());
                }
            }
        }
    }

Here is the CPU flame graph; the transferEpQueryReq thread takes the most CPU time.
[Image: 20230906-151611]

I do know the Apache MINA sshd code 😄. Yes, that's where the loop is. But the bug is in the flushing thread; it must better handle exceptions that occur when flushing a packet. In particular, it must then also set kexFlushed = true.
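
To illustrate the mechanism described above, here is a minimal, self-contained sketch. It is not the Apache MINA sshd code and not the actual fix. It assumes one reading of the log: the future the writer waits on has already been fulfilled, so each wait returns immediately, while the "flushed" flag was never set because the flushing thread hit an exception. All names (KexSpinDemo, State, flush, writerIterations) are hypothetical, and the loop is capped so the demo terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model only; every name here is hypothetical, not a MINA sshd API.
final class KexSpinDemo {

    /** Stand-in for the shared state: a "flushed" flag plus the future writers wait on. */
    static final class State {
        final AtomicBoolean flushed = new AtomicBoolean(false);
        final CompletableFuture<Boolean> flushedFuture = new CompletableFuture<>();
    }

    /**
     * Models the flushing thread. If failWrite is true, the simulated write throws
     * (for example because the session is already closed).
     * setFlagOnFailure selects between the assumed buggy and the assumed repaired behavior.
     */
    static void flush(State s, boolean failWrite, boolean setFlagOnFailure) {
        boolean ok = false;
        try {
            if (failWrite) {
                throw new IllegalStateException("simulated write failure on closed session");
            }
            ok = true;
        } catch (IllegalStateException e) {
            // Swallowed on purpose: the model only cares about the state left behind.
        } finally {
            if (ok || setFlagOnFailure) {
                s.flushed.set(true); // without this, writers can never leave their loop
            }
            s.flushedFuture.complete(ok); // waiters are woken in either case
        }
    }

    /** Models the writer loop; returns the number of iterations, capped at maxIterations. */
    static int writerIterations(State s, int maxIterations) {
        int n = 0;
        while (n < maxIterations) {
            n++;
            if (s.flushed.get()) {
                return n; // "out it goes"
            }
            try {
                // Returns immediately once the future is completed, so the loop spins.
                s.flushedFuture.get(10, TimeUnit.SECONDS);
            } catch (InterruptedException | ExecutionException | TimeoutException e) {
                throw new IllegalStateException(e);
            }
        }
        return n;
    }

    public static void main(String[] args) {
        State buggy = new State();
        flush(buggy, true, false);
        System.out.println("flag not set on failure: " + writerIterations(buggy, 1000) + " iterations");

        State repaired = new State();
        flush(repaired, true, true);
        System.out.println("flag set on failure:     " + writerIterations(repaired, 1000) + " iterations");
    }
}
```

In this model the writer exhausts its iteration cap when the flag is left unset, which matches the alternating "Blocking thread" and "awakens" lines with identical timestamps in the log, and it exits on the first pass once the flusher sets the flag in its finally block.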

I'll push a PR soon.