eclipse / paho.golang

Go libraries

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Autopaho set DefaultPinger on ClientConfig can't reconnect to server after disconnection

minyukim opened this issue · comments

Describe the bug
Autopaho set DefaultPinger on ClientConfig can't reconnect to the server after disconnection
Found this bug while using DefalutPinger with logger to test Pinger with autopaho (I guess this is only usecase for now)

To reproduce

  1. Set PingHandler to DefaultPinger on ClientConfig and autopaho.NewConnection
pinger := paho.NewDefaultPinger()
cfg := autopaho.ClientConfig{
    ...
    ClientConfig: paho.ClientConfig{
        ....
        PingHandler: pinger,
        ....
    }
    ...
}
cm, err := autopaho.NewConnection(ctx, cfg)
  1. Disconnect client network and wait for client being closed
  2. Reconnect network -> But autopaho can't reconnect to the server

Debug output

2024/01/13 10:00:54 connecting
2024/01/13 10:00:54 sending CONNECT
2024/01/13 10:00:54 waiting for CONNACK/AUTH
2024/01/13 10:00:55 received CONNACK
mqtt connection up
client error: ping handler error: Run() already called
2024/01/13 10:00:55 received CONNACK, starting PingHandler
2024/01/13 10:00:55 starting publish packets loop
2024/01/13 10:00:55 starting incoming
2024/01/13 10:00:55 subscribing to …
2024/01/13 10:00:55 PacketSent() returning due to done channel
2024/01/13 10:00:55 waiting for SUBACK
2024/01/13 10:00:55 returning from ping handler worker
2024/01/13 10:00:55 error called: ping handler error: Run() already called
2024/01/13 10:00:55 client stopped
2024/01/13 10:00:55 stop() called with error: <nil>
2024/01/13 10:00:55 ping stopped
2024/01/13 10:00:55 conn closed
2024/01/13 10:00:55 acks tracker reset
2024/01/13 10:00:55 session updated, waiting on workers
2024/01/13 10:00:55 client stopping, incoming stopping
2024/01/13 10:00:55 returning from incoming worker
2024/01/13 10:00:55 queue got connection
2024/01/13 10:00:55 returning from publish packets loop worker
2024/01/13 10:00:55 workers done
2024/01/13 10:00:55 handleError received error: ping handler error: Run() already called
2024/01/13 10:00:55 handleError passed error on: ping handler error: Run() already called

Expected behaviour
autopaho should be able to reconnect to the server even previous client was closed.

Software used:

Additional context
It happens because the values of Pinger in paho.ClientConfig is changed and autopaho uses this changed values to reconnect to the server

func (p *DefaultPinger) Run(conn net.Conn, keepAlive uint16) error {
	if keepAlive == 0 {
		p.debug.Println("Run() returning immediately due to keepAlive == 0")
		return nil
	}
	if conn == nil {
		return fmt.Errorf("conn is nil")
	}
	p.mu.Lock()
	if p.timer != nil {
		p.mu.Unlock()
		return fmt.Errorf("Run() already called")
	}
	select {
	case <-p.done:
		p.mu.Unlock()
		return fmt.Errorf("Run() called after stop()")
	default:
	}
	p.keepAlive = keepAlive
	p.conn = conn
	p.timer = time.AfterFunc(0, p.sendPingreq) // Immediately send first pingreq
	p.mu.Unlock()

	return <-p.errChan
}

timer is set while Run()
so the timer of the Pinger in paho.ClientConfig is not nil after that, and it causes return fmt.Errorf("Run() already called") while trying reconnection
It's not only problem of timer, but done(+ other channels I guess?) and stopOnce should also be initialized in order to work correctly

This is not a critical bug because users don't need logger for production, but it would be good to fix I guess.

I thought something like reset() at the end of Run() so that the value of Pinger can be reset after the Pinger stops like this,

func (p *DefaultPinger) Run(conn net.Conn, keepAlive uint16) error {
...
	err := <-p.errChan
	p.reset()
	return err
}

But stopped Pinger with error causes client stops,

if err := c.config.PingHandler.Run(c.config.Conn, keepalive); err != nil {
	go c.error(fmt.Errorf("ping handler error: %w", err))
}

And it causes Pinger stops again, so some values become dirty again

func (c *Client) close() {
	...
	c.config.PingHandler.Stop()
	...
}

This is not a critical bug because users don't need logger for production, but it would be good to fix I guess.

Thanks for reporting this so quickly; great bug report, but I'm not sure that I agree with the above because I believe the error will prevent the autopaho from reconnecting (ping handler returning error leads to call to c.error which should drop connection). I'll take a look later today (GMT+13) if @vishnureddy17 does not do so before that (adding Reset seems sensible).

Opened a PR with reset() approach. That's the simplest way I can think of.
Please let me know your opinions about it.

Thanks for the PR. I'm going to have a quick look at an alternative approach; being able to reuse the pinger and having a stop() function complicates things a bit given that the pinger needs to be reusable. I'm just about to have a scan through the code (wondered if it might be possible to simplify things a bit).

the error will prevent the autopaho from reconnecting (ping handler returning error leads to call to c.error which should drop connection)

Clarifying this; the issue will prevent autopaho from reconnecting IF a pinger is set in ClientConfig (otherwise the pinger will be created afresh each time so everything works). This means we have a couple of options; allow Run() to be called multiple times OR reset the pinger prior to each reconnection. I believe the first approach is simplest (otherwise autopaho will need a different pinger interface).

I started looking at using Context to handle this, and it was one of those changes that just seemed to expand. PR (#229) addresses this issue but also makes a range of other fairly major changes (I believe it makes it easier to follow the termination process).

@vishnureddy17 apologies; my PR significantly changes the implementation of Pinger you recently submitted. I'm not committed to making this change but I do feel it's considerably simpler (assuming I have not missed anything). Would really appreciate your thoughts.

The PR also makes it clear that Client is a single use object. There was no way to reuse Client after the config was made private (previously you could, potentially, change the connection in ClientConfig and call Connect again, but I'm unsure if that would have actually worked). This change makes it explicit that Connect can only be called once. Happy to change this if anyone disagrees!

@MattBrittan Just reviewed your PR, and like that approach.
I'm going to close my PR right after yours is merged.