nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32

Home Page:https://nodemcu.readthedocs.io

Something is wrong with MQTT

chathurangawijetunge opened this issue · comments

NodeMCU 3.0.0.0 built on nodemcu-build.com provided by frightanic.com
branch: dev
commit: 0fb2a12
release:
release DTS: 202012252235
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sjson,sntp,tmr,uart,wifi
build 2020-12-27 02:08 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

Even with the standard MQTT example code, the client acts as if the connection is okay but does not receive messages on its subscriptions. Publishing also appears to work normally, but the published messages never reach the broker, and no offline event is triggered.
This happens after running for a long time (over 12 hours).

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
m:on("connect", function(client) print ("connected") end)
m:on("connfail", function(client, reason) print ("connection failed", reason) end)
m:on("offline", function(client) print ("offline") start_mqtt() end)

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

m:on("overflow", function(client, topic, data)
  print(topic .. " partial overflowed message: " .. data )
end)

function start_mqtt()
 tmr.create():alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

This simple code connects to the broker, and if the broker disconnects it reconnects. But after wifi.sta.disconnect() followed by wifi.sta.connect(), it reports that it is connected when it is not.
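
One way to make the Wi-Fi case explicit is to drive the MQTT client from station events rather than relying on the client to notice the lost link. The following is a minimal sketch, not a confirmed fix: the helper name attach_wifi_reconnect is hypothetical, and it assumes the wifi.eventmon API of the ESP8266 firmware plus the m and start_mqtt names from the script above. The modules are passed in as arguments so the logic can also be exercised off-device.

```lua
-- Tie the MQTT client's lifetime to Wi-Fi station events.
-- wifi_mod:  the NodeMCU wifi module (or a stand-in with the same shape)
-- client:    an mqtt.Client instance
-- reconnect: a function that (re)starts the connection, e.g. start_mqtt
function attach_wifi_reconnect(wifi_mod, client, reconnect)
  local ev = wifi_mod.eventmon

  -- Link lost: close the MQTT session explicitly. pcall guards against
  -- an error if the client is already closed.
  ev.register(ev.STA_DISCONNECTED, function()
    pcall(function() client:close() end)
  end)

  -- Address obtained: the network is usable again, so reconnect.
  ev.register(ev.STA_GOT_IP, function()
    reconnect()
  end)
end

-- On the device (assumes m and start_mqtt from the script above):
-- attach_wifi_reconnect(wifi, m, start_mqtt)
```

If the "offline" handler also calls start_mqtt, a reconnect may be scheduled twice after a Wi-Fi drop, so it is probably best to keep only one of the two triggers.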

Many things are wrong with MQTT (#2987, #3068, doubtless many more). My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help, but many, many things remain wrong with MQTT even after all that work and it's just been too depressing to even contemplate fixing and nobody seems really bothered by it.

Please attempt packet capture and investigate what's going on at the network level, together with transcripts from your demo program and other MQTT clients of the broker. That is, it would be most helpful to have narrative logs, with packet traces and debug information of the form "NodeMCU Device Under Test (DUT) connects and sends X Y Z to broker; broker establishes subscriptions and sends A B C to DUT; a client publishes M to Q and the broker forwards that to DUT, which acknowledges; 11 hours pass with no network traffic beyond MQTT PING and PONG between broker and DUT; a client publishes N to Q; the broker sends this to DUT, which fails to acknowledge and reports internally [...]". I'm aware that this is a huge amount of work, but someone's going to have to do it, and so far nobody, including me, has really been champing at the bit.

(ETA: Making things even more depressing... Even if we get MQTT right, it's likely that there are nigh unsolvable issues below, given, for example, #3040. It's not clear that there's a better solution at present than to give up and admit that NodeMCU is not a high-reliability platform except in very constrained circumstances; in general, your application and remote endpoints should conspire to actively keep and feed watchdog timers that cause reboots rather than trying to fix anything without.)
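
The watchdog idea above could look roughly like the sketch below: the device reboots unless something at the remote end keeps proving that the link is alive. The name new_watchdog, the 600-second timeout, and the 10-second poll interval are illustrative assumptions, not part of the firmware. The clock and the reboot action are passed in so the logic can be checked off-device.

```lua
-- Application-level watchdog: reboot unless fed within timeout_s seconds.
-- now:    function returning the current time in seconds
-- reboot: function that restarts the device
function new_watchdog(timeout_s, now, reboot)
  local last_fed = now()
  local wd = {}

  -- Call whenever the remote endpoint proves the link is alive,
  -- e.g. from the MQTT "message" handler on a heartbeat topic.
  function wd.feed()
    last_fed = now()
  end

  -- Call periodically; reboots once the deadline has passed.
  function wd.check()
    if now() - last_fed >= timeout_s then
      reboot()
      return true
    end
    return false
  end

  return wd
end

-- NodeMCU wiring (assumes the tmr and node modules are in the build).
if tmr and node then
  -- Global so that handlers defined elsewhere can call app_watchdog.feed().
  app_watchdog = new_watchdog(600, function() return tmr.time() end, node.restart)
  tmr.create():alarm(10000, tmr.ALARM_AUTO, function() app_watchdog.check() end)
end
```

For this to catch the failure described in this issue, the feed has to come from traffic that actually crosses the broker, such as a heartbeat published by another client, rather than from a local timer.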

OT but we gotta discuss this somewhere...

@nwf what is the best way out of this misery? The "upstream" https://github.com/tuanpmt/esp_mqtt has been unmaintained since 2017. Hence, we can't turn to it for fixes to port. Options:

My https://github.com/nwf/nodemcu-firmware/tree/dev-active branch has some fixes and refactorings that may help

Can we at least merge those?

This has tests, as opposed to tuanpmt/ESP8266MQTTClient, which might lead to better quality.

Can we at least merge those?

Sounds reasonable

I think I have found a small workaround. Adding a timer for the connection solves my issue for the time being.

URL="broker"

m = mqtt.Client(node.chipid(),30,"user","pwd")
m:lwt("test/lwt","offline",0,1)   
--m:on("connect", function(client) print ("connected") end)
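-- start_mqtt is defined further down, so passing it directly here would
-- register nil; wrap it so the global is looked up when the event fires.
m:on("offline", function() start_mqtt() end)
m:on("connfail", function() start_mqtt() end)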

m:on("message", function(client, topic, data)
  print(topic .. ":" )
  if data ~= nil then
    print(data)
  end
end)

--m:on("overflow", function(client, topic, data)
--  print(topic .. " partial overflowed message: " .. data )
--end)

Mqtt_Conn_tmr=tmr.create()

function start_mqtt()
 Mqtt_Conn_tmr:alarm(3000,0, function()
   m:connect(URL, 1883, false, function(client)
     print("connected")
     client:publish("test/lwt","online",0,1)
     client:subscribe("test/lwt",0,nil) 
     client:subscribe("test",0,nil)    
   end,
   function(client, reason)
     print("failed reason: " .. reason)
     start_mqtt()
   end)
 end)
end

start_mqtt()

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.