MQTT Publish Problem - Printable Version




MQTT Publish Problem - ntnunk - 10-03-2020

After updating to f/w 14.1s0 and updating my application to take advantage of the new GetStatus() methods of the MQTT client, I'm seeing a strange problem in every eWON that uses the new application on the latest firmware. The problem is that the MQTT client seems to randomly stop publishing messages. I can see my logic actually making the call to MqttClient.publish() and everything appears to execute as intended: no exceptions are thrown, and the sequence before, during, and after seems correct. But looking at the log, I never see the "Client XXX sending PUBLISH(..." message that the MQTT client itself logs to the Realtime Logs under the MQTT source. I still see the PINGREQ/PINGRESP pairs after the publishes stop, so the client is still connected to the broker (AWS IoT Core), but the publish never happens. I don't see the PUBLISH message and I don't see data arriving on the broker. At the same time as I updated the firmware, I also modified the application to publish using QoS1, so I don't know if that affects it. I might revert to QoS0 today to see if there is any effect.

I'm still troubleshooting and investigating, so there will probably be more information coming, but I was just wondering if anyone else had seen anything similar.

Best,
Noel


RE: MQTT Publish Problem - ntnunk - 25-03-2020

I've continued tracing this issue and there is definitely a problem with the Flexy's internal MQTT client. I stepped through the code and something strange is happening. My publish code looks like this:
Code:
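public boolean isConnected() {
    // Query the Flexy MQTT client directly rather than tracking a local flag.
    int clientStatus = 0;
    try {
        clientStatus = this.getStatus();
    }
    catch(EWException e) {
        // A failed status query is treated as "not connected" (clientStatus stays 0).
    }

    // 5 is the status value this application treats as "connected".
    return clientStatus == 5;
}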

public void publishMessage(MqttMessage msg) {
    try {
        if(!isConnected()) {
            System.out.println("MQTT Client is not connected!");
            return;
        }
       
        this.publish(msg, 1, false);
        this.raiseEvent(MqttClientEventType.PUBLISH);
    } catch (EWException e) {
        e.printStackTrace();
    }
}

In this code, the isConnected() method returns true, indicating that the Flexy's MqttClient.getStatus() method returned 5. The call to this.publish(msg, 1, false) is a call to the Flexy MqttClient superclass MqttClient.publish() method. That method appears to execute fine, with no exceptions thrown, and my code continues to the internal raiseEvent() call as it should. But when I look at the Realtime Logs on the Flexy, I never see the "MQTT Log(16) Client Thing_Client_ID sending PUBLISH (d0, q1, r0, m2764, 'topic/name', ... (XXX bytes))" log entry that is printed when the internal client actually does publish a message. Nor, since I'm publishing with QoS = 1, do I ever see the "MQTT Log (16): Client Thing_Client_ID received PUBACK (Mid: XXX)" message indicating the client received the PUBACK from the AWS IoT broker. The PINGREQ/PINGRESP messages show up as normal.

I've spent a large amount of time watching the code while connected to eCatcher and monitoring the $aws/events/presence/connected and $aws/events/presence/disconnected IoT topics. What I've discovered is that this problem occurs when the Flexy has a momentary disconnection from the Talk2M servers, or perhaps just a momentary disconnection from the Internet. In any event, the device will quickly disconnect and reconnect to Talk2M/eCatcher for whatever reason. When the disconnection occurs I can see the disconnect/reconnect events on the AWS IoT reserved event topics, indicating that the MQTT broker saw the device's MQTT client disconnect and then reconnect. Once that happens, the client enters the state where it won't publish. It's almost as though the client itself knows it's connected but there's some logic in the publish() call that thinks it is not. Once this condition occurs it doesn't change until the client is restarted, either manually or through a reboot. Any ideas how to work around this would be greatly appreciated.

Best,
Noel


RE: MQTT Publish Problem - simon - 27-03-2020

Noel,

I have seen this behavior before (it has been reported to R&D but is not yet fixed) when the Flexy publishes many messages without having the MQTT connection.
The workaround that worked fine for me was to publish only when the status indicates that the client is online, as you are already doing. So that's strange...

Can you tell me what kind of connection you use? 4G, Ethernet? What are your MQTT connection parameters?
Does it happen often? Can you reproduce it easily, for example by unplugging the WAN cable (if you use Ethernet)? Or do you not have a "test" Ewon?
Perhaps you can try to monitor the status: does it see a disconnection? One idea would be to decrease the keep-alive time, so that the MQTT disconnection is reliably detected.
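
A minimal sketch of the status monitoring suggested above, assuming the client status can be polled as an int and that 5 means "connected" (the value used in Noel's isConnected() code). StatusSource and StatusMonitor are hypothetical names used for illustration and are not part of the Ewon API; on a Flexy, the StatusSource implementation would wrap the client's getStatus() call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the MQTT client; on a Flexy it would wrap getStatus(). */
interface StatusSource {
    int getStatus() throws Exception;
}

/** Records status transitions so a drop/reconnect shows up in the application's own log. */
class StatusMonitor {
    static final int CONNECTED = 5; // value taken from the isConnected() code in this thread
    static final int UNKNOWN = -1;  // assumption: placeholder used when the status query fails

    private final StatusSource source;
    private final List<String> transitions = new ArrayList<String>();
    private int lastStatus = UNKNOWN;
    private boolean sawConnected = false;
    private int reconnects = 0;

    StatusMonitor(StatusSource source) {
        this.source = source;
    }

    /** Call periodically, for example from the application's main loop. */
    void poll() {
        int status;
        try {
            status = source.getStatus();
        } catch (Exception e) {
            status = UNKNOWN;
        }
        if (status == lastStatus) {
            return; // nothing changed since the last poll
        }
        String entry = lastStatus + "->" + status;
        transitions.add(entry);
        System.out.println("MQTT status change: " + entry);
        if (status == CONNECTED) {
            // Every entry into CONNECTED after the first one counts as a reconnect.
            if (sawConnected) {
                reconnects++;
            }
            sawConnected = true;
        }
        lastStatus = status;
    }

    int getReconnects() {
        return reconnects;
    }

    List<String> getTransitions() {
        return transitions;
    }
}
```

If the reconnect count increases shortly before the PUBLISH entries stop appearing in the Realtime Logs, that would tie the stall to a drop/reconnect sequence that the status did see. If the count never changes, the status is not detecting the disconnection at all.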

Simon


RE: MQTT Publish Problem - ntnunk - 30-03-2020

Hi Simon,

I've seen this happen on 3 different devices. Two are connected via cellular (one is 3G on a very bad network, one is 4G on a very good network), and one is connected via Ethernet through a very reliable 1 Gb/s Internet connection. MQTT is set up to communicate with AWS IoT Core, via port 8883, with a KeepAlive setting of 30 seconds. The two devices connected via cellular show the problem most frequently, but both are remote so it's hard to do much with them except watch and try to diagnose or debug when the problem occurs. The third device, connected via Ethernet, is on my desk. I've seen it happen there, but it's much rarer and I haven't been able to trigger it myself so far.

Watching the devices, it seems to happen when there is a momentary drop in communication between the eWON and the Talk2M servers. I'll be watching the Realtime Logs on the web interface, either via M2Web or connected directly through eCatcher. The connection will drop, reconnect within a few seconds or maybe a minute or two, and when it resumes the problem will be there. I've managed to see it happen a few times when I had the debugger attached to the running JVM, so I was able to step through the code and see what was happening as described in my previous post. Also, as previously mentioned, I used another Python-based MQTT client I wrote to monitor the $aws/events/presence/connected/+ and $aws/events/presence/disconnected/+ topics, which the broker posts messages to when clients connect and disconnect. I could see the eWON MQTT client disconnect from the broker when the Talk2M connection dropped, and then reconnect to the broker after the Talk2M connection resumed. I would not have expected the Talk2M connection to affect the MQTT connection, but it was very predictable: when Talk2M dropped, MQTT would as well. But then, even though I can see the reconnect message on the broker, and the eWON getStatus() code reports a connection, the publish still doesn't happen. It's very strange.

Hopefully this will help!
Noel


RE: MQTT Publish Problem - simon - 01-04-2020

Hi Noel,

My example has been running for more than one day with a modem disconnection every 10 min (MaxCallDuration of 10 min), and every time the MQTT connection comes back properly.

Since this is really more of a support case, can you open a case on the HMS support platform - https://mysupport.hms.se ?
I would like to get a backup of your Ewon, including the support files.

By the way, this is the program I use for testing:
  MQTT_EWONSUPPORT_NoelTest.zip (Size: 13,12 KB / Downloads: 57)

Simon