nats-io / nats.ex

Elixir client for NATS, the cloud native messaging system. https://nats.io

Unit tests, delays and random receive failures

ppff opened this issue · comments

commented

Hi,

I've been stuck on this for a few days now so I'm posting this with the hope that someone will be able to help me.

I have an application with a ConnectionSupervisor and a ConsumerSupervisor, which spawns a Gnat.Services.Server. This Server receives messages sent by a Python microservice running elsewhere in the cloud. In my unit tests, I emit the messages from the tests themselves and expect replies and actions from the Server. Here is an example of a test:

test "'start' returns error if url doesn't exist" do
  params = %{url: "idontexist.com"}

  assert %{success: false} =
           Messages.json_body(
             Gnat.request(:gnat, "jobs.start", Jason.encode!(params),
               receive_timeout: 3_000
             )
           )
end

I have 10 tests like this one in a single file, which means they are executed sequentially. The problem is that one or two of them always fail, and never the same ones. The cause is always a timeout while waiting for the reply.

Using logging tools, I was able to determine that the Server does not in fact receive all the messages. However, by subscribing to the same topics with an external Python script connected to the same NATS server, I was able to receive them all. This means that the messages are sent successfully and that the NATS server processes them correctly, but the Gnat.Services.Server seems to malfunction.

I tried to re-implement it myself and use it with Gnat.sub(gnat, pid_of_server, "jobs.*") but it didn't change anything. I also tried adding delays between tests but some messages still aren't received.

Have you ever experienced this? Do you have a solution or ideas I should try?

Thank you very much in advance, these tests failing randomly are very annoying.

commented

Hi, I have news: I figured out that core NATS, by nature, does not guarantee delivery, so I decided to switch my whole process to JetStream. It required quite a bit of work, but the tests never fail now!
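
For anyone who stays on core NATS request/reply instead, one common mitigation for a lost message is to retry a request that times out. The following is only a minimal sketch and not something used in this thread: `RequestRetry` and `with_retry/2` are hypothetical names, and the request is passed in as a function, so the only assumption about Gnat is that `Gnat.request/4` returns `{:error, :timeout}` when no reply arrives in time.

```elixir
defmodule RequestRetry do
  @moduledoc """
  Hypothetical helper (not part of Gnat): retries a request function
  while it keeps returning `{:error, :timeout}`.
  """

  # `request_fun` is any zero-arity function returning `{:ok, reply}` or
  # `{:error, reason}`. `attempts` is the total number of tries allowed.
  def with_retry(request_fun, attempts)
      when is_function(request_fun, 0) and is_integer(attempts) and attempts > 0 do
    case request_fun.() do
      # Timed out and tries remain: send the request again.
      {:error, :timeout} when attempts > 1 ->
        with_retry(request_fun, attempts - 1)

      # A reply, a non-timeout error, or the last timeout: return as is.
      result ->
        result
    end
  end
end
```

In a test like the one above, the call could be wrapped as `RequestRetry.with_retry(fn -> Gnat.request(:gnat, "jobs.start", Jason.encode!(params), receive_timeout: 3_000) end, 3)`. Each retry re-sends the request, so the handler has to tolerate duplicates, and the retry only works around lost messages rather than preventing them.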