
NATS Message Consumption Issue After Pod (NATS cluster) Restart in OpenShift #6025

Open
mohamedsaleem18 opened this issue Oct 21, 2024 · 5 comments

@mohamedsaleem18

Observed behavior

A NATS Alpine image (NATS 2.10.19) with JetStream enabled is deployed as a three-node cluster in a Red Hat OpenShift environment. A headless service is exposed so that applications deployed in the same OpenShift cluster can connect.

Whenever the NATS cluster pods are restarted via a rolling update, the connected application can still publish messages successfully but is unable to consume them. The application client must be restarted to restore consumption. Can you please provide a resolution for this problem?

The application client (Java) uses the following seed URLs to connect to the NATS cluster for publishing and subscribing to messages:
nats://nats-0.nats-headless.ws-nats:4222,nats://nats-1.nats-headless.ws-nats:4222,nats://nats-2.nats-headless.ws-nats:4222
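For illustration, here is a minimal jnats connection sketch using those seed URLs. The class name and reconnect settings are assumptions for this example, not the application's actual code:

```java
import io.nats.client.Connection;
import io.nats.client.Nats;
import io.nats.client.Options;

import java.time.Duration;

public class NatsConnectSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical sketch; the application's real connection code is not shown in this issue.
        Options options = new Options.Builder()
                .servers(new String[] {
                        "nats://nats-0.nats-headless.ws-nats:4222",
                        "nats://nats-1.nats-headless.ws-nats:4222",
                        "nats://nats-2.nats-headless.ws-nats:4222"
                })
                .maxReconnects(-1)                    // keep retrying instead of closing after the default attempt limit
                .reconnectWait(Duration.ofSeconds(2)) // pause between reconnect attempts
                .build();

        Connection nc = Nats.connect(options);
        System.out.println("Connected to: " + nc.getConnectedUrl());
    }
}
```

Setting `maxReconnects(-1)` keeps the client retrying indefinitely rather than closing the connection after the default attempt limit, which matters during a rolling update where all three pods cycle.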

The NATS server in the cluster uses the following URLs in the nats-server.config to form the cluster:
nats://nats-0.nats-headless.ws-nats:6222,nats://nats-1.nats-headless.ws-nats:6222,nats://nats-2.nats-headless.ws-nats:6222
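For context, the corresponding cluster block in nats-server.config would typically look something like the sketch below. This is a hedged reconstruction, since the actual file is not included in the issue:

```
cluster {
  name: nats
  port: 6222
  routes: [
    nats://nats-0.nats-headless.ws-nats:6222
    nats://nats-1.nats-headless.ws-nats:6222
    nats://nats-2.nats-headless.ws-nats:6222
  ]
}
```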

Expected behavior

  1. The application should maintain its connection to the NATS cluster without needing to restart, even when pods are restarted or updated.

  2. The application should be able to publish messages to the NATS cluster successfully during and after the rolling updates of the pods.

  3. The application should be able to consume messages from the NATS cluster without interruption, receiving any messages that were published while it was connected.

  4. If a connection is lost due to a pod restart, the client should automatically attempt to reconnect to the NATS server (see the sketch below).
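On point 4, here is a minimal sketch, assuming the jnats client, of registering a ConnectionListener so that disconnect and reconnect events during a rolling restart are visible in the application logs. The class name is a hypothetical placeholder:

```java
import io.nats.client.Connection;
import io.nats.client.ConnectionListener;
import io.nats.client.Nats;
import io.nats.client.Options;

public class ReconnectEventsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical sketch: log lifecycle events (DISCONNECTED, RECONNECTED, ...)
        // so reconnect behaviour during a rolling restart shows up in the logs.
        Options options = new Options.Builder()
                .server("nats://nats-0.nats-headless.ws-nats:4222")
                .maxReconnects(-1)
                .connectionListener((Connection conn, ConnectionListener.Events event) ->
                        System.out.println("NATS connection event: " + event))
                .build();

        Connection nc = Nats.connect(options);
    }
}
```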

Server and client version

NATS Alpine image (NATS 2.10.19)
NATS Java client.

Host environment

Red Hat OpenShift (on-premise)

Steps to reproduce

No response

@mohamedsaleem18 mohamedsaleem18 added the defect Suspected defect such as a bug or regression label Oct 21, 2024
@neilalexander (Member)

Can you please provide `nats stream info` and `nats consumer info` output for the assets in question?

@mohamedsaleem18 (Author)

Stream info

```
? Select a Stream lpn
Information for Stream lpn created 2024-10-08 23:18:23

             Subjects: oe.lpn.dt, lpnDt, receiving.eb.lpn-d-req, eb.receiving.lpn-res-dtls, oe.wcl.palln, oe.wbl.grp-compe, oe.wsbl.grp-cnt, wsbl.oe.stas.tring, wsbl.oe.stas.palln, wsbl.receiving.stas.dck-dor, wsbl.oe.grp-frce-clse, wsbl.srting.stas.trng, wsbl.srting.stas.palln, wsbl.autoputaway.stas.mild, wsbl.oe.dtn-req, oe.wsbl.mild, oe.wcsbl.container-empty, oe.wsbl.vcant
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
      Acknowledgments: true
       Discard Policy: Old
     Duplicate Window: 2m0s
           Direct Get: true
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: true

Limits:

     Maximum Messages: 100,000
  Maximum Per Subject: 100,000
        Maximum Bytes: 64 MiB
          Maximum Age: 3d0h0m0s
 Maximum Message Size: 98 KiB
    Maximum Consumers: unlimited

Cluster Information:

                 Name: nats
               Leader: nats-1
              Replica: nats-0, current, seen 516ms ago
              Replica: nats-2, current, seen 526ms ago

State:

             Messages: 61
                Bytes: 35 KiB
       First Sequence: 83 @ 2024-10-20 20:54:20
        Last Sequence: 143 @ 2024-10-21 11:12:39
     Active Consumers: 14
   Number of Subjects: 5
```

@mohamedsaleem18 (Author)

Consumer info

```
Information for Consumer lpn > regEnCntrStasHdler created 2024-10-08T23:38:49-05:00

Configuration:

                Name: regEnCntrStasHdler
           Pull Mode: true
      Filter Subject: wsbl.receiving.stas.dck-dor
      Deliver Policy: All
          Ack Policy: Explicit
            Ack Wait: 30.00s
       Replay Policy: Instant
     Max Ack Pending: 1,000
   Max Waiting Pulls: 512

Cluster Information:

                Name: nats
              Leader: nats-1
             Replica: nats-0, current, seen 824ms ago
             Replica: nats-2, current, seen 829ms ago

State:

  Last Delivered Message: Consumer sequence: 27  Stream sequence: 133  Last delivery: 1h56m11s ago
    Acknowledgment Floor: Consumer sequence: 27  Stream sequence: 133  Last Ack: 1h56m11s ago
        Outstanding Acks: 0 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 0 of maximum 512
```
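Since this is a durable pull consumer, the application has to fetch batches explicitly. Below is a minimal sketch, assuming the jnats client, of how such a consumer is typically driven; the class name, batch size, and wait time are illustrative assumptions, not the application's actual code:

```java
import io.nats.client.Connection;
import io.nats.client.JetStream;
import io.nats.client.JetStreamSubscription;
import io.nats.client.Message;
import io.nats.client.Nats;
import io.nats.client.PullSubscribeOptions;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;

public class PullConsumeSketch {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://nats-0.nats-headless.ws-nats:4222");
        JetStream js = nc.jetStream();

        // Bind to the existing durable pull consumer described above.
        PullSubscribeOptions pullOptions = PullSubscribeOptions.bind("lpn", "regEnCntrStasHdler");
        JetStreamSubscription sub = js.subscribe("wsbl.receiving.stas.dck-dor", pullOptions);

        while (true) {
            // Ask for up to 10 messages, waiting at most one second for them.
            List<Message> batch = sub.fetch(10, Duration.ofSeconds(1));
            for (Message msg : batch) {
                System.out.println("Received: " + new String(msg.getData(), StandardCharsets.UTF_8));
                msg.ack(); // explicit ack, matching the consumer's Ack Policy
            }
        }
    }
}
```

The `msg.ack()` call matches the consumer's Explicit ack policy shown above; unacknowledged messages would be redelivered after the 30s Ack Wait.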

@mohamedsaleem18 (Author)

mohamedsaleem18 commented Oct 24, 2024

Can you please provide a resolution for this issue?

@sourabhaggrawal
I have also faced this issue with a single-replica pod (non-clustered) and a work-queue (WQ) stream with no message limits and no TTL.
The consumer just stopped receiving messages, and the consumer app had to be rebooted, after which it started consuming messages again.
nats-server 2.10.11
