
NATS Message Consumption Issue After Pod (NATS cluster) Restart in OpenShift #6025

Open
mohamedsaleem18 opened this issue Oct 21, 2024 · 5 comments

@mohamedsaleem18

Observed behavior

A NATS Alpine image (NATS 2.10.19) with JetStream enabled is deployed as a three-node cluster in a Red Hat OpenShift environment. A headless service is exposed so that applications deployed in the same OpenShift cluster can connect.

Whenever the NATS cluster pods are restarted via a rolling update, the connected application can still publish messages successfully but is unable to consume them. The application client must be restarted to restore consumption. Can you please provide a resolution for this problem?

The application client (Java) uses the following seed URLs to connect to the NATS cluster for publishing and subscribing to messages:
nats://nats-0.nats-headless.ws-nats:4222,nats://nats-1.nats-headless.ws-nats:4222,nats://nats-2.nats-headless.ws-nats:4222
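For illustration, here is a minimal jnats connection sketch using those seed URLs. The class name and reconnect settings are assumptions for this example, not the application's actual code:

```java
import io.nats.client.Connection;
import io.nats.client.Nats;
import io.nats.client.Options;

import java.time.Duration;

public class NatsConnectSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical sketch; the application's real connection code is not shown in this issue.
        Options options = new Options.Builder()
                .servers(new String[] {
                        "nats://nats-0.nats-headless.ws-nats:4222",
                        "nats://nats-1.nats-headless.ws-nats:4222",
                        "nats://nats-2.nats-headless.ws-nats:4222"
                })
                .maxReconnects(-1)                    // keep retrying instead of closing after the default attempt limit
                .reconnectWait(Duration.ofSeconds(2)) // pause between reconnect attempts
                .build();

        Connection nc = Nats.connect(options);
        System.out.println("Connected to: " + nc.getConnectedUrl());
    }
}
```

Setting `maxReconnects(-1)` keeps the client retrying indefinitely rather than closing the connection after the default attempt limit, which matters during a rolling update where all three pods cycle.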

The NATS server in the cluster uses the following URLs in the nats-server.config to form the cluster:
nats://nats-0.nats-headless.ws-nats:6222,nats://nats-1.nats-headless.ws-nats:6222,nats://nats-2.nats-headless.ws-nats:6222
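For context, the corresponding cluster block in nats-server.config would typically look something like the sketch below. This is a hedged reconstruction, since the actual file is not included in the issue:

```
cluster {
  name: nats
  port: 6222
  routes: [
    nats://nats-0.nats-headless.ws-nats:6222
    nats://nats-1.nats-headless.ws-nats:6222
    nats://nats-2.nats-headless.ws-nats:6222
  ]
}
```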

Expected behavior

  1. The application should maintain its connection to the NATS cluster without needing to restart, even when pods are restarted or updated.

  2. The application should be able to publish messages to the NATS cluster successfully during and after the rolling updates of the pods.

  3. The application should be able to consume messages from the NATS cluster without interruption, receiving any messages that were published while it was connected.

  4. If a connection is lost due to a pod restart, the client should automatically attempt to reconnect to the NATS server (see the sketch below).
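On point 4, here is a minimal sketch, assuming the jnats client, of registering a ConnectionListener so that disconnect and reconnect events during a rolling restart are visible in the application logs. The class name is a hypothetical placeholder:

```java
import io.nats.client.Connection;
import io.nats.client.ConnectionListener;
import io.nats.client.Nats;
import io.nats.client.Options;

public class ReconnectEventsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical sketch: log lifecycle events (DISCONNECTED, RECONNECTED, ...)
        // so reconnect behaviour during a rolling restart shows up in the logs.
        Options options = new Options.Builder()
                .server("nats://nats-0.nats-headless.ws-nats:4222")
                .maxReconnects(-1)
                .connectionListener((Connection conn, ConnectionListener.Events event) ->
                        System.out.println("NATS connection event: " + event))
                .build();

        Connection nc = Nats.connect(options);
    }
}
```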

Server and client version

NATS Alpine image (NATS 2.10.19)
NATS Java client.

Host environment

Red Hat OpenShift (on-premise)

Steps to reproduce

No response

@mohamedsaleem18 mohamedsaleem18 added the defect Suspected defect such as a bug or regression label Oct 21, 2024
@neilalexander (Member)

Can you please provide `nats stream info` and `nats consumer info` output for the assets in question?

@mohamedsaleem18 (Author)

Stream info

```
? Select a Stream lpn
Information for Stream lpn created 2024-10-08 23:18:23

             Subjects: oe.lpn.dt, lpnDt, receiving.eb.lpn-d-req, eb.receiving.lpn-res-dtls, oe.wcl.palln, oe.wbl.grp-compe, oe.wsbl.grp-cnt, wsbl.oe.stas.tring, wsbl.oe.stas.palln, wsbl.receiving.stas.dck-dor, wsbl.oe.grp-frce-clse, wsbl.srting.stas.trng, wsbl.srting.stas.palln, wsbl.autoputaway.stas.mild, wsbl.oe.dtn-req, oe.wsbl.mild, oe.wcsbl.container-empty, oe.wsbl.vcant
             Replicas: 3
              Storage: File

Options:

            Retention: Limits
      Acknowledgments: true
       Discard Policy: Old
     Duplicate Window: 2m0s
           Direct Get: true
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: true

Limits:

     Maximum Messages: 100,000
  Maximum Per Subject: 100,000
        Maximum Bytes: 64 MiB
          Maximum Age: 3d0h0m0s
 Maximum Message Size: 98 KiB
    Maximum Consumers: unlimited

Cluster Information:

                 Name: nats
               Leader: nats-1
              Replica: nats-0, current, seen 516ms ago
              Replica: nats-2, current, seen 526ms ago

State:

             Messages: 61
                Bytes: 35 KiB
       First Sequence: 83 @ 2024-10-20 20:54:20
        Last Sequence: 143 @ 2024-10-21 11:12:39
     Active Consumers: 14
   Number of Subjects: 5
```

@mohamedsaleem18 (Author)

Consumer info

```
Information for Consumer lpn > regEnCntrStasHdler created 2024-10-08T23:38:49-05:00

Configuration:

                Name: regEnCntrStasHdler
           Pull Mode: true
      Filter Subject: wsbl.receiving.stas.dck-dor
      Deliver Policy: All
          Ack Policy: Explicit
            Ack Wait: 30.00s
       Replay Policy: Instant
     Max Ack Pending: 1,000
   Max Waiting Pulls: 512

Cluster Information:

                Name: nats
              Leader: nats-1
             Replica: nats-0, current, seen 824ms ago
             Replica: nats-2, current, seen 829ms ago

State:

  Last Delivered Message: Consumer sequence: 27  Stream sequence: 133  Last delivery: 1h56m11s ago
    Acknowledgment Floor: Consumer sequence: 27  Stream sequence: 133  Last Ack: 1h56m11s ago
        Outstanding Acks: 0 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
           Waiting Pulls: 0 of maximum 512
```
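Since this is a durable pull consumer, the application has to fetch batches explicitly. Below is a minimal sketch, assuming the jnats client, of how such a consumer is typically driven; the class name, batch size, and wait time are illustrative assumptions, not the application's actual code:

```java
import io.nats.client.Connection;
import io.nats.client.JetStream;
import io.nats.client.JetStreamSubscription;
import io.nats.client.Message;
import io.nats.client.Nats;
import io.nats.client.PullSubscribeOptions;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;

public class PullConsumeSketch {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://nats-0.nats-headless.ws-nats:4222");
        JetStream js = nc.jetStream();

        // Bind to the existing durable pull consumer described above.
        PullSubscribeOptions pullOptions = PullSubscribeOptions.bind("lpn", "regEnCntrStasHdler");
        JetStreamSubscription sub = js.subscribe("wsbl.receiving.stas.dck-dor", pullOptions);

        while (true) {
            // Ask for up to 10 messages, waiting at most one second for them.
            List<Message> batch = sub.fetch(10, Duration.ofSeconds(1));
            for (Message msg : batch) {
                System.out.println("Received: " + new String(msg.getData(), StandardCharsets.UTF_8));
                msg.ack(); // explicit ack, matching the consumer's Ack Policy
            }
        }
    }
}
```

The `msg.ack()` call matches the consumer's Explicit ack policy shown above; unacknowledged messages would be redelivered after the 30s Ack Wait.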

@mohamedsaleem18 (Author)

mohamedsaleem18 commented Oct 24, 2024

Can you please provide a resolution for this issue?

@sourabhaggrawal
I have also faced this issue with a single-replica pod (non-clustered) and a work-queue (WQ) stream with no message limits and no TTL.
The consumer just stopped receiving messages, and the consumer app had to be rebooted, after which it started consuming messages again.
nats-server 2.10.11
