Kafka Crash #7
Your sample code is cut off. Would you please post example code that we can run?
I think this will run.
var fetch = require('node-fetch'); // irrelevant
const { Kafka } = require('gcn-kafka');
…
})()
Tried to improve the format. I think this will run.
var fetch = require('node-fetch');
…
})()
Posted excerpted code that 'should' run to GitHub. I had trouble with spaces and tabs but may have gotten it right on the second try.
GitHub newbie here.
On Mar 22, 2023, at 10:44 AM, Leo Singer wrote:
Your sample code is cut off. Would you please post example code that we can run?
I've been letting our demo script from https://gcn.nasa.gov/docs/client#ecmascript-mjs run for several hours and I have seen similar log messages:
These probably correspond to times when my laptop's VPN flakes out. However, the script automatically reconnects and continues receiving alerts. Are you finding that you stop receiving alerts, or that the script actually terminates? If your client is recovering automatically and continuing to receive alerts, then I think that you can safely ignore these or, at worst, treat them as warnings that your Internet connectivity is slightly unreliable.
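One way to tell the two cases apart (the client recovered on its own vs. the script is still running but no longer receiving alerts) is to watch for an unusually long silence. The sketch below is a hypothetical helper, not part of gcn-kafka or KafkaJS: call touch() from eachMessage, and it reports once whenever no message has arrived for longer than a threshold you choose. The 90-minute threshold in the usage comment is an assumption based on the mock alerts in this thread arriving a few times per hour; pick whatever gap is abnormal for your topics.

```javascript
// Hypothetical helper, not part of gcn-kafka or KafkaJS: flags a consumer
// that is still running but has stopped delivering messages.
class SilenceWatchdog {
  // maxSilenceMs: longest gap between messages that still counts as normal.
  // now: clock function, injectable so the logic can be tested without waiting.
  constructor(maxSilenceMs, now = Date.now) {
    this.maxSilenceMs = maxSilenceMs;
    this.now = now;
    this.lastSeen = now();
  }

  // Call on every message received (e.g. at the top of eachMessage).
  touch() {
    this.lastSeen = this.now();
  }

  // True when no message has arrived for longer than maxSilenceMs.
  isStale() {
    return this.now() - this.lastSeen > this.maxSilenceMs;
  }

  // Checks every checkEveryMs and calls onStale(silenceMs) once per silent
  // stretch. Returns the timer so the caller can clearInterval() it.
  start(checkEveryMs, onStale) {
    let reported = false;
    const timer = setInterval(() => {
      if (!this.isStale()) {
        reported = false;
      } else if (!reported) {
        reported = true;
        onStale(this.now() - this.lastSeen);
      }
    }, checkEveryMs);
    timer.unref(); // do not keep the process alive just for the watchdog
    return timer;
  }
}

// Usage (assumes `consumer` is a connected, subscribed gcn-kafka consumer):
// const watchdog = new SilenceWatchdog(90 * 60 * 1000);
// watchdog.start(60 * 1000, (ms) =>
//   console.error(`No Kafka messages for ${Math.round(ms / 60000)} minutes`));
// await consumer.run({
//   eachMessage: async ({ message }) => {
//     watchdog.touch();
//     // ... existing handling of message.value ...
//   },
// });
```

The watchdog only detects the symptom; what to do about it (alert yourself, or restart the client) is left to the onStale callback.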
Thanks for looking into this.
The errors I wrote to you about caused a fatal error on my end: the script stopped running. Like you, I have also gotten non-fatal errors from which the script was able to restart. For example:
[perfectly reasonable console.log statements, and then…]
{"level":"ERROR","timestamp":"2023-03-22T10:17:32.115Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:34.316Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-22T10:17:34.397Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:34.399Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"INFO","timestamp":"2023-03-22T10:17:34.700Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:36.282Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"INFO","timestamp":"2023-03-22T10:17:37.573Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","memberId":"kafkajs-17308c8c-48a5-4918-b4aa-e56cf744bd53","leaderId":"kafkajs-17308c8c-48a5-4918-b4aa-e56cf744bd53","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1969}
[followed by perfectly reasonable console.log messages]
But, as I wrote you earlier, after these messages my script stopped:
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.130Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":17,"size":10}
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.131Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-03-22T12:08:23.210Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
FYI: using the timestamp in your log, "2023-03-23T04:57:39.094Z", my script was running fine during that time, as shown by these console.log messages:
found a Notice
go to save a record:
2023-03-23 04:56:17
2023-03-23 04:56:17
typed time_created
{
recordType: 'Notices',
fields: {
COMMENTS1: { value: 'Retraction of an earlier Event' },
TRIGGER_DATE: { value: '' },
TRIGGER_NUM: { value: 'MS230323e' },
NOTICE_TYPE: { value: 'Test Retraction' },
TYPE: { value: 'Test' },
TRIGGER_TIME: { value: '04:56:17' },
FAR: { value: '' },
NOTICE_DATE: { value: 'Thu 23 Mar 23 04:56:17 UT' },
...
recordName: 'Thu23Mar23045617UT'
}
found a Notice
go to save a record:
2023-03-23 05:46:24
2023-03-23 05:46:24
typed time_created
{
recordType: 'Notices',
fields: {
COMMENTS1: { value: 'Hanford, Livingston contributed' },
TRIGGER_DATE: { value: '2023/03/23' },
TRIGGER_NUM: { value: 'MS230323f' },
NOTICE_TYPE: { value: 'Test Preliminary' },
TYPE: { value: 'Test' },
TRIGGER_TIME: { value: '05:46:09.0' },
FAR: { value: '1/347812 years' },
NOTICE_DATE: { value: 'Thu 23 Mar 23 05:46:24 UT' },
...
recordName: 'Thu23Mar23054624UT'
}
On Mar 23, 2023, at 11:41 AM, Leo Singer wrote:
I've been letting our demo script from https://gcn.nasa.gov/docs/client#ecmascript-mjs run for several hours and I have seen similar log messages:
{"level":"ERROR","timestamp":"2023-03-23T04:57:39.094Z","logger":"kafkajs","message":"[Connection] Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov","broker":"kafka.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)"}
{"level":"ERROR","timestamp":"2023-03-23T04:57:39.097Z","logger":"kafkajs","message":"[BrokerPool] Failed to connect to seed broker, trying another broker from the list: Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov","retryCount":4,"retryTime":4712}
{"level":"ERROR","timestamp":"2023-03-23T04:57:43.825Z","logger":"kafkajs","message":"[Connection] Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov","broker":"kafka.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)"}
{"level":"ERROR","timestamp":"2023-03-23T04:57:43.826Z","logger":"kafkajs","message":"[BrokerPool] Failed to connect to seed broker, trying another broker from the list: Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov","retryCount":5,"retryTime":7834}
{"level":"ERROR","timestamp":"2023-03-23T04:57:43.827Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNumberOfRetriesExceeded: Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov","groupId":"89b18835-ca33-4b9e-a5fe-34bb2d887dee","retryCount":5,"stack":"KafkaJSNonRetriableError\n Caused by: KafkaJSConnectionError: Connection error: getaddrinfo ENOTFOUND kafka.gcn.nasa.gov\n at TLSSocket.onError (/Users/lpsinger/Downloads/example/node_modules/kafkajs/src/network/connection.js:210:23)\n at TLSSocket.emit (node:events:513:28)\n at emitErrorNT (node:internal/streams/destroy:151:8)\n at emitErrorCloseNT (node:internal/streams/destroy:116:3)\n at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"}
{"level":"INFO","timestamp":"2023-03-23T04:57:43.859Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"89b18835-ca33-4b9e-a5fe-34bb2d887dee"}
{"level":"ERROR","timestamp":"2023-03-23T04:57:43.859Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 7834ms","retryCount":5,"retryTime":7834,"groupId":"89b18835-ca33-4b9e-a5fe-34bb2d887dee"}
These probably correspond to times when my laptop's VPN flakes out. However, the script automatically reconnects and continues receiving alerts.
Are you finding that you stop receiving alerts, or that the script actually terminates? If your client is recovering automatically and continuing to receive alerts, then I think that you can safely ignore these or, at worst, treat them as warnings that your Internet connectivity is slightly unreliable.
Like I said, the log messages in my output were probably due to my VPN dropping out momentarily, which is a local network connectivity issue. So they wouldn't be correlated with any warnings in your log output.
Within 1 second of the time that your script crashed (2023-03-22T12:08:23.130Z), one of our Kafka brokers logged the following message (IP addresses redacted):
However, that log message was from kafka1.gcn.nasa.gov, whereas your log message refers to kafka2.gcn.nasa.gov. So it is not clear whether or not this is related. Have you been able to reproduce this more than once?
I have not had this error repeat. The script has been running continuously since then without a problem.
On Mar 24, 2023, at 12:48 AM, Leo Singer wrote:
Within 1 second of the time that your script crashed (2023-03-22T12:08:23.130Z), one of our Kafka brokers logged the following message (IP addresses redacted):
[2023-03-22 20:08:23,612] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1] Failed authentication with /xxx.xxx.xxx.xxx (Unsupported SASL mechanism SCRAM-SHA-512) (org.apache.kafka.common.network.Selector)
However, that log message was from kafka1.gcn.nasa.gov, whereas your log message refers to kafka2.gcn.nasa.gov. So it is not clear whether or not this is related.
Have you been able to reproduce this more than once?
Nor have I. Please report again on this issue if it recurs. Thanks!
Fatal errors happened again earlier this morning:
{"level":"ERROR","timestamp":"2023-03-25T15:25:42.748Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":130,"size":10}
{"level":"ERROR","timestamp":"2023-03-25T15:25:42.749Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"d73f076f-f26b-4b3c-b45d-6e002596a6a7","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-03-25T15:25:42.836Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"d73f076f-f26b-4b3c-b45d-6e002596a6a7"}
[vps37660]$
Similar to last time:
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.130Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":17,"size":10}
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.131Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-03-22T12:08:23.210Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
Differences (highlighted in red in my email): the earlier stack trace has an extra "\n at runMicrotasks (<anonymous>)" frame, and the timestamps and IDs differ.
On Mar 24, 2023, at 11:21 AM, Leo Singer wrote:
Nor have I. Please report again on this issue if it recurs. Thanks!
It turns out that my forever script continues to run but I no longer receive Kafka notices (assuming they have come out since earlier this morning).
Any idea how to detect this error and restart with "await consumer.run({" or perhaps "(async () => {"?
Note that in my code I have a few lines in
catch (error) {
and
.catch(function(error){
that would have printed any caught errors to the console; they did not.
Thanks.
Peter
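A sketch of one possible workaround, not an endorsed fix: the maintainers' position in the next reply is that the client should recover on its own. As far as I understand KafkaJS, consumer.run() resolves once the consumer has started, and crashes that happen later are reported through the logger and the consumer.events.CRASH instrumentation event rather than thrown to the code that called run(). That would explain why the catch blocks never printed anything. The SASL crash logs above show "[Consumer] Stopped" with no "Restarting the consumer" line, which is consistent with KafkaJS declining to restart after a KafkaJSNonRetriableError. In recent KafkaJS versions the CRASH payload carries a restart flag that is false in that case (please check the instrumentation-events documentation for your installed version). The names runForever and makeConsumer are hypothetical; makeConsumer is assumed to return a fresh consumer from a gcn-kafka Kafka instance.

```javascript
// Hypothetical workaround sketch: rebuild the consumer when KafkaJS reports a
// crash that it will not restart from by itself.
//   makeConsumer: () => consumer, e.g. () => kafka.consumer() where
//                 kafka = new Kafka({ client_id, client_secret }) (gcn-kafka)
//   topics:       array of topic names, e.g. ['igwn.gwalert']
//   eachMessage:  your existing async message handler
async function runForever(makeConsumer, topics, eachMessage,
                          { retryDelayMs = 10000, log = console.error } = {}) {
  const scheduleStart = () => {
    setTimeout(() => {
      start().catch((err) => {
        // connect/subscribe/run failed (e.g. network still down): try again later
        log(`Restart failed: ${err.message}; retrying in ${retryDelayMs} ms`);
        scheduleStart();
      });
    }, retryDelayMs);
  };

  const start = async () => {
    const consumer = makeConsumer();

    // KafkaJS emits this event for every crash. payload.restart === true means
    // KafkaJS will restart the consumer itself, so only act when it is false.
    consumer.on(consumer.events.CRASH, async ({ payload }) => {
      if (payload.restart) return;
      const name = payload.error ? payload.error.name : 'unknown error';
      log(`Consumer will not restart on its own (${name}); rebuilding in ${retryDelayMs} ms`);
      try {
        await consumer.disconnect();
      } catch (err) {
        // ignore: the crashed consumer may already be disconnected
      }
      scheduleStart();
    });

    await consumer.connect();
    await consumer.subscribe({ topics });
    await consumer.run({ eachMessage });
    return consumer;
  };

  // The first start is not retried, so configuration errors (bad
  // credentials, unknown topic) surface immediately to the caller.
  return start();
}
```

One design point to check before relying on this: the group IDs in the logs above are UUID-style and change from run to run, which suggests a random group ID is generated per consumer. If so, a rebuilt consumer joins a new group and may skip messages published while it was down; passing a fixed groupId to kafka.consumer() (an assumption about your setup) would let it resume from the last committed offset instead.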
You should not have to restart your Kafka client. Most standard Kafka clients are designed to recover automatically from network connectivity outages, broker outages, and so on. If it isn't recovering automatically, then we need to report that upstream as a bug.
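@PeterBKramer, how long does your eachMessage callback take to run? I wonder if KafkaJS might be assuming that it returns quickly, and it might get into an invalid state if it does not.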
I added console logs to collect that timing information. There are three different delays in the code while my DreamHost server interacts with my CloudKit database: it 1) gets database authorization, 2) saves a record, and 3) deletes a record. I use ".then" functions in Node.js to handle those delays.
I believe that Node.js handles those .then calls asynchronously and can receive another eachMessage while awaiting a .then response from CloudKit. If so, the answer to your question would be "less than 0.001 seconds".
If Node.js is not handling them asynchronously, and cannot handle another eachMessage until all three .then callbacks have executed, then the answer to your question would be "usually 1.1 seconds", but on one occasion out of about 30 examined it was "5.2 seconds" because of a delay in getting a response to an authorization request. I cannot say how long it might have been during the fatal error.
P
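Which of those two answers applies depends on whether eachMessage returns (or awaits) the .then chain. KafkaJS awaits the promise that the eachMessage handler returns, so a chain that is started but not returned lets the handler finish almost immediately, while a returned chain makes KafkaJS wait for it. The hypothetical timing wrapper below makes the difference visible; fakeCloudKitSave is an assumed stand-in for the real CloudKit request.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical wrapper, not part of KafkaJS: reports how long a handler takes
// until the promise it returns settles (which is what KafkaJS waits for).
function timed(handler, report = (ms) => console.log(`eachMessage took ${ms.toFixed(1)} ms`)) {
  return async (payload) => {
    const start = performance.now();
    try {
      return await handler(payload);
    } finally {
      report(performance.now() - start);
    }
  };
}

// Stand-in for a CloudKit request that takes `ms` milliseconds (assumption).
const fakeCloudKitSave = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fire-and-forget: the .then chain is started but not returned, so the
// handler resolves at once and KafkaJS moves on before the save finishes.
const fireAndForget = async () => {
  fakeCloudKitSave(200).then(() => console.log('saved (fire-and-forget)'));
};

// Returned chain: KafkaJS waits until the save has completed.
const returned = async () => {
  return fakeCloudKitSave(200).then(() => console.log('saved (returned)'));
};

// Usage with a real consumer (handler name is hypothetical):
// await consumer.run({ eachMessage: timed(yourHandler) });
```

With KafkaJS's default sessionTimeout of 30 seconds, handlers in the 1 to 5 second range measured here should not by themselves cause the consumer to be dropped from its group. The fire-and-forget style does mean a message can be marked as processed before its CloudKit save has finished.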
On Mar 25, 2023, at 11:38 PM, Leo Singer wrote:
@PeterBKramer, how long does your eachMessage callback take to run? I wonder if KafkaJS might be assuming that it returns quickly, and it might get into an invalid state if it does not.
Further to my timing info below: I realize that the crashes I recorded occurred at 8 minutes and 25 minutes after the hour. No MS events would be sent at those times, and the console logs indicate that the script properly handled the latest MS event at 56 minutes after the preceding hour.
My system was sitting idle, awaiting the next Kafka notice, when the crashes occurred.
P
On Mar 27, 2023, at 3:14 AM, Peter B Kramer wrote:
I added console logs to collect that timing information. There are three different delays in the code while my DreamHost server interacts with my CloudKit database: it 1) gets database authorization, 2) saves a record, and 3) deletes a record. I use ".then" functions in Node.js to handle those delays.
I believe that Node.js handles those .then calls asynchronously and can receive another eachMessage while awaiting a .then response from CloudKit. If so, the answer to your question would be "less than 0.001 seconds".
If Node.js is not handling them asynchronously, and cannot handle another eachMessage until all three .then callbacks have executed, then the answer to your question would be "usually 1.1 seconds", but on one occasion out of about 30 examined it was "5.2 seconds" because of a delay in getting a response to an authorization request. I cannot say how long it might have been during the fatal error.
P
Interesting that you are mixing the
OK, I wouldn't worry about that, then.
Things seem to be 'fixed' in that the crash is no longer fatal. My forever script continues to run; Kafka now seems to be restarting each time. Here is what is happening now:
I am saving all mock packages: typically 3 per hour, at :46, :51, and :56 or at :49, :54, and :59.
I have been running for 78 hours and have had 5 "timeout" events. One of those events may have been 3 separate timeouts. The system restarted within 5 seconds after each timeout event (except for the event with 3 consecutive timeouts, which took 13 seconds).
Thank you for fixing my problem; I can live with this. But if you feel something remains amiss, I will be glad to keep you updated.
Peter
----------------
Here are the error messages (TL;DR):
Started at 2023-03-28T05:46:37.744Z
went fine including a save at 2023-03-28T09:56:27.189Z
then errors:
{"level":"ERROR","timestamp":"2023-03-28T10:14:05.496Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:05.498Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-28T10:14:05.586Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:05.588Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:05.628Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"INFO","timestamp":"2023-03-28T10:14:05.888Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:06.892Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:06.893Z","logger":"kafkajs","message":"[BrokerPool] Failed to connect to seed broker, trying another broker from the list: Connection timeout","retryCount":0,"retryTime":338}
{"level":"ERROR","timestamp":"2023-03-28T10:14:07.685Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"INFO","timestamp":"2023-03-28T10:14:09.681Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-bf692587-ef9d-4dfb-85aa-628ea4ef65de","leaderId":"kafkajs-bf692587-ef9d-4dfb-85aa-628ea4ef65de","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1152}
{"level":"ERROR","timestamp":"2023-03-28T10:14:10.837Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:11.077Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:14.933Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-28T10:14:15.012Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-28T10:14:15.012Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-28T10:14:15.313Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-28T10:14:18.182Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-aa4d1617-b82e-4616-98cb-50a13bf18620","leaderId":"kafkajs-aa4d1617-b82e-4616-98cb-50a13bf18620","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1919}
and working again at 2023-03-28T10:46:38.018Z
went fine including a save at 2023-03-29T03:56:32.873Z
then more errors:
{"level":"ERROR","timestamp":"2023-03-29T03:57:14.580Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-29T03:57:14.713Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"ERROR","timestamp":"2023-03-29T03:57:16.918Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-29T03:57:16.998Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-29T03:57:16.999Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-29T03:57:17.299Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-29T03:57:20.173Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-9bf1d28a-d7b4-442b-bcc4-4e8da3e71bcd","leaderId":"kafkajs-9bf1d28a-d7b4-442b-bcc4-4e8da3e71bcd","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1888}
and working again at 2023-03-29T04:46:38.191Z
worked until I had a network error at 2023-03-29T09:46:42.779Z
and continued to work at 2023-03-29T09:51:16.500Z
went fine including a save at: 2023-03-30T18:59:26.330Z
then more errors:
{"level":"ERROR","timestamp":"2023-03-30T19:27:15.472Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-30T19:27:15.474Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-30T19:27:15.553Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-30T19:27:15.555Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-30T19:27:15.854Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-30T19:27:18.672Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-fc5e1daf-ec3c-44cc-8234-b9ec48bfc8ab","leaderId":"kafkajs-fc5e1daf-ec3c-44cc-8234-b9ec48bfc8ab","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1875}
{"level":"ERROR","timestamp":"2023-03-30T19:27:19.637Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
and working again at 2023-03-30T19:49:34.980Z
went fine including a save at: 2023-03-31T02:49:36.505Z
then more errors:
{"level":"ERROR","timestamp":"2023-03-31T02:52:45.504Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-31T02:52:48.299Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-31T02:52:48.378Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-31T02:52:48.379Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-31T02:52:48.679Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-31T02:52:49.778Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"INFO","timestamp":"2023-03-31T02:52:51.553Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-2bce0b3e-0ae4-439e-863b-bb05bca67558","leaderId":"kafkajs-2bce0b3e-0ae4-439e-863b-bb05bca67558","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1913}
and working again at 2023-03-31T02:54:12.339Z
went fine including a save at: 2023-03-31T04:59:27.181Z
then more errors:
{"level":"ERROR","timestamp":"2023-03-31T05:19:54.641Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-31T05:19:54.641Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-31T05:19:54.720Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"ERROR","timestamp":"2023-03-31T05:19:54.721Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-31T05:19:55.020Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897"}
{"level":"INFO","timestamp":"2023-03-31T05:19:57.201Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ca891007-3aae-4ed4-960b-24f44ba37897","memberId":"kafkajs-d734e059-d0d3-468e-a996-4b7a34d7b9fb","leaderId":"kafkajs-d734e059-d0d3-468e-a996-4b7a34d7b9fb","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1211}
and working again at 2023-03-31T05:49:35.828Z
went fine including a save at 2023-03-31T11:59:27.453Z
Then terminated by me.
|
I was wrong. A fatal crash occurred again - earlier today:
Deleted Record for Sat01Apr23095411UT2023-04-01T17:59:36.490Z
{"level":"ERROR","timestamp":"2023-04-01T18:22:51.388Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":49,"size":10}
{"level":"ERROR","timestamp":"2023-04-01T18:22:51.388Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"90006642-264b-4f16-91f9-821d3a69c87c","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-04-01T18:22:51.467Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"90006642-264b-4f16-91f9-821d3a69c87c"}
Any ideas?
Peter
On Mar 31, 2023, at 4:22 PM, Peter B Kramer wrote:
Things seem to be ‘fixed’ in that the crash is no longer fatal. My forever script continues to run; Kafka now seems to be restarting each time. Here is what is now happening:
I am saving all Mock packages, typically three per hour at :46, :51, and :56 or at :49, :54, and :59.
I have been running for 78 hours and have had 5 “timeout” events. One of those events may have been 3 separate timeouts. The system restarted within 5 seconds after each timeout event (except for the event with 3 consecutive timeouts, which took 13 seconds).
Thank you for fixing my problem; I can live with this. But if you feel something remains amiss, I will be glad to keep you updated.
Peter
|
It has now happened again, and I was unable to handle it correctly. A little help with the Node.js code would be appreciated.
I received these error messages:
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.761Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":66,"size":10}
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.762Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"666310bc-5a24-4f31-a0c2-0b4b0d5be652","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-04-04T07:41:17.849Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"666310bc-5a24-4f31-a0c2-0b4b0d5be652"}
you are here --WARNING
As indicated by the console output ' you are here --WARNING', at that point the following method in my code was called:
consumer.on('consumer.crash', async (payload) => {
console.log(' you are here --WARNING');
const isNonRetriableError = payload.payload.error instanceof KafkaJSNonRetriableError;
const isNumberOfRetriesExceeded = payload.payload.error instanceof KafkaJSNumberOfRetriesExceeded;
In that method I am trying to restart the process, but first I need to be certain that the error was a KafkaJSNonRetriableError (and that I have not exceeded the number of retries).
But right after printing that console.log message, my process crashes with:
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.851Z","logger":"kafkajs","message":"[Consumer] Failed to execute listener: KafkaJSNonRetriableError is not defined","eventName":"consumer.crash","stack":"ReferenceError: KafkaJSNonRetriableError is not defined\n at /home/peterb2/kafkaLVC.cjs:81:86\n at EventEmitter.<anonymous> (/home/peterb2/node_modules/kafkajs/src/consumer/index.js:315:23)\n at EventEmitter.emit (events.js:400:28)\n at InstrumentationEventEmitter.emit (/home/peterb2/node_modules/kafkajs/src/instrumentation/emitter.js:21:20)\n at Runner.onCrash (/home/peterb2/node_modules/kafkajs/src/consumer/index.js:286:30)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
How can I read the payload content correctly in consumer.on('consumer.crash', async (payload) => { ... })?
Thanks
Peter
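KafkaJS hands the crash listener a single event object and puts the details under its `payload` property, which is why naming the listener's parameter `payload` leads to `payload.payload.error`. A minimal sketch of reading those fields, assuming the `{ error, groupId, restart }` payload shape described in the KafkaJS instrumentation-events documentation; `describeCrash` is a hypothetical helper. Comparing `error.name` avoids importing anything, because it is the same class name KafkaJS prints in the `Crash: KafkaJSNonRetriableError: ...` log lines above.

```javascript
// Hypothetical helper: summarize a KafkaJS 'consumer.crash' event.
// Assumes the event shape { payload: { error, groupId, restart } }.
function describeCrash(event) {
  const { error, groupId, restart } = event.payload;
  return {
    groupId,
    // error.name is the class name KafkaJS prints in its "Crash:" log line,
    // so comparing it needs no require('kafkajs').
    errorName: error && error.name,
    message: error && error.message,
    // Matches only that exact class name, not subclasses.
    nonRetriable: Boolean(error) && error.name === 'KafkaJSNonRetriableError',
    // True when KafkaJS itself reports that it will restart the consumer.
    willAutoRestart: Boolean(restart),
  };
}
```

It could be wired in as `consumer.on(consumer.events.CRASH, (event) => console.log(describeCrash(event)))`. The name comparison is narrower than `instanceof`, which needs the class imported from `kafkajs` but also matches subclasses.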
|
Is your full script somewhere on GitHub where I can view it? |
No. I only know how to respond to these emails...
Here is the one method that is called:
consumer.on('consumer.crash', async (payload) => {
console.log(' WARNING WARNING sss');
const isNonRetriableError = payload.payload.error instanceof KafkaJSNonRetriableError;
const isNumberOfRetriesExceeded = payload.payload.error instanceof KafkaJSNumberOfRetriesExceeded;
if(!isNonRetrievableError){ // ERROR - I need to correct this spelling.
console.log(' WARNING WARNING stop because error is retrievable');
return;
}
if(isNumberOfRetriesExceeded){
console.log(' WARNING WARNING stop because number of retries exceeded');
return;
}
console.log('Consumer crashed on non-retriable error: restarting');
console.log(payload.payload.error);
try {
console.log('awaiting disconnect ');
await consumer.disconnect();
} finally {
console.log('Going to recall KafkaConnect');
setTimeout(async () => {
await consumer.connect();
if(reconnect){
console.log(' reconnecting in next 5 seconds ');
await kafkaConnect(); // I don't know about those "()"
console.log('reconnected');
}else{
console.log('did not try to reconnect because reconnect=FALSE');
}
}, 5000);
}
});
Here is the console printout (including the console.log containing ‘WARNING’):
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.761Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":66,"size":10}
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.762Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"666310bc-5a24-4f31-a0c2-0b4b0d5be652","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-04-04T07:41:17.849Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"666310bc-5a24-4f31-a0c2-0b4b0d5be652"}
WARNING WARNING sss
{"level":"ERROR","timestamp":"2023-04-04T07:41:17.851Z","logger":"kafkajs","message":"[Consumer] Failed to execute listener: KafkaJSNonRetriableError is not defined","eventName":"consumer.crash","stack":"ReferenceError: KafkaJSNonRetriableError is not defined\n at /home/peterb2/kafkaLVC.cjs:81:86\n at EventEmitter.<anonymous> (/home/peterb2/node_modules/kafkajs/src/consumer/index.js:315:23)\n at EventEmitter.emit (events.js:400:28)\n at InstrumentationEventEmitter.emit (/home/peterb2/node_modules/kafkajs/src/instrumentation/emitter.js:21:20)\n at Runner.onCrash (/home/peterb2/node_modules/kafkajs/src/consumer/index.js:286:30)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
On Apr 4, 2023, at 10:06 AM, Leo Singer wrote:
Is your full script somewhere on GitHub where I can view it?
|
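@PeterBKramer, I'm trying to reproduce this but it's hard to do it while seeing just bits of your code out of context. Here's what I need you to do:
Post the simplest complete, self-contained script that reproduces the crash to https://gist.github.com and place a link to it in this issue.
Keep a list of the UTC times of the crashes and put them in a text file, ideally also in a gist (https://gist.github.com). Don't email every time it crashes, just add the timestamp to the list.
Note for us as many details as you can think of about your runtime environment: at minimum, your operating system and architecture, your version of NodeJS, your versions of gcn-kafka-js and KafkaJS.
Also, what kind of Internet connectivity does the machine on which you are testing have? Is it a cloud instance or server with a reliable connection? Is it a laptop that goes to sleep sometimes or roams from one WiFi network to another?
I have been running a KafkaJS client for a day or so now in Amazon Fargate but haven't yet seen a crash. |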
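Keeping a list of the UTC times of the crashes, as requested here, can be done from inside the crash listener instead of by hand. A minimal sketch using only Node's built-in `fs` module; the helper `recordCrashTime` and the file name `crash-times.txt` are hypothetical.

```javascript
const fs = require('fs');

// Hypothetical helper: append one ISO-8601 UTC timestamp (plus an optional
// error name) per line to a plain-text log of crash times.
function recordCrashTime(file, errorName = '', when = new Date()) {
  const stamp = when.toISOString(); // always UTC, e.g. 2023-04-04T07:41:17.762Z
  const line = errorName ? `${stamp} ${errorName}\n` : `${stamp}\n`;
  fs.appendFileSync(file, line);
  return line;
}
```

For example: `consumer.on(consumer.events.CRASH, (event) => recordCrashTime('crash-times.txt', event.payload.error.name))`. Appending synchronously keeps the write simple, and the resulting file can be pasted into a gist as-is.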
link:
https://gist.github.com/PeterBKramer/2dd52817383d941b7595adaebebcc9d3 (Gravitational Wave Events Kafka handler)
I use a DreamHost VPS to receive these Kafka packages.
[vps37660]$ npm -v
8.19.1
[vps37660]$ node -v
v14.20.0
[vps37660]$ nvm --version
0.33.0
There are two issues: why am I getting fatal crashes, and how can I recover from them? Can anyone help me with the second: what is wrong with my payload.payload.error, such that "instanceof KafkaJSNonRetriableError" does not simply return true or false?
Peter
On Apr 4, 2023, at 10:31 AM, Leo Singer wrote:
@PeterBKramer, I'm trying to reproduce this but it's hard to do it while seeing just bits of your code out of context. Here's what I need you to do:
Post the simplest complete, self-contained script that reproduces the crash to https://gist.github.com and place a link to it in this issue.
Keep a list of the UTC times of the crashes and put them in a text file, ideally also in a gist (https://gist.github.com). Don't email every time it crashes, just add the timestamp to the list.
Note for us as many details as you can think of about your runtime environment: at minimum, your operating system and architecture, your version of NodeJS, your versions of gcn-kafka-js and KafkaJS.
Also, what kind of Internet connectivity does the machine on which you are testing have? Is it a cloud instance or server with a reliable connection? Is it a laptop that goes to sleep sometimes or roams from one WiFi network to another?
I have been running a KafkaJS client for a day or so now in Amazon Fargate but haven't yet seen a crash.
|
I get a 404 error. Did you make a private Gist? Try making a public gist.
Would you please make two separate code samples? One with your crash handling, and one without. |
I posted code without my attempt to handle the crash.
Here is the link to the ‘public’ code:
https://gist.github.com/PeterBKramer/2dd52817383d941b7595adaebebcc9d3 (Gravitational Wave Events Kafka handler)
On Apr 4, 2023, at 2:18 PM, Leo Singer wrote:
link: https://gist.github.com/PeterBKramer/2dd52817383d941b7595adaebebcc9d3 (Gravitational Wave Events Kafka handler)
I get a 404 error. Did you make a private Gist? Try making a public gist.
There are 2 issues - why am I getting fatal crashes and how can I recover from them. Can anyone help me with the second - what is wrong with my payload.payload.error that it does not return a yes or no for "instanceof KafkaJSNonRetriableError”.
Would you please make two separate code samples? One with your crash handling, and one without.
|
I still get a 404. |
How about this link:
https://gist.github.com/PeterBKramer/264e82962f94fc59317f60d82ddb05ea (Gravitational Wave Events App - public?)
|
I think this link will work...... |
Again, I am sorry about the spacing - tabs versus spaces.
I am trying to select the nature of the error (so I only restart if it is non-retriable) with:
I get:
Any suggestions? |
NodeJS 14.x is past end of life. Would you please try using a supported version of NodeJS? I recommend NodeJS 18.x which is the most recent LTS version. |
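To make the Node.js version visible in the logs (the DreamHost VPS above reports v14.20.0), a script could check it at startup. A minimal sketch using the built-in `process.versions.node` string; the helper names and the threshold of 18 (the LTS line recommended here) are assumptions for illustration.

```javascript
// Hypothetical helpers: parse the major version from a string such as
// "14.20.0" (process.versions.node has no leading "v").
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split('.')[0], 10);
}

// Warn (but keep running) when the major version is below minMajor.
function checkNodeVersion(minMajor = 18, version = process.versions.node) {
  const ok = nodeMajor(version) >= minMajor;
  if (!ok) {
    console.warn(`Node.js ${version} is older than ${minMajor}.x; consider upgrading.`);
  }
  return ok;
}
```

Calling `checkNodeVersion()` once at the top of the script leaves a warning in the forever log whenever it is started under an older runtime.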
That code sample has your CloudKit code in it. Can you reproduce the crash without it? Please isolate the simplest code that reproduces the crash. |
You need to add `const { KafkaJSNonRetriableError } = require('kafkajs')`. |
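With the classes imported, the crash listener can branch on them. A sketch under these assumptions: the error classes come from `require('kafkajs')` (passed in as a parameter here so the logic can be exercised without a broker); the crash payload carries a `restart` flag, as described in the KafkaJS instrumentation-events documentation; and, in the KafkaJS source, `KafkaJSNumberOfRetriesExceeded` extends `KafkaJSNonRetriableError`, so the more specific check has to come first. `classifyCrash` and its return labels are hypothetical.

```javascript
// Hypothetical classifier for a KafkaJS 'consumer.crash' event.
// `kafkajs` is the object returned by require('kafkajs'); injecting it lets
// the function be tested with stand-in classes.
function classifyCrash(event, kafkajs) {
  const { error, restart } = event.payload;
  // KafkaJS reports whether it is going to restart the consumer on its own.
  if (restart) return 'auto-restart';
  const { KafkaJSNonRetriableError, KafkaJSNumberOfRetriesExceeded } = kafkajs;
  // Check the subclass first: it would also pass the base-class test below.
  if (error instanceof KafkaJSNumberOfRetriesExceeded) return 'retries-exceeded';
  if (error instanceof KafkaJSNonRetriableError) return 'non-retriable';
  return 'unknown';
}
```

In the script this would be called as `classifyCrash(event, require('kafkajs'))` inside `consumer.on(consumer.events.CRASH, async (event) => { ... })`, starting a manual restart only for `'non-retriable'`.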
Great news. I had two crashes over the past few hours. One was Retriable and the system recovered itself. The other was the NonRetriable crash that has been causing the problem. Using the recovery code above, and 'require' as instructed by Leo, the system recovered itself as desired.
Here is the full payload package describing the event that was recovered: through to bottom
2023-04-07T02:54:23.591Z
{"level":"ERROR","timestamp":"2023-04-07T02:57:47.740Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka3.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":125,"size":10}
{"level":"ERROR","timestamp":"2023-04-07T02:57:48.306Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"6dafd3e1-eeb2-4d0e-95a7-dc4609a8d498","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-04-07T02:57:48.397Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"6dafd3e1-eeb2-4d0e-95a7-dc4609a8d498"}
WARNING WARNING sss
{"level":"INFO","timestamp":"2023-04-07T02:57:48.405Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"6dafd3e1-eeb2-4d0e-95a7-dc4609a8d498"}
Going to recall KafkaConnect
{"level":"ERROR","timestamp":"2023-04-07T02:57:55.051Z","logger":"kafkajs","message":"[Connection] Response Metadata(key: 3, version: 6)","broker":"kafka.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Not authorized to access topics: [Topic authorization failed]","correlationId":3,"size":499} |
That's great. I'm still trying to reproduce the crash, and I haven't seen one yet. Once again, would you please post a minimal script that reproduces the crash? |
Minimal code is below. I am running this type of code simultaneously from the same server, looking at different "consumer.subscribe topic" entries. The code runs with "forever start -a filename.cjs".
For example, on April 7 at 02:24:03 GMT I had a NonRetriable crash when looking at topics:[gcn.classic.text.SNEWS] but at the same time different code looking at
{"level":"ERROR","timestamp":"2023-04-07T02:57:47.740Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka3.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":125,"size":10}
{"level":"ERROR","timestamp":"2023-04-07T02:57:48.306Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"6dafd3e1-eeb2-4d0e-95a7-dc4609a8d498","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
KafkaJSNonRetriableError: Request is not valid given the current SASL state
I was able to recover using code in: consumer.on('consumer.crash', async (payload) => {
MINIMAL CODE: `var fetch=require('node-fetch');
} |
I can't run your example:
Does it reproduce if I remove |
I see now that the line: var fetch = require('node-fetch') can be omitted. It is left over from:
|
OK. What's left is basically our own demo script, right? Were you able to reproduce with a recent, supported version of NodeJS? |
I have not yet updated. |
New crashes to report. System 1 recovered from a crash at April 8, 01:32:50 (at this time System 2 showed no error).
Crash logs:
System 1 crash:
{"level":"ERROR","timestamp":"2023-04-08T01:32:50.241Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":81,"size":10}
My system was able to reconnect after this crash.
System 2 crash:
{"level":"INFO","timestamp":"2023-04-08T06:03:33.529Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"4a7f5f26-68be-47e8-8646-485451c89997"}
My system then tried to reconnect but:
{"level":"ERROR","timestamp":"2023-04-08T06:03:43.043Z","logger":"kafkajs","message":"[BrokerPool] outgoing request timed out after 3500ms","retryCount":0,"retryTime":257} |
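The `retryTime` values in these logs look consistent with KafkaJS's documented retry defaults (`initialRetryTime` 300 ms, `factor` 0.2, `multiplier` 2, `maxRetryTime` 30000 ms, `retries` 5): the first delay is randomized to roughly 240 to 360 ms, which fits the `257` above, and `Restarting the consumer in 300ms` reports the unrandomized initial value. The sketch below is an illustrative re-implementation of the formula from the KafkaJS retry documentation, `Random(previous * (1 - factor), previous * (1 + factor)) * multiplier`, capped at `maxRetryTime`; it is not the library's own code, and `retrySchedule` is a hypothetical name.

```javascript
// Illustrative sketch of the retry-delay formula described in the KafkaJS
// retry documentation (not the library's code). `rand` returns a value in
// [0, 1) and is injectable so the schedule can be tested deterministically.
function retrySchedule({
  initialRetryTime = 300,
  factor = 0.2,
  multiplier = 2,
  maxRetryTime = 30000,
  retries = 5,
} = {}, rand = Math.random) {
  const between = (lo, hi) => lo + (hi - lo) * rand();
  const delays = [];
  // First delay: the initial value randomized by +/- factor.
  let delay = between(initialRetryTime * (1 - factor), initialRetryTime * (1 + factor));
  for (let i = 0; i < retries; i++) {
    delay = Math.min(delay, maxRetryTime); // never exceed the cap
    delays.push(Math.round(delay));
    // Next delay: randomize the previous one, then grow it by the multiplier.
    delay = between(delay * (1 - factor), delay * (1 + factor)) * multiplier;
  }
  return delays;
}
```

With `rand` fixed at the midpoint (`() => 0.5`) the defaults give `[300, 600, 1200, 2400, 4800]`, which shows how quickly the waits grow when a broker stays unreachable.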
@PeterBKramer, thanks for continuing to catalog these crashes. Unfortunately, neither of my two test client instances (one in AWS, with ultra-reliable network connectivity, and one on my laptop, with flaky network connectivity) has reproduced this. I can only think of two significant differences in our setups: the Node.js version, and where it is running. Have you tried a recent version of Node.js? Are you seeing this only on your Dreamhost box? Do you also see these crashes on your own desktop machine? Do you have another physical machine or another compute resource in a different cloud provider where you can try this out?
Thank you for continuing to explore this issue. I will try to update my version of Node.js next week. Last time I tried to do that it created a number of issues. Hopefully it will be easier this time.

Regarding where I am running: I only run from my DreamHost VPS. I run multiple scripts, all using 'forever'. One script is downloading from gcn.classic.text.SNEWS and has never seen a crash. Another is downloading from 6 different gcn.classic.text.LVC_ topics and has seen multiple crashes. The third is downloading only from igwn.gwalert and sees a few crashes.

The good news is that I am now able to recover cleanly from all crashes. While all my scripts are running from the same DreamHost VPS, they crash at different times, and the other scripts continue to operate while one is crashing.

All of the crashes generate the following errors:
{"level":"ERROR","timestamp":"2023-04-09T03:49:02.562Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":26,"size":10}

My code then catches the error and generates the following console.log messages:
WARNING WARNING sss

My code then disconnects and reconnects smoothly within a few seconds. I have had such errors generated at the following times:
2023-04-11T13:32:24.090Z (one script)
2023-04-09T03:49:02.562Z (the other script)
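The disconnect-and-reconnect step described above could be wrapped in a retry loop with capped exponential backoff, so that a reconnect attempt that itself fails (as with System 2's "outgoing request timed out" after it stopped) is retried instead of leaving the script idle. This is a sketch: `reconnect` is a hypothetical async function you supply (for example, one that calls `consumer.disconnect()`, `consumer.connect()`, `consumer.subscribe(...)` and `consumer.run(...)`), and the default delays and attempt limit are arbitrary.

```javascript
// Capped exponential backoff: baseMs * 2^attempt, never above maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a caller-supplied async `reconnect` until it succeeds, waiting longer
// after each failure. Rethrows the last error after maxAttempts failures.
async function reconnectWithBackoff(reconnect, { maxAttempts = 8, baseMs = 1000, maxMs = 60000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      await reconnect();
      return attempt + 1; // how many attempts it took
    } catch (err) {
      // Give up and let the caller (or the process supervisor) decide.
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

Whether `reconnect` reuses the same consumer object or builds a fresh one from the `Kafka` client is left open here; either fits this loop.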
Continuing the saga: before updating Node.js, I recorded the following 3 errors from the 2 scripts over the past 2 days, tying the 2 scripts' score at 3 to 3:
2023-04-12T19:05:42.165Z (one script)
2023-04-13T12:29:37.026Z (the other script)
Again, these are non-retriable, and my code is able to recover smoothly.
I have upgraded to node.js v18.16.0 but still have errors. I have recently received the following 2 errors:
// At this point my code captures the error and produces this information:
{"level":"ERROR","timestamp":"2023-04-18T18:52:11.849Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka3.gcn.nasa.gov:9092","clientId":"kafkajs"}
// At this point my code captured the error and produced the following information:
Would you please make a list of the UTC times of these crashes?
No Kafka errors have been generated in my script monitoring igwn.gwalert since 2023-04-19T00:00 (when you sent your request). I will keep this updated. (The errors noted earlier include their UTC times in the error messages.)
It is now 2023-04-20T05:00:00.000Z. (My main channel subscribes to igwn.gwalert and gcn.classic.text.SNEWS.)
@PeterBKramer, as I said, would you please make a table or list of the UTC times of these crashes? My next step will be to correlate these error messages with our log files, but it's going to be a lot of work to collect all of the timestamps from your posts.
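Collecting those timestamps can be automated if the kafkajs JSON log output is saved to a file. The sketch below assumes one JSON object per line, in the format quoted in this thread, and treats any kafkajs entry whose message starts with `[Consumer] Crash` as a crash; failures that end without such a line would need their own filter. `crashTimestamps` is a hypothetical helper name.

```javascript
// Pull the UTC timestamps of consumer crashes out of kafkajs JSON log lines.
// Lines that are not JSON (console.log output, Node warnings) are skipped.
function crashTimestamps(logText) {
  const times = [];
  for (const line of logText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // not a JSON log entry
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      continue; // partial or malformed line
    }
    if (entry.logger === "kafkajs" && typeof entry.message === "string" &&
        entry.message.startsWith("[Consumer] Crash")) {
      times.push(entry.timestamp);
    }
  }
  // ISO 8601 UTC strings ("...Z") with equal precision sort chronologically as text.
  return times.sort();
}
```

For example, `crashTimestamps(fs.readFileSync(logPath, "utf8")).join("\n")` would give one UTC time per line, where `logPath` (hypothetical) points at the file forever writes the script's output to.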
I have started a spreadsheet. There are a few different types of crashes.
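As a sketch of how those crash types could be tallied, the helper below buckets error messages by substrings taken from the logs quoted in this thread. The category names and patterns are assumptions for illustration, not kafkajs error codes; anything unmatched is counted as `other`.

```javascript
// Hypothetical categories, keyed on text that appears in this thread's logs.
// Checked in order; the first matching pattern wins.
const CRASH_TYPES = [
  ["sasl-state", /not valid given the current SASL state/],
  ["lock-timeout", /KafkaJSLockTimeout/],
  ["connection-timeout", /Connection timeout|request timed out/],
  ["tls-disconnect", /disconnected before secure TLS connection/],
];

function classifyCrash(message) {
  for (const [type, pattern] of CRASH_TYPES) {
    if (pattern.test(message)) return type;
  }
  return "other";
}

// Count messages per category, e.g. { "sasl-state": 3, other: 1 }.
function countByType(messages) {
  const counts = {};
  for (const message of messages) {
    const type = classifyCrash(message);
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}
```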
Did you close this issue intentionally?
I did not intentionally close this issue.
That's alright. Thank you for putting together the spreadsheet. That's a huge help.
Revised spreadsheet with 29 errors over 7 days, covering all errors that required restarting the script. Of the 17 from the igwn.gwalert script:
I had an error at 2023-04-26T16:40:19.962Z from which I did not recover. It started with a typical "[Connection] Response SaslHandshake(key: 17, version: 1)", but that did not generate the typical crash which my code can catch and recover from. Instead it was followed by:
{"level":"ERROR","timestamp":"2023-04-26T16:40:19.963Z","logger":"kafkajs","message":"[BrokerPool] Failed to connect to broker, reconnecting","retryCount":0,"retryTime":261}
That ended 13 seconds later with:
{"level":"ERROR","timestamp":"2023-04-26T16:40:32.225Z","logger":"kafkajs","message":"[BrokerPool] KafkaJSLockTimeout: Timeout while acquiring lock (1 waiting locks): \"connect to broker kafka2.gcn.nasa.gov:9092\"","retryCount":1,"retryTime":542,"stack":"KafkaJSLockTimeout: Timeout while acquiring lock (1 waiting locks): \"connect to broker kafka2.gcn.nasa.gov:9092\"\n at Timeout._onTimeout (/home/peterb2/node_modules/kafkajs/src/utils/lock.js:48:23)\n at listOnTimeout (node:internal/timers:569:17)\n at process.processTimers (node:internal/timers:512:7)"}
and the script stopped.
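Because this failure ended without a `consumer.crash` event for the handler to catch, one defensive option is a watchdog that notices when the consumer has gone quiet and exits the process so that `forever` starts a fresh one. The sketch assumes kafkajs keeps emitting its `consumer.heartbeat` instrumentation event (`consumer.events.HEARTBEAT`) while the consumer is healthy, even when no alerts arrive, and that `forever` is set to restart the script on exit. `Watchdog`, the 2-minute threshold, and the exit-on-stale policy are hypothetical choices.

```javascript
// Minimal staleness watchdog. The clock is injectable so it can be tested
// without real timers.
class Watchdog {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.last = now(); // start the clock at construction time
  }

  // Call on every sign of life (e.g. a kafkajs heartbeat event).
  beat() {
    this.last = this.now();
  }

  // True when no beat has been seen for longer than timeoutMs.
  isStale() {
    return this.now() - this.last > this.timeoutMs;
  }
}

// Wiring sketch (not executed here; `consumer` is your kafkajs consumer):
// const dog = new Watchdog(2 * 60 * 1000);
// consumer.on(consumer.events.HEARTBEAT, () => dog.beat());
// setInterval(() => {
//   if (dog.isStale()) {
//     console.error("no consumer heartbeat for 2 minutes; exiting so forever restarts the script");
//     process.exit(1);
//   }
// }, 30 * 1000);
```

Watching heartbeats rather than received alerts avoids false alarms on topics that are legitimately silent for hours.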
Final spreadsheet containing 35 crashes from the script subscribed to the gcn.classic.text.LVC_ Kafka topics and 37 crashes from the script subscribed to igwn.gwalert and gcn.classic.text.SNEWS.
I have been 'successfully' connected to igwn.gwalert and receiving alerts for many hours. But after many successful downloads I received this error message and a fatal crash:
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.130Z","logger":"kafkajs","message":"[Connection] Response SaslHandshake(key: 17, version: 1)","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","error":"Request is not valid given the current SASL state","correlationId":17,"size":10}
{"level":"ERROR","timestamp":"2023-03-22T12:08:23.131Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSNonRetriableError: Request is not valid given the current SASL state","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSNonRetriableError: Request is not valid given the current SASL state\n at /home/peterb2/node_modules/kafkajs/src/retry/index.js:55:18\n at runMicrotasks ()\n at processTicksAndRejections (internal/process/task_queues.js:95:5)"}
{"level":"INFO","timestamp":"2023-03-22T12:08:23.210Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
I am running Node.js v14.20.0.
I run a Node.js script (.cjs) that monitors Kafka, using forever (v4.0.3).
This is an extract of the Node.js script:
The system started with the messages below. Apart from the first two, these messages are unusual: when I restarted the script just now, only the first two appeared.
{"level":"INFO","timestamp":"2023-03-21T22:39:33.323Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"INFO","timestamp":"2023-03-21T22:39:35.109Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","memberId":"kafkajs-911d2926-de7f-40fd-ad01-2d1fc460dadf","leaderId":"kafkajs-911d2926-de7f-40fd-ad01-2d1fc460dadf","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1784}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.197Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.359Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.390Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.520Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.599Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-21T22:39:36.683Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"ERROR","timestamp":"2023-03-21T22:39:36.684Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"INFO","timestamp":"2023-03-21T22:39:36.984Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"INFO","timestamp":"2023-03-21T22:39:39.335Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","memberId":"kafkajs-04355959-3447-4455-8ffc-9eb0ab18edc9","leaderId":"kafkajs-04355959-3447-4455-8ffc-9eb0ab18edc9","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1580}
found a Notice
(node:150457) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
During the evening, while operating normally, the system at one point sent these messages:
{"level":"ERROR","timestamp":"2023-03-22T10:17:32.115Z","logger":"kafkajs","message":"[Connection] Connection timeout","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:34.316Z","logger":"kafkajs","message":"[Consumer] Crash: KafkaJSConnectionError: Connection timeout","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","stack":"KafkaJSConnectionError: Connection timeout\n at Timeout.onTimeout [as _onTimeout] (/home/peterb2/node_modules/kafkajs/src/network/connection.js:223:23)\n at listOnTimeout (internal/timers.js:557:17)\n at processTimers (internal/timers.js:500:7)"}
{"level":"INFO","timestamp":"2023-03-22T10:17:34.397Z","logger":"kafkajs","message":"[Consumer] Stopped","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:34.399Z","logger":"kafkajs","message":"[Consumer] Restarting the consumer in 300ms","retryTime":300,"groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"INFO","timestamp":"2023-03-22T10:17:34.700Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802"}
{"level":"ERROR","timestamp":"2023-03-22T10:17:36.282Z","logger":"kafkajs","message":"[Connection] Connection error: Client network socket disconnected before secure TLS connection was established","broker":"kafka2.gcn.nasa.gov:9092","clientId":"kafkajs","stack":"Error: Client network socket disconnected before secure TLS connection was established\n at connResetException (internal/errors.js:639:14)\n at TLSSocket.onConnectEnd (_tls_wrap.js:1570:19)\n at TLSSocket.emit (events.js:412:35)\n at endReadableNT (internal/streams/readable.js:1333:12)\n at processTicksAndRejections (internal/process/task_queues.js:82:21)"}
{"level":"INFO","timestamp":"2023-03-22T10:17:37.573Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"ea3e9f07-6a67-4d8d-903f-afe206208802","memberId":"kafkajs-17308c8c-48a5-4918-b4aa-e56cf744bd53","leaderId":"kafkajs-17308c8c-48a5-4918-b4aa-e56cf744bd53","isLeader":true,"memberAssignment":{"igwn.gwalert":[0]},"groupProtocol":"RoundRobinAssigner","duration":1969}
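On the `DEP0005` warning in the startup output: running the script with `node --trace-deprecation` shows which file calls `Buffer()`. If the call turns out to be in your own script rather than in a dependency, the replacements named in the warning look like this (a sketch; the values are arbitrary).

```javascript
// Replacements for the deprecated Buffer() constructor (DEP0005).

// was: new Buffer("found a Notice", "utf8")
const fromString = Buffer.from("found a Notice", "utf8");

// was: new Buffer(16). Buffer.alloc zero-fills; Buffer.allocUnsafe does not.
const zeroed = Buffer.alloc(16);

// was: new Buffer(existingBuffer). Buffer.from(buffer) copies the bytes.
const copy = Buffer.from(fromString);
```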