It seems that Vert.x caches non-existent addresses (from previously shut-down nodes) when using the clustered event bus. Some event bus send calls fail with a timeout exception because no consumer exists on the target address.
Version
Vert.x 4.5.5
hazelcast-kubernetes 3.2.3
Context
We are using clustered Vert.x with Hazelcast Kubernetes discovery. We have multiple Kubernetes pods, each running one verticle.
If multiple nodes are killed for any reason, the cluster is re-established when they start again, but Vert.x doesn't seem to be aware of that. It looks like Vert.x keeps trying to send event bus messages to addresses that no longer exist in the cluster.
The only way we have found to solve this problem is to shut down all verticles and then start them again. If at least one verticle stays up, it propagates the stale addresses to every new one.
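For reference, our setup is roughly the following. This is a minimal sketch, not our actual code: the address name is taken from the log below, and the Hazelcast Kubernetes discovery configuration (normally supplied via hazelcast.yaml) is omitted.

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class Main {
  public static void main(String[] args) {
    // Cluster manager backed by Hazelcast; in our deployment the
    // Kubernetes discovery strategy is configured via hazelcast.yaml.
    HazelcastClusterManager clusterManager = new HazelcastClusterManager();
    VertxOptions options = new VertxOptions().setClusterManager(clusterManager);

    Vertx.clusteredVertx(options).onSuccess(vertx -> {
      // Each pod registers a consumer on a well-known address...
      vertx.eventBus().consumer("status/check/NotifyService",
        msg -> msg.reply("ok"));

      // ...and other pods send requests to it. After pods are restarted,
      // these sends sometimes time out, apparently because the sender
      // still resolves the address to a node that has left the cluster.
      vertx.eventBus().request("status/check/NotifyService", "ping")
        .onFailure(err -> System.err.println("send failed: " + err.getMessage()));
    });
  }
}
```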
Potentially relevant log statements
{
"time": "2024-05-12T10:13:04.811512654Z",
"level": "ERROR",
"class": "com.myorg.MyClass",
"message": "Received unknown error code. Error code received is -1, message received is Timed out after waiting 30000(ms) for a reply. address: __vertx.reply.d9179ccb-e25f-4e4e-9189-ff04368e4abb, repliedAddress: status/check/NotifyService."
}
{
"time": "2024-05-12T10:06:34.812700633Z",
"level": "WARN",
"class": "io.vertx.core.eventbus.impl.clustered.ConnectionHolder",
"requestId": "req-PvHlXydYJAhS7Run6wd4",
"message": "Connecting to server d9f36bb4-029f-44d6-8f7c-304be8285b22 failed",
"stacktrace": "io.vertx.core.impl.NoStackTraceThrowable: Not a member of the cluster\n"
}
{
"time": "2024-05-12T10:14:34.810948660Z",
"level": "WARN",
"class": "io.vertx.core.eventbus.impl.clustered.ConnectionHolder",
"requestId": "req-PvHlXydYJAhS7Run6wd4",
"message": "Connecting to server 43e35345-2b5a-4b89-bb59-f5bf67f01780 failed",
"stacktrace": "io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.21.128.0:36519\nCaused by: java.net.ConnectException: Connection refused\n\tat java.base/sun.nio.ch.Net.pollConnect(Native Method)\n\tat java.base/sun.nio.ch.Net.pollConnectNow(Unknown Source)\n\tat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)\n\tat io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n"
}
We cannot reproduce the issue consistently; it only happens sometimes.
In Vert.x core, the exception originates in io.vertx.core.eventbus.impl.clustered.ConnectionHolder. The piece of code in vertx-hazelcast that fails appears to be in the class io.vertx.spi.cluster.hazelcast.HazelcastClusterManager:
@Override
public void getNodeInfo(String nodeId, Promise<NodeInfo> promise) {
  vertx.<NodeInfo>executeBlocking(() -> {
    HazelcastNodeInfo value = nodeInfoMap.get(nodeId);
    if (value != null) {
      return value.unwrap();
    } else {
      throw new VertxException("Not a member of the cluster", true);
    }
  }, false).onComplete(promise);
}
Any ideas on what might be going on?