Wrong node roles after reconnecting to Redis #2581
Do you have a reproducer handy that we can use to reproduce the issue without having to set up a ton of infrastructure? Likely, that is a Lettuce driver issue. |
Unfortunately, I don't have a tool to reproduce the problem, but I can provide sample code (Kotlin, Spring Data Redis). I have 3 Redis replicas and 3 Sentinels, one of each per host. In other words, each host exposes two ports, one for Redis and one for Sentinel (for example, host1:6379, host1:26379; host2:6379, host2:26379; host3:6379, host3:26379).
|
My question is... If you were to code a simple Java application without Spring (Boot/Data Redis), that is, use only Java and Lettuce with Redis, would you have the same problem?
Given only my cursory knowledge on Spring Data Redis (yet) as well as my limited knowledge and experience with Redis in various capacities/contexts (e.g. Redis Sentinel with Docker), it seems to me, very little logic exists in Spring Data Redis for managing topology (role) changes of Redis Sentinel nodes where Master/Replicas are concerned. For instance, if you are referring to the In addition, the Even Therefore, outside of being able to pass configuration down from Spring, or any Java client for that matter, to (ultimately) the (Lettuce) Redis (client) driver when the Upon closer inspection of the Lettuce driver in particular, it seems that maybe you need a dynamic topology change detection mechanism (rather than "static"), as described here. Although, maybe Docker much like AWS needs an explicit list of nodes (i.e. The Lettuce documentation on the
In particular: "The connection needs to be re-established outside of Lettuce in case of a Master/Replica failover or topology changes." However, the documentation is not very clear on "roles" in any case, and in particular, the "static" one. The best thing I can think of at the moment is to re-establish your connection(s) by closing the existing connections and re-opening them. Worst-case scenario, this might even require the application to be restarted. There is very little (automated or otherwise) Spring Data Redis can do other than what is possible and allowed by the underlying (Lettuce) Redis (client) driver API. And, after browsing through the API, I am coming up a bit short here. |
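To make the "close and re-open" idea above concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lettuce or Spring Data Redis APIs: `ReconnectingWriter`, `ReadOnlyNodeException`, and the connection factory are hypothetical names invented for this illustration, with `ReadOnlyNodeException` standing in for the `RedisReadOnlyException` reported in this issue. The assumption is that the factory re-resolves the current master every time it is called.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the driver's "READONLY" failure; not a Lettuce class. */
class ReadOnlyNodeException extends RuntimeException {
    ReadOnlyNodeException(String message) { super(message); }
}

/**
 * Hypothetical helper: holds one connection and rebuilds it when a write
 * lands on a read-only node. C only has to be closeable.
 */
class ReconnectingWriter<C extends AutoCloseable> {
    private final Supplier<C> connectionFactory; // assumed to re-resolve the master on each call
    private final int maxAttempts;
    private C connection;

    ReconnectingWriter(Supplier<C> connectionFactory, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.connectionFactory = connectionFactory;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the command; on a read-only failure, drops the connection and retries on a fresh one. */
    synchronized <R> R write(Function<C, R> command) {
        ReadOnlyNodeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connection == null) {
                connection = connectionFactory.get();
            }
            try {
                return command.apply(connection);
            } catch (ReadOnlyNodeException e) {
                last = e;
                closeQuietly(); // the cached connection points at a node that is no longer master
            }
        }
        throw last; // every attempt hit a read-only node
    }

    private void closeQuietly() {
        C stale = connection;
        connection = null;
        if (stale != null) {
            try { stale.close(); } catch (Exception ignored) { /* best effort */ }
        }
    }
}

class ReconnectDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fake connections; the first write fails as if it had reached the old master.
        ReconnectingWriter<AutoCloseable> writer =
                new ReconnectingWriter<AutoCloseable>(() -> () -> { }, 2);
        String reply = writer.write(c -> {
            if (calls.incrementAndGet() == 1) throw new ReadOnlyNodeException("READONLY");
            return "OK";
        });
        System.out.println(reply + " after " + calls.get() + " attempts");
    }
}
```

In a real application the factory would be whatever code currently opens the Master/Replica connection. Whether re-opening actually picks up the new master depends on the driver, which is exactly the open question in this thread.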
Also from the Lettuce driver documentation:
|
I am also wondering: if "strong" consistency is not required by your application's use case and requirements (SLA), what would be the harm in writing to any available node (not the "master"), if that is even possible with Redis Sentinel? Also, given the Redis documentation on Redis Sentinel, it seems to me that the "Sentinels" are responsible for notifying clients of topology changes, such as the new master (a "role" (??) in the HA topology). |
I was also just reading, from "High availability with Redis Sentinel" documentation, in the section, "Fundamental things to know about Sentinel before deploying", # 6, that:
Unfortunately, the information is less than complete with respect to "roles". You don't seem to be having problems with "connections" from the client (even after the Docker containers hosting the nodes in the Redis Sentinel setup come back online), only that the "roles" of the "listed" (static) nodes are stale. In fact, there is no mention of "roles" in the Redis documentation at all. :( |
I think so, because MasterReplicaConnectionProvider.knownNodes() is part of lettuce-core. |
@mp911de I have all the necessary infrastructure set up and I can change or tweak something in the starter or in the driver, if it helps you |
Yes, I use SentinelTopologyRefresh, and when the master changes after a failover I can see that the application updates its topology, including the current master. However, SentinelTopologyRefresh ONLY works if there is no disconnect. That is, I can send the debug sleep {timeout} command to the current master in its Docker container and SentinelTopologyRefresh will indeed work, but if all containers stop (a disconnect, for example), the topology is not updated. After the disconnect, there are no events from the SentinelConnector and the master is out of date. It turns out that Sentinel does not emit events during a disconnect, so the topology does not change in the application. When all three containers are restarted, the application sees them again but with the old roles (MasterReplicaConnectionProvider.knownNodes()). NOTICE: I restart the entire infrastructure (stop the Docker containers, install certificates, start the containers), and only then do I see this error when trying to write to the old master. I use Ansible to restart all of the infrastructure. @mp911de, please take note of the above. |
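One way to picture the stale-roles problem described above, and a possible workaround, is a role cache that is rebuilt by asking each node for its current role (Redis exposes this through the ROLE command) whenever the connection comes back, instead of relying only on Sentinel events. The sketch below is plain Java and is not how Lettuce implements `knownNodes()`; `KnownNodeRoles`, `NodeRole`, and the `roleLookup` function are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Simplified role of a node; UNKNOWN covers nodes that could not be queried. */
enum NodeRole { MASTER, REPLICA, UNKNOWN }

/** Hypothetical role cache that can be rebuilt on demand, e.g. from a reconnect hook. */
class KnownNodeRoles {
    private final List<String> nodes;                    // static node list, e.g. "host1:6379"
    private final Function<String, NodeRole> roleLookup; // assumed to ask one node for its role
    private Map<String, NodeRole> roles = new LinkedHashMap<>();

    KnownNodeRoles(List<String> nodes, Function<String, NodeRole> roleLookup) {
        this.nodes = new ArrayList<>(nodes);
        this.roleLookup = roleLookup;
        refresh();
    }

    /** Re-queries every node; unreachable nodes become UNKNOWN rather than keeping a stale role. */
    synchronized void refresh() {
        Map<String, NodeRole> fresh = new LinkedHashMap<>();
        for (String node : nodes) {
            NodeRole role;
            try {
                role = roleLookup.apply(node);
            } catch (RuntimeException e) {
                role = NodeRole.UNKNOWN;
            }
            fresh.put(node, role);
        }
        roles = fresh;
    }

    /** Returns the first node currently cached as MASTER, if any. */
    synchronized Optional<String> currentMaster() {
        return roles.entrySet().stream()
                .filter(e -> e.getValue() == NodeRole.MASTER)
                .map(Map.Entry::getKey)
                .findFirst();
    }
}

class RoleRefreshDemo {
    public static void main(String[] args) {
        // Simulated cluster state; a real lookup would query each node over the network.
        Map<String, NodeRole> cluster = new HashMap<>();
        cluster.put("host1:6379", NodeRole.MASTER);
        cluster.put("host2:6379", NodeRole.REPLICA);
        List<String> nodes = new ArrayList<>();
        nodes.add("host1:6379");
        nodes.add("host2:6379");
        KnownNodeRoles known = new KnownNodeRoles(nodes, cluster::get);

        // Roles swap while the client is disconnected; the cache is stale until refresh().
        cluster.put("host1:6379", NodeRole.REPLICA);
        cluster.put("host2:6379", NodeRole.MASTER);
        System.out.println("before refresh: " + known.currentMaster());
        known.refresh();
        System.out.println("after refresh:  " + known.currentMaster());
    }
}
```

Calling `refresh()` from a "connection re-established" callback, or on a timer, would cover the case where no Sentinel event arrives. Wiring it into the driver is outside the scope of this sketch.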
Sentinel does not emit events when the Docker containers with Redis and Sentinel are stopped, so the topology does not change. |
Can you provide us with a bit of guidance regarding the docker containers for Redis? Do you have an image/multiple images that we can easily spin up to have a similar setup to yours? We would like to understand more about what is happening and therefore we need to have a way to reproduce the issue. |
Yes, but I need time |
I am still not convinced this is a Spring Data Redis problem, but perhaps rather a Redis (client) driver, like Lettuce, problem specifically. |
@jxblum Maybe it's a problem with the Lettuce driver, but I don't know how to check and fix it yet |
j2 template for redis configuration
j2 template for haproxy configuration
For start, stop, and restart we use systemd with the following configurations:
j2 template for redis configuration
j2 template for redis proxy configuration
j2 template for redis sentinel configuration
|
@mp911de Hello. Is there anything I can do to fix the driver? I have all the necessary infrastructure, I can try to help. I need advice on which part of the code could be the problem |
I don't even have an idea what J2 is. The pasted code bits are overly complex, and by no means are we able to reproduce the issue with them. We're looking for an easy way to reproduce the scenario so that we can diagnose what is going wrong, without spending hours or days on understanding the infrastructure. |
Could you stop all Redis instances while the clients are running, wait for the configured failover time (down-after-milliseconds) so that the failover process takes place, and then start Redis again? The issue is likely to be reproducible that way, as the client will still have the old master. |
Closing this ticket because it isn't going anywhere and we cannot act on it. |
Hello. After restarting the Docker container with Redis, the leader changes (if a different node was the leader at that moment) and the application reconnects to Redis, but MasterReplicaConnectionProvider contains nodes with old roles (property knownNodes).
Because the roles changed after the reconnect, MasterReplicaConnectionProvider.getConnectionAsync("WRITE") returns the old master and a RedisReadOnlyException occurs.
There is SentinelTopologyRefresh, which tracks topology updates, but it doesn't work after reconnecting.
How can I force the roles of the nodes to be updated? The application is written in Java with Spring Boot.
Any ideas, please?