-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Valkey Proxy Proposal #11
base: main
Are you sure you want to change the base?
Conversation
67d1a2a
to
7b148e5
Compare
## Abstract (Required) | ||
|
||
The proposed ValkeyProxy is to provide the features similar to redis-cluster-proxy. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed ValkeyProxy is to accept connections from clients and multiplexes them onto connections to the backend server processes. It is horizontally scalable and multi-tenant.
It serves several main purposes beyond simply routing requests to the right processes.
It provides scalability by offloading connection overhead from the backend server processes. It also could hold client connections open even as the server processes are being migrated or restarted. this is very helpful in cloud-native situations, our control over various open source sdks behavior can be extremely limited. For scenarios with lager data volumes, we could provide read-write separation mode through the proxy.
what's more, we could add stateless, secure HTTP connections directly, enhanced connection observability and so on.
ValkeyProxy.md
Outdated
|
||
So if the client is only connected to the Proxy instead of server node, then as long as the client is not connected inSentinel mode, then when the client sends a command, it does not need to pay too much attention to the underlying connection mode. The Proxy will handle the execution of different modes for the client. | ||
|
||
3. Support multi-database mode even server runs as cluster mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider it is an optional feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the multi-database mode could be in server instead of in proxy
For some time-consuming or high CPU cost commands, we can make statistics in Proxy side, and for some results, such as min/max/avg/p95/p99 from memtier_benchmark or valkey_benchmark, clients can get timely statistical information. | ||
|
||
|
||
6. Support audit log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this is a hook in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't we implement audit logs like this into the server though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On benefit implemented in proxy side is to retrieve this information from one point, user do not need to fetch the log in different nodes in the cluster mode.
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
Signed-off-by: hwware <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly dubious of proxies in Valkey as a general solution. They do solve problems at scale related to number of connections, but I think many of the problems can also be solved by improving clients.
There are some cool things that could be solved though, like implementing meta-commands (like distributed locks, sorted sets, etc) which is also very interesting. This isn't exactly the same as your proposal, since you seem to be proposing almost API compatibility with the core.
|
||
## Design Considerations (Required) | ||
|
||
1. Support all existing Valkey commands. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to bring up that some commands are intrinsically hard to support like lua scripts and pub/sub. Do we need to necessarily implement all commands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even not all, but most is necessary, IMO
|
||
1. Support all existing Valkey commands. | ||
|
||
2. Shield the functional/API differences between cluster mode and primary-replica mode, so that clients do not need to care about the underlying operating mode when using it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. Shield the functional/API differences between cluster mode and primary-replica mode, so that clients do not need to care about the underlying operating mode when using it. | |
2. Shield the functional/API differences between cluster mode and standalone mode, so that clients do not need to care about the underlying operating mode when using it. |
I think we've decided to just call this standalone, since this is what the server emits in info and hello.
|
||
So if the client is only connected to the Proxy instead of server node, then as long as the client is not connected inSentinel mode, then when the client sends a command, it does not need to pay too much attention to the underlying connection mode. The Proxy will handle the execution of different modes for the client. | ||
|
||
3. Support multi-database mode even server runs as cluster mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we implement this as a server side feature instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can have it in the server side.
|
||
# ValkeyProxy RFC | ||
|
||
## Abstract (Required) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things I'm generally unsure of is whether or not we really a proxy compared to just making clients better. It's sort of been the project of glide to abstract away as much of the differences you have outlined here.
As an aside, we might be able to use the core of glide to implement some of this logic.
For some time-consuming or high CPU cost commands, we can make statistics in Proxy side, and for some results, such as min/max/avg/p95/p99 from memtier_benchmark or valkey_benchmark, clients can get timely statistical information. | ||
|
||
|
||
6. Support audit log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't we implement audit logs like this into the server though?
8. Provide a connection pool to facilitate client connection. | ||
|
||
Connection pooling can reduce the overhead associated with opening and closing connections and with keeping many | ||
connections open simultaneously. This overhead not only includes memory needed to handle each new connection, also involves | ||
CPU overhead to close each connection and open a new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think connection pooling is done fairly effectively on the clients today. The only real drawback I've seen is when we have large clusters, so each client has to maintain a very large number of outbound connections
|
||
For the client's persistent connection, the connection will not actively disconnect, but only returns the error message; If the client is disconnected actively, the connection will be disconnected immediately. | ||
|
||
9. Support CLIENT LIST etc commands aggregate return: return the client of all proxy nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aggregate commands are cool, we've also heard folks interested in running flushall across the whole cluster. Configuration like config set
might also be possible.
7. Provide reading/writing separation features | ||
|
||
In production environments, most scenarios involve more reading requests and less writing requests. By enabling | ||
read-write separation feature in the Proxy, we can let the primary node focus on processing write operations and let the | ||
replica handle read operations. It is very suitable for scenarios with relatively large read operations and can reduce the | ||
pressure on the primary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, this can go horribly wrong. We talked with a customer that was doing caching and did read-write splits, they were having cache misses on replicas but the writes were going to the primary, but the replica was slow at consuming so it would miss again and trigger another write. I think generally read/write splitting is a big anti-pattern for online transactional workloads like caching or message queues. it makes more sense for analytic workloads, but I don't Valkey really serves much analytics today. Maybe that will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we want to implement this feature in ValkeyProxy is:
- This feature (read-write separation) could be enabled or disabled through config parameter. And by integrating with ACL, this feature only applies to some specific users
- For those data with a relatively low update frequency, read misses due to differences in primary and replica data, which in turn lead to multiple writes, are acceptable in some scenarios.
- If data consistency guarantees are a must following the reason 2 case, synchronous between the master and replica can be considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From weekly meeting @PingXie, should we support cross-slot operations (including cross shard operations) . Will we support that functionality and what type of guarantee will provide with that.
No description provided.