Cluster Scan - glide-core and python #1623
Conversation
Hi @avifenesh, I'm working on the Java port of this in #1737, and I'm not sure the memory management is quite right. When the Python GC runs on a cursor object, we call back into Rust to remove the container entry for that cursor's state, correct? However, every call to cluster_scan() creates a new ClusterScanCursor object, even when the calls are continuing the same cursor. Would each of these objects point to the same cursor string? If so, you have duplicate ownership of the same cursor, and if any of them is GC'd before the user has finished iterating (for example, the user reuses the same local variable to store cursors), the cursor state could be removed prematurely. The specific problem case I see: the user runs cluster_scan, gets a Rust cursor, and stores it in a local variable; runs cluster_scan again and assigns the updated cursor to a local variable; waits for a while until the GC kicks in; then runs cluster_scan again. If the cursor is shared, it gets terminated prematurely. Perhaps we should change the function to return something more like an iterator that the user can explicitly close when they are done with it.
This has the added benefit of allowing deterministic resource cleanup instead of relying on the garbage collector. The design would work both when Rust reuses the same cursor string and when it changes from call to call: the API would look the same from the caller's perspective, and the cursor-management mechanics would just be an implementation detail. What do you think?
string cursor = 1;
optional string match_pattern = 2;
optional int64 count = 3;
optional string object_type = 4;
It is a bit problematic. Protobuf enums are numeric, while in Python I had to choose between a string-valued enum and a numeric one (as far as I could tell when I tried). I wanted to use the same enum in Python both for standalone mode, which sends strings, and for cluster mode, so I couldn't use a numeric enum. My first attempt was an enum as you suggest, but then I ran into this issue. Do you have any other solution?
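One possible shape for this, sketched under the assumption that the server accepts the type name as a string in both modes; the enum name `ObjectType`, its values, and `type_arg` are illustrative, not the actual glide definitions:

```python
from enum import Enum

class ObjectType(Enum):
    # String-valued members: readable names in Python, and the value is
    # exactly the string sent to the server / stored in the protobuf's
    # `optional string object_type` field (avoiding a numeric protobuf enum).
    STRING = "string"
    LIST = "list"
    SET = "set"
    ZSET = "zset"
    HASH = "hash"

def type_arg(obj_type: ObjectType) -> str:
    # The same enum serves standalone SCAN (string argument) and the
    # cluster-scan protobuf request (string field).
    return obj_type.value
```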
@avifenesh @barshaul, I've updated my PR at #1737 to implement the design proposed above. I've also ported one of your integration tests to Java using this design, to demonstrate how it works from a caller's perspective.
Left some last comments.
🚀🚀🚀
Scan command for Glide-Core and Py
Scan command API implementation in glide-core and command implementation in Python.
The PR contains the general implementation of the core communication with the command interface in redis-rs, translating the protobuf args into the args expected by the cluster scan command in redis-rs, which is a different API than the CMD interface and needs different handling.
It also contains the implementation of the container of cursor refs, and the functionality for adding and removing refs from the container in order to drop the object held by redis-rs.
On the Python side, the PR contains the wrapper implementation, which wraps the cursor id in a special object that implements get and remove functions. The remove function is part of the object's del function and is triggered when the user drops the object and the language's GC cleans it up, which invokes del on the object.
It contains the implementation of SCAN for CME and SCAN for standalone.
Please note that the commands are completely different and perform completely different logic.
Profiled memory-consumption differences between cluster scan and standalone scan with a fairly large number of keys, sets, hashes, and zsets (10000 each), hence a lot of ScanState objects being created and stored in the CME SCAN.
For both, added arbitrary sleeps in order to create scenarios where the ScanState is held for a while. The difference is about 11%, which is exactly the same difference observed when setting the keys, hashes, sets, and zsets; hence the difference probably comes from the different server types and the work done when running in cluster mode, while the cluster scan logic and the ScanState object don't affect memory.
The PR is based on the work done in redis-rs and the API provided by it. That PR can be found in the link below.
redis-rs impl of Cluster Scan
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.