-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Token aware load balancing for batches #448
Comments
Oh, I thought we already speculatively do that (take the first prepared statement if available and use its token). I think we should even consider doing that unconditionally, since it's good practice to only batch statements within a single partition - in which case the user will be rewarded for their good practice with token awareness for free. Then we don't even need any Contributions welcome! I'll take liberty of assigning you to this one, so everyone knows it's taken. Thanks! |
As a new user, I don't understand how speculative isn't unconditional. Could you elaborate? Or do you mean we actually currently don't already do that? Thanks, |
Now that I read it, I think I misused the word "speculatively", "optimistically" would have been better. I meant the decision to be unconditional - we optimistically assume that the batch only consists of a single partition, so we just extract token info from the first statement. |
So just to be sure I understand correctly, this is not currently implemented but is the plan? |
Indeed, looking at the code, it's not implemented. |
This documentation seems to say that ideal performance would be attained by grouping together queries that end up in the same partition. However it looks like since:
is not public (#468) that may actually be impossible right now. It looks then like as part of implementing this feature, this should also be made public. |
The new code, and in particualr the fancy GAT workaroound, cause problems when trying to pass batch values by reference, as described in scylladb#568. Fixes: scylladb#568 Unfixes: scylladb#448 This reverts commit 24ee954. Signed-off-by: Jan Ciolek <[email protected]>
Reopening because of #569 |
Helps (or arguably fixes) scylladb#468 For smarter batch constitution (following up on scylladb#448), it is required to have access to the token a particular statement would be directed to. However, that is pretty difficult to do without access to calculate_token. That was initially put in scylladb#508, but planned to put in a separate pr to keep it minimal (scylladb#508 (comment)) It seems I had forgotten to open that separate PR, and I end up getting bitten by that now that I want to constitute my batches in a smarter way. So here it is. I'm only putting "helps" above because I think we may want to also expose query planning ( `session.plan(prepared_statement, serialized_values) -> impl Iterator<Item = (Arc<Node>, ShardID)>` ) as that may make it significantly easier - but I'd like to keep this PR that just *enables* the ideal behavior as simple as possible.
Currently, batches don't support token aware load balancing. However, there could be situations where batches statement are guaranteed to be processed by the same nodes. Enabling token aware load balancing would then improve performance.
A possible implementation could be to add an additional configuration method, for example
set_compute_token
, and when the flag is set, the token of the batch's first prepared statement is computed to to be used in load balancing policy.Any opinion about it? I'm volunteer to make the PR.
The text was updated successfully, but these errors were encountered: