Exposing Shard ID publicly #468
Comments
IIUC (but I'm no expert, confirmation welcome) a
I must have missed this issue before - our driver already computes everything it needs to be shard-aware, so it's capable of assigning tokens to the owner shards of a particular node. The code is available in this source file: https://github.com/scylladb/scylla-rust-driver/blob/fd06928929c0dd78a72c7aca494a998da1514a79/scylla/src/routing.rs, so I guess it's only a matter of exposing these details to the end user. Right now they're hidden quite deep in the load balancing policy implementation.
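For reference, a minimal standalone sketch of that token-to-shard computation, paraphrased from routing.rs as I understand it (the function and parameter names here are illustrative, not the driver's exact types):

```rust
/// Maps a token onto one of a node's shards, following the multiply-shift
/// scheme used by ScyllaDB's shard-aware drivers. `nr_shards` and `msb_ignore`
/// come from the node's sharding parameters.
fn shard_of(token: i64, nr_shards: u16, msb_ignore: u8) -> u16 {
    // Bias the signed token so it can be treated as an unsigned 64-bit value,
    // then shift out the most significant bits the cluster says to ignore.
    let mut biased_token = (token as u64).wrapping_add(1u64 << 63);
    biased_token <<= msb_ignore;
    // Multiply-shift maps the 64-bit space roughly uniformly onto [0, nr_shards).
    (((biased_token as u128) * (nr_shards as u128)) >> 64) as u16
}
```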
Helps (or arguably fixes) scylladb#468

For smarter batch constitution (following up on scylladb#448), it is required to have access to the token a particular statement would be directed to. However, that is pretty difficult to do without access to `calculate_token`. That was initially put in scylladb#508, but was meant to be moved to a separate PR to keep that one minimal (scylladb#508 (comment)).

It seems I had forgotten to open that separate PR, and I ended up getting bitten by that now that I want to constitute my batches in a smarter way. So here it is.

I'm only putting "helps" above because I think we may also want to expose query planning (`session.plan(prepared_statement, serialized_values) -> impl Iterator<Item = (Arc<Node>, ShardID)>`), as that may make this significantly easier - but I'd like to keep this PR, which just *enables* the ideal behavior, as simple as possible.
It doesn't seem like this gives access starting from the prepared statement + serialized values combination. I feel like that would be more practical to use, as it would enable abstracting away the keyspace/table/partition-key part (especially partition-key knowledge that may otherwise be unknown to application code) when dealing with sharding-related considerations.
Resolves scylladb#468

This is a follow-up on scylladb#508 and scylladb#658:

- To minimize the CPU usage related to network operations when inserting a very large number of lines, it is relevant to batch.
- To batch in the most efficient manner, these batches have to be shard-aware. Since scylladb#508, `batch` will pick the shard of the first statement to send the query to. However, it is left to the user to constitute the batches in such a way that the target shard is the same for all the elements of the batch.
- This was made *possible* by scylladb#658, but it was still very boilerplate-ish. I was waiting for scylladb#612 to be merged (amazing work btw! 😃) to implement a more direct and factored API (as it would use it).
- This new `Session::first_shard_for_statement(self, &PreparedStatement, &SerializedValues) -> Option<(Node, Option<Shard>)>` (signature approximate) makes shard-aware batching easy for users by providing access to the first node and shard of the query plan (see the usage sketch below).
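A rough usage sketch of the proposed API, assuming the approximate signature quoted above (not part of any released driver); value serialization is assumed to have happened already, and error handling, retries and batch-size limits are omitted:

```rust
use std::collections::HashMap;

use scylla::batch::Batch;
use scylla::frame::value::SerializedValues;
use scylla::prepared_statement::PreparedStatement;
use scylla::Session;

/// Sketch only: builds shard-aware batches on top of the proposed
/// `first_shard_for_statement` API.
async fn send_shard_aware_batches(
    session: &Session,
    prepared: &PreparedStatement,
    rows: Vec<SerializedValues>, // one pre-serialized row per statement
) -> Result<(), Box<dyn std::error::Error>> {
    // Group rows by the (node, shard) that the driver would pick first.
    let mut groups: HashMap<_, Vec<SerializedValues>> = HashMap::new();
    for row in rows {
        let target = session
            .first_shard_for_statement(prepared, &row) // proposed API, not released
            .map(|(node, shard)| (node.host_id, shard));
        groups.entry(target).or_default().push(row);
    }

    // One batch per (node, shard) group; the driver routes each batch to the
    // shard owning its first statement (scylladb#508). How batch values are
    // passed depends on the driver version.
    for (_target, group) in groups {
        let mut batch = Batch::default();
        for _ in &group {
            batch.append_statement(prepared.clone());
        }
        session.batch(&batch, group).await?;
    }
    Ok(())
}
```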
With Tablets (https://www.scylladb.com/presentations/tablets-rethinking-replication/) there is no longer a mapping between Token and shard (and such a mapping was always an implementation detail), because the same partition may be located on different shards of different replicas.
The main issue with not batching is that going through the heavy Tokio machinery for every statement puts a 5x CPU load on the server that makes the Scylla calls using the Rust driver, whereas with batching that load is shared across coordinators (and is probably less significant than managing Tokio futures, even ignoring the fact that the work is shared, because Tokio turned out to be heavy in my benchmarks).
IIUC, your idea to create multi-partition batches performs faster from the driver's point of view, but imposes a higher load on the cluster. Right? So this is a trade-off that we used to reject in the past, because our goal was to minimise the load on the Scylla cluster, thus making it maximally performant.
It looks like at the moment the cluster may have to handle some extra networking and individual handling for every statement, which might put sub-optimal load on it as well, and which seems hard to control unless either one can send batches, or this can be configured to be longer (ideally into something like …).

But more importantly, if we agree that the query load is similar or greater on the client side than on the server side when statements are handled individually, then for the use-case of performing High-Frequency Append-Log Style Operations, like mine and that of the author of #974, it seems clear that more Scylla servers will be required to store data and perform maintenance operations than client servers (~1/10 ratio). So closing off APIs that would allow offloading part of the work to the servers doesn't seem practical: it would not significantly reduce the number of required Scylla servers (they would still be needed for maintenance operations to work properly), but it would drastically increase the number of required clients.

Bear in mind that with this use-case, the current alternative is just making batches that are not targeted, which is not better for server load either.
That's convincing, at least to me.
OK, could you reiterate what kind of API you specifically need access to in order to get such client-side performance gains?
It looks like the discussion here is converging with that of #738, so I'm just going to forward to there: #738 (comment)
It'd be helpful to have greater control over how records are batched. One scenario my team has discussed is using the `Shard` ID as a means to group data for batching, irrespective of partition.

The general thinking is that we can get the relevant `Token` from a `PreparedStatement` by computing the partition key and passing it to the `Partitioner`. To achieve this, ideally `get_partitioner_name()` would be `pub` instead of `pub(crate)` on `PreparedStatement`.

With the `Token`, get the `Shard` ID from somewhere that makes sense - no good ideas on this yet, perhaps a method on the `ClusterData`.

App-specific logic can then be used to group records by the shard ID and batch them for writing to Scylla.
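A high-level sketch of that pipeline, with both hooks left hypothetical since neither is publicly exposed yet: `token_of` stands in for the partition-key-to-`Token` computation via the `Partitioner`, and `shard_of` for a token-to-shard lookup "somewhere that makes sense", e.g. a future method on `ClusterData`.

```rust
use std::collections::HashMap;

/// Sketch of the grouping step described above; `R` is whatever record type
/// the application batches, and both closures are placeholders for driver
/// functionality this issue asks to expose.
fn group_records_by_shard<R>(
    records: Vec<R>,
    token_of: impl Fn(&R) -> i64,
    shard_of: impl Fn(i64) -> u16,
) -> HashMap<u16, Vec<R>> {
    let mut by_shard: HashMap<u16, Vec<R>> = HashMap::new();
    for record in records {
        // Partition key -> token -> shard, then bucket by shard.
        let shard = shard_of(token_of(&record));
        by_shard.entry(shard).or_default().push(record);
    }
    by_shard
}
```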
Rather than exposing the shard information, a shard-aware batching API would be more useful. I think #448 is probably related.