Move output IO throttler to IO queue level #2332
base: master
Conversation
Current tests on fair queue try to make the queue submit requests in an extremely controllable way -- one by one. However, the fair queue nowadays is driven by a rated token bucket and is very sensitive to time and durations. It's better to teach the test to accept the fact that it cannot control fair-queue request submissions at per-request granularity and to tune its accounting instead. The change affects two places. Main loop: before the change it called fair_queue::dispatch_requests() as many times as the number of requests the test case wants to pass, then performed the necessary checks. Now the method is called indefinitely, and the handler only processes the requested number of requests; the rest are ignored. Drain: before the change it called dispatch_requests() in a loop for as long as it dispatched anything. Now it's called in a loop until the fair queue explicitly reports that it's empty (see the sketch below). Signed-off-by: Pavel Emelyanov <[email protected]>
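A minimal sketch of the described drain pattern; the is_empty() name here is illustrative, the actual test uses Seastar's internal fair-queue API:

// Hedged sketch: keep dispatching until the queue itself reports it is empty,
// instead of stopping after the first call that dispatches nothing.
void drain(fair_queue& fq) {
    while (!fq.is_empty()) {      // hypothetical emptiness check
        fq.dispatch_requests();
    }
}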
The class in question only controls the output flow of capacities; it's not about fair queueing at all. There is an effort to implement cross-shard fairness, which does need fair_group, but we're not there yet. refs: scylladb#2294 Signed-off-by: Pavel Emelyanov <[email protected]>
The io_group carries throttle instances for each of the streams, but they are still named _f[air]g[roup]s. Effectively, this is a continuation of the previous patch. Signed-off-by: Pavel Emelyanov <[email protected]>
The move is very smooth: no code changes are needed except for the definition of throttle::capacity_t. It used to be the same as fair_queue_entry::capacity_t, since we use the same notion of capacity for both fair queueing and capacity throttling. Signed-off-by: Pavel Emelyanov <[email protected]>
The capacity grabbing method now accepts an fq entry reference, but this code is going to be moved to throttle.(cc|hh), which doesn't work with fq entries. Signed-off-by: Pavel Emelyanov <[email protected]>
The shard-local fair_queue class does some shard-local bookkeeping of throttled capacities. Shared bookkeeping was done by fair_group, which was renamed and moved away; now it's time to collect the shard-local throttle facilities. Signed-off-by: Pavel Emelyanov <[email protected]>
No functional changes, just code movement. Signed-off-by: Pavel Emelyanov <[email protected]>
The IO queue uses the concept of a "stream" to handle duplex devices. A stream is currently just a fair_queue, but future patches need to attach more state to a stream, so this patch wraps the per-io-queue fair_queues into a helper struct stream (see the sketch below). Signed-off-by: Pavel Emelyanov <[email protected]>
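A rough sketch of what such a wrapper could look like; the member names fq and thr are taken from a diff fragment quoted further down this page, the rest is illustrative:

// Illustrative only -- the real struct lives inside io_queue and may carry
// more members and helper methods.
struct stream {
    fair_queue fq;    // per-stream fair queue (cross-class fairness)
    throttle& thr;    // shard-local throttle owned by the io_queue

    stream(throttle& st, fair_queue::config cfg)
        : fq(std::move(cfg))
        , thr(st)
    { }
};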
The fair queue code doesn't "know" which entries it schedules. For that there is an "abstract" fair_queue_entry that gets scheduled, and to dispatch those entries the caller must provide a std::function<> callback to the fair_queue::dispatch_requests() method. The next patches will need fair_queue to call more code on those entries. Rather than introducing another std::function<> callback, change the dispatch indirection from a callback to a pure virtual .dispatch() method on fair_queue_entry (sketched below). Signed-off-by: Pavel Emelyanov <[email protected]>
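A sketch of the indirection change described above; the class layout is simplified and only illustrative:

// Before: the caller passes a callback for every dispatched entry, e.g.
//   fair_queue::dispatch_requests(std::function<void(fair_queue_entry&)> cb);
// After: the entry itself knows how to dispatch.
class fair_queue_entry {
public:
    virtual void dispatch() = 0;   // overridden by, e.g., queued_io_request
};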
When fair_queue tries to dispatch a request, it tries to grab the capacity from the throttler on its own. That's the wrong place: the fair group is just about providing cross-class fairness of requests. Throttling the output stream of requests should be done by io_queue itself. Signed-off-by: Pavel Emelyanov <[email protected]>
Those notifications were used to update the output throttler with the result of request dispatching/cancellation/etc. Now that the throttler is at the io_queue level, the fair queue can be freed from this duty. Signed-off-by: Pavel Emelyanov <[email protected]>
The method is empty now. Signed-off-by: Pavel Emelyanov <[email protected]>
@@ -297,6 +298,10 @@ class queued_io_request final : private internal::io_request, public fair_queue_
     queued_io_request(queued_io_request&&) = delete;
     ~queued_io_request() = default;

+    virtual throttle::grab_result can_dispatch() const noexcept {
It seems that the compiler on CI complains that it does not use the override keyword.
io_queue.cc:301:35: error: 'can_dispatch' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
301 | virtual throttle::grab_result can_dispatch() const noexcept {
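A minimal sketch of the fix the compiler asks for; only the signature matters here, the body is elided:

// Adding the override specifier silences -Winconsistent-missing-override.
throttle::grab_result can_dispatch() const noexcept override;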
src/core/fair_queue.cc
Outdated
void fair_queue::dispatch_requests() {
    capacity_t dispatched = 0;
    boost::container::small_vector<priority_class_ptr, 2> preempt;

    while (!_handles.empty() && (dispatched < _throttle.per_tick_grab_threshold())) {
Minor: the dispatched variable becomes unused after the removal. The compiler on CI complains about it:
fair_queue.cc:238:16: error: variable 'dispatched' set but not used [-Werror,-Wunused-but-set-variable]
238 | capacity_t dispatched = 0;
I am not entirely sure why the compiler raises an error -- below there is dispatched += req_cap;
Because this variable becomes write-only, which effectively means unused. This place is obsoleted by #2294 anyway :(
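A tiny standalone illustration of that point (not from the PR): once the only remaining use of a variable is writing to it, Clang flags it with -Wunused-but-set-variable, which -Werror turns into the CI error above.

// Compile with: clang++ -Wunused-but-set-variable -Werror -c example.cc
void count_only(const int* caps, int n) {
    int dispatched = 0;            // warning: set but not used
    for (int i = 0; i < n; ++i) {
        dispatched += caps[i];     // written, but the value is never read afterwards
    }
}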
@@ -602,17 +602,17 @@ shared_throttle::config io_group::make_throttle_config(const io_queue::config& q
}

std::chrono::duration<double> io_group::io_latency_goal() const noexcept {
-    return _fgs.front().rate_limit_duration();
+    return _throttle.front().rate_limit_duration();
}

io_group::io_group(io_queue::config io_cfg, unsigned nr_queues)
    : _config(std::move(io_cfg))
    , _allocated_on(this_shard_id())
{
    auto fg_cfg = make_throttle_config(_config);
Minor: maybe we could also adjust the variable name to throttle_cfg?
Good point
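A one-line sketch of the suggested rename (purely illustrative):

// In io_group's constructor, name the local after the new throttle terminology.
auto throttle_cfg = make_throttle_config(_config);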
src/core/fair_queue.cc
Outdated
@@ -161,7 +161,7 @@ bool fair_queue::class_compare::operator() (const priority_class_ptr& lhs, const
    return lhs->_accumulated > rhs->_accumulated;
}

-fair_queue::fair_queue(fair_group& group, config cfg)
+fair_queue::fair_queue(shared_throttle& group, config cfg)
Minor: in fair_queue.hh the names of the constructor parameters do not match the ones used here. In the header there is fair_queue(shared_throttle& shared, config cfg). Do we want to keep both places consistent?
Yes, I will need to update the header name too
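A sketch of what the aligned signatures could look like; the final parameter name is up to the author, group is just one option:

// fair_queue.hh -- declaration
fair_queue(shared_throttle& group, config cfg);

// fair_queue.cc -- definition, same parameter name as the header
fair_queue::fair_queue(shared_throttle& group, config cfg)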
/// This is a fair group. It's attached by one or mode fair queues. On machines having the
/// big* amount of shards, queues use the group to borrow/lend the needed capacity for
/// requests dispatching.
Minor: just a short question about the descriptive comment. In previous patches we renamed fair_group --> shared_throttle. Are any changes needed to this description? It still uses the old nomenclature.
Ah, yes. I fixed this comment in one of the rebases and lost it along the way
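A possible rewording of that comment under the new naming (illustrative, not the author's final wording):

/// This is a shared throttle. It is attached to by one or more fair queues. On machines
/// with a large number of shards, queues use the shared throttle to borrow/lend the
/// capacity needed for dispatching requests.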
    , _per_tick_threshold(_token_bucket.limit() / nr_queues)
{
    if (tokens_capacity(cfg.min_tokens) > _token_bucket.threshold()) {
        throw std::runtime_error("Fair-group replenisher limit is lower than threshold");
Minor: we renamed fair_group to shared_throttle. Do we also want to update this error message?
Yup, good catch
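A possible wording for the updated message (illustrative):

throw std::runtime_error("Shared-throttle replenisher limit is lower than threshold");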
#ifdef SEASTAR_MODULE
module;
#endif

#include <seastar/core/internal/throttle.hh>
A general question related to the build with modules -- as far as I understand, ./src/seastar.cc includes all headers to provide the declarations and definitions when we build the code with modules.
The source files (e.g. fair_queue.cc, io_queue.cc and reactor.cc) use #ifdef SEASTAR_MODULE, e.g.:
#ifdef SEASTAR_MODULE
module seastar;
#else
// include the headers related to seastar
#endif
The comment from seastar.cc:
// include all declaration and definitions to be exported in the the module
// purview
My question is as follows -- do we want to follow the same approach in throttle.cc? Also, is there an explicit rule in the seastar project related to the inclusion of files and the SEASTAR_MODULE macro?
I'm not sure if there's any set of rules on how to manipulate SEASTAR_MODULE and related. @tchaikov works on modules support, he can shed more light on this
@pwrobelse hi Patryk, thank you for bringing this up.
seastar.cc includes both the global module fragment and the purview of the seastar module. We put the implementation into the module implementation units, which start with module seastar; -- for instance, see fair_queue.cc as you pointed out.
When it comes to throttle.cc, it needs to access the seastar module, so I think we would need to start it with module seastar; instead of #include <seastar/core/internal/throttle.hh> when C++20 modules are enabled, which is checked using #ifdef SEASTAR_MODULE in general.
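Based on that description, throttle.cc would presumably follow the same layout as fair_queue.cc; a sketch assuming the existing convention (not verified against the final patch):

// src/core/throttle.cc -- hypothetical layout following the existing convention
#ifdef SEASTAR_MODULE
module;
#endif

// non-seastar includes (the global module fragment) would go here

#ifdef SEASTAR_MODULE
module seastar;
#else
#include <seastar/core/internal/throttle.hh>
#endif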
> I'm not sure if there's any set of rules on how to manipulate SEASTAR_MODULE and related.

@xemul good point. I will note this down. Could you suggest which document I should put this in? As its target audience will be library developers rather than library users, the closest one I can find is coding-style.md. But that document focuses on coding style -- it's more about taste than a guideline on the general design or structure of the code. Shall I create another document on C++20 modules?
@@ -103,10 +92,9 @@ future<> perf_fair_queue::test(bool loc) {

    auto invokers = local_fq.invoke_on_all([loc] (local_fq_and_class& local) {
        return parallel_for_each(boost::irange(0u, requests_to_dispatch), [&local, loc] (unsigned dummy) {
-           auto cap = local.queue(loc).tokens_capacity(double(1) / std::numeric_limits<int>::max() + double(1) / std::numeric_limits<int>::max());
-           auto req = std::make_unique<local_fq_entry>(cap, [&local, loc, cap] {
+           auto req = std::make_unique<local_fq_entry>(100, [&local, loc] {
Just to double-check -- this patch replaces the calculation with a hard-coded value of 100. Was the value the same before? Does it matter for the test?
Do I understand correctly that, due to returning throttle::grab_result::grabbed from can_dispatch(), the capacity does not matter to the test? Because we can always dispatch (because that's how the test works now)?
Yes, this capacity can be anything, since can_dispatch() always says "yes you can"
@@ -101,7 +90,6 @@ class test_env {
    unsigned tick(unsigned n = 0) {
        unsigned processed = 0;
        while (true) {
-           _fg.replenish_capacity(_fg.replenished_ts() + std::chrono::microseconds(1));
            _fq.dispatch_requests();
Do I understand correctly that after defining:
virtual throttle::grab_result can_dispatch() const noexcept override {
    return throttle::grab_result::grabbed;
}
in the request class, this line will dispatch all requests that were queued before? When reading the code of dispatch_requests() I had the impression that this is the case.
If so, are we leaving the code as it was (mainly the while(true) part) just in case anybody wants to define another request type that does not allow being dispatched immediately? Is this while(true) needed (of course, assuming I did not misunderstand the code)?
I suspect that this is the root cause of the test failure.
Let's take test_fair_queue_longer_run_different_shares as an example.
We first queue a lot of ops:
for (int i = 0; i < 20000; ++i) {
    env.do_op(a, 1);
    env.do_op(b, 0);
}
And then we want to let them in over time:
for (int i = 0; i < 1000; ++i) {
    sleep(1ms).get();
    env.tick(3);
}
However, because all requests can be dispatched (new implementation of can_dispatch()) and the first call to tick(3) invokes dispatch_requests(), which submits all requests at once, the second call to tick(3) gets stuck in an infinite loop.
When I added a debug print to the loop that calls tick(3), I saw:
23: Call for i = 0
23: Call for i = 1
<reactor stall errors>
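A hypothetical reconstruction of tick() that illustrates the hang; only the first few lines match the quoted diff, the break condition and completion counting are guesses for illustration:

unsigned tick(unsigned n = 0) {
    unsigned processed = 0;
    while (true) {
        _fq.dispatch_requests();                      // with can_dispatch() == grabbed,
                                                      // the first tick() drains everything
        unsigned done = completed_since_last_call();  // hypothetical helper
        processed += done;
        if (processed >= n) {
            return processed;                         // never reached on the second tick():
        }                                             // the queue is already empty, 'done'
    }                                                 // stays 0 and the loop spins forever
}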
This test fails because there's no (easy) way to tell the dispatch loop to stop dispatching. So I marked this PR as draft to fix it later
@@ -551,7 +555,6 @@ void
io_queue::complete_request(io_desc_read_write& desc) noexcept {
    _requests_executing--;
    _requests_completed++;
-   _streams[desc.stream()].notify_request_finished(desc.capacity());
It seems that the function does not need to take a parameter -- it is just responsible for accounting.
True
    : fq(std::move(cfg))
    , thr(st)
{ }

class io_desc_read_write final : public io_completion {
Because of the removal of the need to notify about completion, this class does not need to keep the _stream and fq_capacity members.
    capacity_t dispatched = 0;
    boost::container::small_vector<priority_class_ptr, 2> preempt;

    while (!_handles.empty() && (dispatched < _group.per_tick_grab_threshold())) {
We removed the check related to per_tick_grab_threshold() here -- when I searched for that function call in this PR, we did not introduce its usage anywhere else. Is that expected?
Yes. This per-tick threshold is completely removed with #2294
I pulled the change to my local repository and fair_queue_test.cc fails.
ACK, fails for me too
There's one interesting event that is generated by fair-queue -- when the queue starts waiting for shared capacity. In fact, the output throttler code should live at the IO-queue level (see scylladb#2332), but currently it's in the fair-queue, so it will need to generate this event. This patch adds a tracer reference to fair-queue for future patching. Signed-off-by: Pavel Emelyanov <[email protected]>
IO queue is responsible for two things: providing fairness between priority classes and throttling the output flow of requests towards the disk.
The former is done with the help of fair_queue that implements rather simple "virtual capacity" model. The latter is done with the help of shared token bucket, but historically the throttling code was implemented as a part of fair_queue. That's not correct, fair queue has nothing to do with output flow control. This PR moves the throttling code out of fair_queue and makes IO-queue own it and use.