update docstring

grafana · Aug 24, 2024 · f2b1544
1 parent 5b93008
Showing 1 changed file with 37 additions and 53 deletions.
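The updated docstring in the diff below describes two behaviors. First, each worker ID is mapped to a prioritized query-component node by `workerID % len(nodeOrder)`. Second, a worker falls back to the remaining nodes when its prioritized subtree has nothing dequeue-able for its querier.

The following is a minimal Go sketch of those two behaviors, not the `QuerierWorkerQueuePriorityAlgo` implementation. It assumes the search wraps around to the start of `nodeOrder` after the last node. The helpers `prioritizedStart`, `dequeueOrder`, and `dequeue` are hypothetical. The `dequeueable` map is a hypothetical stand-in for the tenant-querier-shuffle-shard check in each subtree.

```go
package main

import "fmt"

// prioritizedStart (hypothetical) returns the index in nodeOrder where a worker
// begins its search: workerID % len(nodeOrder). nodeOrder must be non-empty.
func prioritizedStart(workerID int, nodeOrder []string) int {
	return workerID % len(nodeOrder)
}

// dequeueOrder (hypothetical) lists nodes in the order a worker checks them:
// its prioritized node first, then the rest of nodeOrder, wrapping around to
// index 0. The wrap-around is an assumption of this sketch.
func dequeueOrder(workerID int, nodeOrder []string) []string {
	if len(nodeOrder) == 0 {
		return nil
	}
	start := prioritizedStart(workerID, nodeOrder)
	order := make([]string, 0, len(nodeOrder))
	for i := range nodeOrder {
		order = append(order, nodeOrder[(start+i)%len(nodeOrder)])
	}
	return order
}

// dequeue (hypothetical) models one dequeue attempt. dequeueable marks which
// query-component subtrees hold a query for a tenant assigned to this querier.
func dequeue(workerID int, nodeOrder []string, dequeueable map[string]bool) (string, bool) {
	for _, node := range dequeueOrder(workerID, nodeOrder) {
		if dequeueable[node] {
			return node, true
		}
	}
	return "", false
}

func main() {
	nodeOrder := []string{"ingester", "store-gateway", "ingester-and-store-gateway", "unknown"}

	// With all four nodes present, 8 workers split 2 per prioritized node.
	counts := map[string]int{}
	for id := 0; id < 8; id++ {
		counts[nodeOrder[prioritizedStart(id, nodeOrder)]]++
	}
	fmt.Println(counts)

	// Worker 0 prioritizes "ingester" but finds nothing there for its
	// querier's tenants, so it moves on to "store-gateway".
	node, ok := dequeue(0, nodeOrder, map[string]bool{"store-gateway": true})
	fmt.Println(node, ok) // store-gateway true

	// If the "ingester" node is deleted after draining, len(nodeOrder) drops
	// to 3 and worker 4 now starts at index 1 instead of index 0.
	shrunk := []string{"store-gateway", "ingester-and-store-gateway", "unknown"}
	fmt.Println(shrunk[prioritizedStart(4, shrunk)])
}
```

The last case illustrates the first implication listed in the docstring. When a node is deleted, the divisor changes, and so can the node a given worker ID prioritizes.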
diff --git a/pkg/scheduler/queue/tree_queue_algo_querier_worker_queue_priority.go b/pkg/scheduler/queue/tree_queue_algo_querier_worker_queue_priority.go
@@ -4,74 +4,58 @@ package queue
 
 // QuerierWorkerQueuePriorityAlgo implements QueuingAlgorithm by mapping worker IDs to a queue node to prioritize.
 // Querier-workers' prioritized queue nodes are calculated by the integer workerID % len(nodeOrder).
+// This distribution of workers across query component subtrees ensures that when one query component is experiencing
+// high latency, about 25% of querier-workers continue prioritizing queries for each unaffected component.
 //
-// Purpose:
-// This algorithm is intended to ensure that querier-workers are balanced across nodes in the queue selection tree.
-// While it can serve to balance consumers across request queue nodes which have been partitioned by any criteria,
-// its original purpose is to balance querier-workers across queues partitioned a query's expected query component`.
-//
-// There are four possible query components: "ingester", "store-gateway", "ingester-and-store-gateway", and "unknown".
-// With all four possible query component queue nodes active, the modulo operation on the worker IDs
-// will distribute approximately 1 / 4 of the querier-workers to each queue node.
-// A querier-worker ID which is mapped to prioritize the "ingester" queue node will always start there
-// and attempt first to work on a query which only requires the ingesters to complete.
-//
-// By splitting the queues by query component we can ensure that the approximately 1 / 4 of querier-workers are
-// "reserved" for a query component even when the other query component is experiencing high latency.
-// This reservation ensures that the querier-worker connections can continue to process queries
-// which do not need to utilize the query component experiencing high latency.
-//
-// Performance:
-// This significantly outperforms the previous round-robin algorithm which simply rotated through the node order
-// (see TestMultiDimensionalQueueAlgorithmSlowConsumerEffects benchmark outputs for comparison).
+// This significantly outperforms the previous round-robin approach, which simply rotated through the node order.
 // Although a vanilla round-robin algorithm will select a given query-component node 1 / 4 of the time,
-// in situations of high latency on a query component that one slow query component will still grow asymptotically
+// in situations of high latency on one query component, that slow query component will still grow asymptotically
 // to dominate the utilization the querier-worker connections, as measured by inflight query processing time.
 //
-// Implementation Details & Assumptions:
-// The MultiAlgorithmTreeQueue which utilizes this and other QueuingAlgorithm implementations always deletes
-// nodes for paths through the tree which lead to an empty leaf node after a dequeue operation.
-// Nodes for paths through the tree are then re-created when a new request is enqueued which requires that path.
-// This means that a tree will not always have all 4 node types for the 4 possible query component assignments.
+// There are four possible query components: "ingester", "store-gateway", "ingester-and-store-gateway", and "unknown".
+// With all four queue nodes active, approximately 1 / 4 of the querier-workers are prioritized to each queue node.
+// Assuming the above query-component node order, a querier-worker ID which is evenly divisible by 4
+// always first attempts to dequeue a query from the "ingester" queue node at index 0 in the nodeOrder.
+//
+// This algorithm requires a minimum of 4 querier-workers per querier to prevent queue starvation.
+// The minimum is enforced in the queriers by overriding -querier.max-concurrent if necessary.
 //
+// The number of querier-workers need not be divisible by four. This algorithm assumes that the
+// MultiQueuingAlgorithmTreeQueue always deletes empty nodes recursively after a dequeue operation,
+// and re-creates nodes when a new request is enqueued for a previously-deleted path.
 // This has two implications for the distribution of workers across queue nodes:
 //  1. The modulo operation may modulo the worker ID by 1, 2, 3, or 4 depending on the number of node types
 //     currently present in the node order, which can change which node a worker ID is prioritized for.
-//  2. The node order changes as queues are deleted and re-created, so the worker ID to node mapping will change
-//     as the essentially random enqueue order places query component nodes in different positions in the order.
-//
-// We consider this a desirable property, as it ensures that a number of querier-workers which is not evenly
-// divisible by the number of query component nodes will, through the randomized changes in nodeOrder over time,
-// be distributed more evenly across the nodes than if length and order of the nodeOrder were fixed.
+//  2. The node order changes as queues are deleted and re-created, so the worker ID-to-node mapping changes
+//     as the random enqueue order places query component nodes in different positions in the order.
 //
-// Minimizing Idle Querier-Worker Capacity:
-// We say the queue nodes are "prioritized" for a worker rather than "assigned" to a worker
-// because the same worker ID is mapped to *start* at a certain queue node, but will move on to other nodes
-// if it cannot dequeue a request from any of the child queue nodes of its first prioritized queue node.
-// This can occur when this queue algorithm is placed at the highest layer of the tree and
-// the tenant-querier-shuffle-shard queue algorithm is placed at the second, leaf layer of the tree.
-// Ex:
+// When the number of querier-workers is not evenly divisible by the number of query component nodes,
+// these randomized changes in nodeOrder over time distribute the workers more evenly across query component
+// nodes than they would be if the length and order of nodeOrder were fixed.
 //
-//  1. The QuerierWorkerQueuePriorityAlgo begins with a nodeOrder of:
-//     ["ingester", "store-gateway", "ingester-and-store-gateway", "unknown"].
+// A given worker ID is prioritized to *start* at a given queue node, but is not assigned strictly to that node.
+// During any period without change to the nodeOrder, the same worker ID consistently starts at the same queue node,
+// but moves on to other nodes if it cannot dequeue a request from the subtree of its first prioritized queue node.
+// Continuing to search through other query-component nodes and their subtrees minimizes idle querier-worker capacity.
 //
-//  2. A querier-worker with workerID 0 requests to dequeue and is mapped to start with the "ingester" queue node.
+// A querier-worker can process queries for nodes it has not prioritized when this queue algorithm is applied at the
+// highest layer of the tree and the tenant-querier-shuffle-shard queue algorithm is applied at the second layer of the
+// tree. If shuffle-sharding is enabled, a querier-worker that prioritizes ingester-only queries may find no
+// ingester-only queries for any tenant assigned to its querier, and so moves on to the next query component subtree. E.g.:
 //
-//  3. The tree traversal algorithm recurs down to select child queue nodes of the "ingester" node,
-//     where each child queue node is a non-empty tenant-specific queue of ingester-only queries.
-//     The tenant-querier-shuffle-shard queue algorithm checks each tenant node for if it is sharded to this querier.
-//
-//  4. (a) The first tenant queue node found which is sharded to this querier will be dequeued from, and we are done.
+//  1. The QuerierWorkerQueuePriorityAlgo has node order:
+//     ["ingester", "store-gateway", "ingester-and-store-gateway", "unknown"].
 //
-//  4. (b) Otherwise, if none of those tenants are sharded to this querier, the tree traversal algorithm will return
-//     back up to the parent level and ask the QuerierWorkerQueuePriorityAlgo to select its next node.
-//     We continue to step 5.
+//  2. A querier-worker with workerID 0 requests to dequeue; it prioritizes the "ingester" queue node.
 //
-//  5. The QuerierWorkerQueuePriorityAlgo will select the next node in the nodeOrder, "store-gateway".
-//     We return to step 3 and continue through steps 3, 4b, and 5, until we reach step 4a and exit.
+//  3. The dequeue operation recurses into the "ingester" node. Each child node is a tenant-specific
+//     queue of ingester-only queries. The tenantQuerierAssignments QueuingAlgorithm checks whether any of these child
+//     nodes (tenant queues) is assigned to this querier, and finds none.
 //
-// This process of continuing to search for requests to dequeue helps prevent querier-worker capacity from sitting idle
-// when there are no requests to dequeue for the query component node that the querier-worker was originally mapped to.
+//  4. We walk back up to the QuerierWorkerQueuePriorityAlgo level without having dequeued anything. The
+//     QuerierWorkerQueuePriorityAlgo increments currentNodeOrderIndex, selects the next node in nodeOrder (in
+//     this example, "store-gateway"), and repeats from step 3 until a dequeue-able child is found or every
+//     query component node has been checked for dequeue-able queries.
 type QuerierWorkerQueuePriorityAlgo struct {
 	currentQuerierWorker  int
 	currentNodeOrderIndex int