update docstring
francoposa committed Aug 24, 2024
1 parent 5b93008 commit f2b1544
Showing 1 changed file with 37 additions and 53 deletions.
@@ -4,74 +4,58 @@ package queue

// QuerierWorkerQueuePriorityAlgo implements QueuingAlgorithm by mapping worker IDs to a queue node to prioritize.
// Querier-workers' prioritized queue nodes are calculated by the integer workerID % len(nodeOrder).
// This distribution of workers across query component subtrees ensures that when one query component is experiencing
// high latency, about 25% of querier-workers continue prioritizing queries for unaffected components.
//
// Purpose:
// This algorithm is intended to ensure that querier-workers are balanced across nodes in the queue selection tree.
// While it can serve to balance consumers across request queue nodes which have been partitioned by any criteria,
// its original purpose is to balance querier-workers across queues partitioned by a query's expected query component.
//
// There are four possible query components: "ingester", "store-gateway", "ingester-and-store-gateway", and "unknown".
// With all four possible query component queue nodes active, the modulo operation on the worker IDs
// will distribute approximately 1 / 4 of the querier-workers to each queue node.
// A querier-worker ID which is mapped to prioritize the "ingester" queue node will always start there
// and attempt first to work on a query which only requires the ingesters to complete.
//
// By splitting the queues by query component we can ensure that approximately 1 / 4 of querier-workers are
// "reserved" for a query component even when another query component is experiencing high latency.
// This reservation ensures that the querier-worker connections can continue to process queries
// which do not need to utilize the query component experiencing high latency.
//
// Performance:
// This significantly outperforms the previous round-robin algorithm which simply rotated through the node order
// (see TestMultiDimensionalQueueAlgorithmSlowConsumerEffects benchmark outputs for comparison).
// This significantly outperforms the previous round-robin approach which simply rotated through the node order.
// Although a vanilla round-robin algorithm will select a given query-component node 1 / 4 of the time,
// in situations of high latency on a query component that one slow query component will still grow asymptotically
// in situations of high latency on a query component, the one slow query component will still grow asymptotically
// to dominate the utilization of the querier-worker connections, as measured by inflight query processing time.
//
// Implementation Details & Assumptions:
// The MultiAlgorithmTreeQueue which utilizes this and other QueuingAlgorithm implementations always deletes
// nodes for paths through the tree which lead to an empty leaf node after a dequeue operation.
// Nodes for paths through the tree are then re-created when a new request is enqueued which requires that path.
// This means that a tree will not always have all 4 node types for the 4 possible query component assignments.
// There are four possible query components: "ingester", "store-gateway", "ingester-and-store-gateway", and "unknown".
// With all four queue nodes active, approximately 1 / 4 of the querier-workers are prioritized to each queue node.
// Assuming the above query-component node order, a querier-worker ID which is evenly divisible by 4
// always first attempts to dequeue a query from the "ingester" queue node at index 0 in the nodeOrder.
//
// This algorithm requires a minimum of 4 querier-workers per querier to prevent queue starvation.
// The minimum is enforced in the queriers by overriding -querier.max-concurrent if necessary.
//
// It is not required that the number of querier-workers be divisible by four; this algorithm assumes that
// the MultiQueuingAlgorithmTreeQueue always deletes empty nodes recursively after a dequeue operation,
// and that if a new request is enqueued for a previously-deleted path, its nodes are re-created.
// This has two implications for the distribution of workers across queue nodes:
// 1. The worker ID may be taken modulo 1, 2, 3, or 4 depending on the number of node types
// currently present in the node order, which can change which node a worker ID is prioritized for.
// 2. The node order changes as queues are deleted and re-created, so the worker ID to node mapping will change
// as the essentially random enqueue order places query component nodes in different positions in the order.
//
// We consider this a desirable property, as it ensures that a number of querier-workers which is not evenly
// divisible by the number of query component nodes will, through the randomized changes in nodeOrder over time,
// be distributed more evenly across the nodes than if length and order of the nodeOrder were fixed.
// 2. The node order changes as queues are deleted and re-created, so the worker ID-to-node mapping changes
// as the random enqueue order places query component nodes in different positions in the order.
//
// Minimizing Idle Querier-Worker Capacity:
// We say the queue nodes are "prioritized" for a worker rather than "assigned" to a worker
// because the same worker ID is mapped to *start* at a certain queue node, but will move on to other nodes
// if it cannot dequeue a request from any of the child queue nodes of its first prioritized queue node.
// This can occur when this queue algorithm is placed at the highest layer of the tree and
// the tenant-querier-shuffle-shard queue algorithm is placed at the second, leaf layer of the tree.
// Ex:
// When the number of querier-workers is not evenly divisible by the number of query component nodes,
// these randomized changes in nodeOrder over time distribute the workers more evenly across
// query component nodes than if the length and order of the nodes were fixed.
//
// 1. The QuerierWorkerQueuePriorityAlgo begins with a nodeOrder of:
// ["ingester", "store-gateway", "ingester-and-store-gateway", "unknown"].
// A given worker ID is prioritized to *start* at a given queue node, but is not assigned strictly to that node.
// During any period without change to the nodeOrder, the same worker ID consistently starts at the same queue node,
// but moves on to other nodes if it cannot dequeue a request from the subtree of its first prioritized queue node.
// Continuing to search through other query-component nodes and their subtrees minimizes idle querier-worker capacity.
//
// 2. A querier-worker with workerID 0 requests to dequeue and is mapped to start with the "ingester" queue node.
// A querier-worker can process queries for nodes it has not prioritized when this queue algorithm is applied at the
// highest layer of the tree and the tenant-querier-shuffle-shard queue algorithm is applied at the second layer of
// the tree. If shuffle-sharding is enabled, a querier-worker that prioritizes ingester-only queries may not find
// ingester-only queries for any tenant it is assigned to, and then moves on to the next query component subtree. E.g.:
//
// 3. The tree traversal algorithm recurs down to select child queue nodes of the "ingester" node,
// where each child queue node is a non-empty tenant-specific queue of ingester-only queries.
// The tenant-querier-shuffle-shard queue algorithm checks whether each tenant node is sharded to this querier.
//
// 4. (a) The first tenant queue node found which is sharded to this querier will be dequeued from, and we are done.
// 1. The QuerierWorkerQueuePriorityAlgo has node order:
// ["ingester", "store-gateway", "ingester-and-store-gateway", "unknown"].
//
// 4. (b) Otherwise, if none of those tenants are sharded to this querier, the tree traversal algorithm will return
// back up to the parent level and ask the QuerierWorkerQueuePriorityAlgo to select its next node.
// We continue to step 5.
// 2. A querier-worker with workerID 0 requests to dequeue; it prioritizes the "ingester" queue node.
//
// 5. The QuerierWorkerQueuePriorityAlgo will select the next node in the nodeOrder, "store-gateway".
// We return to step 3 and continue through steps 3, 4b, and 5, until we reach step 4a and exit.
// 3. The dequeue operation recursively dequeues from the "ingester" node. Each child node is a tenant-specific
// queue of ingester-only queries. The tenantQuerierAssignments QueuingAlgorithm checks if any of its child nodes
// (tenant queues) is assigned to this querier, and finds none.
//
// This process of continuing to search for requests to dequeue helps prevent querier-worker capacity from sitting idle
// when there are no requests to dequeue for the query component node that the querier-worker was originally mapped to.
// 4. We walk back up to the QuerierWorkerQueuePriorityAlgo level, not having dequeued anything. The
// QuerierWorkerQueuePriorityAlgo increments currentNodeOrderIndex and selects the next node in nodeOrder (in
// this example, "store-gateway"), and checks for dequeue-able queries again, from step 3, etc. until a
// dequeue-able child is found, or every query component node has been checked for dequeue-able queries.
type QuerierWorkerQueuePriorityAlgo struct {
	currentQuerierWorker  int
	currentNodeOrderIndex int
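The worker-to-node mapping and the fall-through search described in the docstring can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names priorityIndex, visitOrder, and firstDequeueable, and the hasWork callback standing in for the tenant-level subtree check, are all hypothetical. It also assumes the search wraps around to the start of nodeOrder after reaching the end, which the docstring implies ("until ... every query component node has been checked") but does not state outright.

```go
package main

import "fmt"

// priorityIndex returns the index in nodeOrder that a worker checks first,
// computed as workerID % len(nodeOrder). It returns -1 for an empty node order.
// (Hypothetical helper; not taken from the commit.)
func priorityIndex(workerID int, nodeOrder []string) int {
	if len(nodeOrder) == 0 {
		return -1
	}
	return workerID % len(nodeOrder)
}

// visitOrder lists the nodes in the order a worker would try them: its
// prioritized node first, then each following node, wrapping around
// (the wrap-around is an assumption of this sketch).
func visitOrder(workerID int, nodeOrder []string) []string {
	start := priorityIndex(workerID, nodeOrder)
	if start < 0 {
		return nil
	}
	order := make([]string, 0, len(nodeOrder))
	for i := 0; i < len(nodeOrder); i++ {
		order = append(order, nodeOrder[(start+i)%len(nodeOrder)])
	}
	return order
}

// firstDequeueable walks the visit order and returns the first node for which
// hasWork reports a dequeue-able query. hasWork is a stand-in for the
// tenant-level check performed in the node's subtree.
func firstDequeueable(workerID int, nodeOrder []string, hasWork func(node string) bool) (string, bool) {
	for _, node := range visitOrder(workerID, nodeOrder) {
		if hasWork(node) {
			return node, true
		}
	}
	return "", false
}

func main() {
	all := []string{"ingester", "store-gateway", "ingester-and-store-gateway", "unknown"}

	// With all four nodes present, worker IDs cycle through the node order.
	for workerID := 0; workerID < 6; workerID++ {
		fmt.Printf("worker %d starts at %q\n", workerID, all[priorityIndex(workerID, all)])
	}

	// After a node is deleted, the modulus shrinks and the mapping shifts.
	three := []string{"ingester", "store-gateway", "unknown"}
	fmt.Printf("worker 5 with three nodes starts at %q\n", three[priorityIndex(5, three)])

	// Worker 0 prioritizes "ingester" but finds nothing there for its tenants,
	// so it moves on to the next node in the order.
	node, ok := firstDequeueable(0, all, func(n string) bool { return n == "store-gateway" })
	fmt.Println(node, ok)
}
```

The shift in mapping when the node order shrinks (worker 5 moving from index 1 of four nodes to index 2 of three) is the effect the docstring describes as spreading workers more evenly over time.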
