fix shotover_chain_messages_per_batch_count metric
#1633
Merged
Change 1
Every time shotover processes a transform chain, multiple requests and responses are batched and processed together.
This can be seen in the `ChainState`, which is passed through each transform in the chain and contains multiple requests.
This batching is purely opportunistic: we don't intentionally wait for requests or responses, but if multiple requests or responses are pending when we go to process them, they are all processed in a single batch.
Let's call this batch a "chain batch".
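To make "opportunistic" concrete, here is a minimal sketch using only the Rust standard library. It is not shotover's actual code: the `drain_pending` helper and the use of a `std::sync::mpsc` channel are assumptions chosen for illustration. The point is that the batch is whatever happens to be queued at the moment of processing, with no waiting to fill it.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical helper: collect everything that is already queued on the
/// channel into one batch, without blocking to wait for more items.
fn drain_pending<T>(rx: &Receiver<T>) -> Vec<T> {
    // `try_iter` yields items until the channel is momentarily empty,
    // so the batch size depends only on what was pending right now.
    rx.try_iter().collect()
}
```

If three items were sent before the call, the batch has three items; if nothing was pending, the batch is empty.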
Shotover has an existing metric, `shotover_chain_messages_per_batch_count`, which measures the number of requests included in a chain batch.
At the time this metric was written, the number of responses returned by a transform chain call was guaranteed to be the number of requests in the request batch.
However, for performance reasons, this guarantee was removed and now any number of responses can be returned in a chain call.
This internal change has altered the meaning of this metric, leaving it kind of broken:
With `messages` in the name, it is ambiguous whether it measures the batch size of requests or responses.
So the existing `shotover_chain_messages_per_batch_count` metric should be split into two separate metrics, one for requests and one for responses.
TL;DR: `shotover_chain_messages_per_batch_count` is no longer meaningful after some internal changes to shotover that occurred a year ago. To fix it, it needs to be split into two separate metrics.
Change 2
Additionally, this PR changes the metric logic of the two metrics to only write to the metric when the batch is not empty.
Previously, empty batches never occurred, but now that requests and responses are decoupled, it is common for a response batch to have responses while the request batch is empty, and vice versa.
In order to ensure the histogram continues to show meaningful data (not full of zeroes), we skip those cases.
While there could be some value in seeing how often we hit empty batches, that is not the point of these metrics. Instead, we are trying to get a good picture of the spread of batch sizes.
If we find the need to measure empty batches in the future, we could add simple `shotover_empty_request_batch` and `shotover_empty_response_batch` counter metrics.
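The two changes can be sketched together as follows. This is an illustrative, std-only sketch rather than the code in this PR: `Histogram`, `ChainBatchMetrics`, the field names, and `record_chain_batch` are all hypothetical stand-ins, and a real implementation would record into the metrics library shotover uses instead of a `Vec`.

```rust
/// Hypothetical stand-in for a histogram metric; it simply stores samples.
#[derive(Debug, Default)]
struct Histogram {
    samples: Vec<u64>,
}

impl Histogram {
    fn record(&mut self, value: u64) {
        self.samples.push(value);
    }
}

/// Hypothetical pair of metrics replacing the single "messages" metric:
/// one histogram per direction.
#[derive(Debug, Default)]
struct ChainBatchMetrics {
    request_batch_size: Histogram,
    response_batch_size: Histogram,
}

impl ChainBatchMetrics {
    /// Record the sizes seen in one chain batch.
    /// Empty batches are skipped so the histograms are not filled with zeroes.
    fn record_chain_batch(&mut self, request_count: usize, response_count: usize) {
        if request_count > 0 {
            self.request_batch_size.record(request_count as u64);
        }
        if response_count > 0 {
            self.response_batch_size.record(response_count as u64);
        }
    }
}
```

With this shape, a chain call carrying 3 requests and 0 responses adds one sample to the request histogram and none to the response histogram, so each histogram reflects only the spread of non-empty batch sizes for its own direction.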