Speedup Eval - Instrument With Otel #302
Merged
Speedup Evaluation
Per #301, our experiments were very slow. We were observing approximately 30s to 60s to process each example. With the new dataset of 424 examples, an experiment would take ~7 hours, significantly impeding iteration.
This PR instruments evaluation with OTEL so that we can see where the bottlenecks are in the code. Below are some graphs.
The graph below shows the P95 latency of processing evaluation examples. We can see it's about 30s.
Below is a heatmap showing the duration of waitForBlockLog.
We can see it's about 20s-30s, so it accounts for a large portion of the total duration.
A big source of the latency is that the Analyzer uses a rate-limiting queue for reprocessing the logs. The max delay is 30s, so we are probably reprocessing the logs at 30s intervals. To speed this up, we make the delay configurable so we can use a shorter delay during experiments.
Here is an updated heatmap of the duration of waitForBlockLog.
16:25 is when it started running with a maxDelaySeconds of 1. We can see this drops the latency significantly, from about 30s to ~6s.