Replay Proto-X #32
…ad or the query timed out
Just got a few questions, mostly around dumping the page cache.
Just one more change.
if self.logger:
# We only stash the results if we're not doing HPO, or else the results from concurrent HPO would get
# stashed in the same directory and potentially cause a race condition.
if self.logger and not tuning_mode == TuningMode.HPO:
Can we also use the ray_trial_id to stash the results during HPO?
Thanks for catching that!
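For reference, a minimal sketch of the per-trial stashing idea raised above. The function name, its parameters, and the directory layout are hypothetical and not taken from the Proto-X code; the only assumption is that each concurrent HPO trial has a unique ID string.

```python
from pathlib import Path
from typing import Optional


def stash_dir_for(base_dir: Path, step: int, trial_id: Optional[str] = None) -> Path:
    """Return the directory in which to stash the results of one tuning step.

    Hypothetical helper: when a trial ID is given (e.g. during HPO), results go
    under a per-trial subdirectory so concurrent trials never share a path.
    """
    root = base_dir / trial_id if trial_id is not None else base_dir
    return root / f"step{step}"
```

Because two trials with different IDs map to different directories, they cannot race on the same stash path.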
LGTM.
Summary: Can now replay a full tuning run from Proto-X. This is used to see how each step of tuning would have performed without query timeouts and without Boot enabled.
Demo:
The image shows the data of a replayed run of TPC-H SF0.01 without Boot enabled during tuning. For each step of tuning, the replay shows the number of queries executed during the original run (which may be < 22 if the workload timed out), the number of queries that timed out during the original run, and whether the workload timed out. It also shows this same information about the replay. You can see that the replayed times are always >= the original times, which makes sense because the replay runs without query timeouts. Whenever the original run is "22,0,False" (i.e. 22 executed, 0 timed out, workload didn't time out), the replayed time matches closely.
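The comparison described above could be sketched roughly as follows. The class, the function, and the 10% tolerance are hypothetical and chosen for illustration only; they are not part of the PR.

```python
import math
from dataclasses import dataclass

NUM_TPCH_QUERIES = 22  # TPC-H has 22 queries


@dataclass(frozen=True)
class RunSummary:
    """Per-step summary, for either the original run or the replay."""
    num_executed: int
    num_timed_out: int
    workload_timed_out: bool
    runtime_s: float

    def is_complete(self, num_queries: int = NUM_TPCH_QUERIES) -> bool:
        # "22,0,False": every query ran, none timed out, workload finished.
        return (
            self.num_executed == num_queries
            and self.num_timed_out == 0
            and not self.workload_timed_out
        )

    def label(self) -> str:
        return f"{self.num_executed},{self.num_timed_out},{self.workload_timed_out}"


def replay_is_plausible(original: RunSummary, replay: RunSummary, rel_tol: float = 0.1) -> bool:
    """Check the expectation stated above (rel_tol is an assumed tolerance)."""
    if original.is_complete():
        # Nothing was cut short originally, so the times should roughly match.
        return math.isclose(replay.runtime_s, original.runtime_s, rel_tol=rel_tol)
    # The original was truncated by timeouts, so the replay should not be faster.
    return replay.runtime_s >= original.runtime_s
```

A step whose original summary is complete is held to the "matches closely" check, while a truncated step only needs the replay to take at least as long.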
Details:
… execute_variations() as well as all per-query knob variations tried. We can either replay the best variation or all variations. This is especially useful if the workload timed out in the original run, in which case the "best" variation is a misnomer as it is simply an arbitrary variation.
… reset() was overwriting the logged replay information for a step, leading to a mismatch between the dumped action.pkl file (which contains the DBMS configuration state) and the run.raw.csv file (which contains the runtime information of the workload during that state).
… .link extension to fix a subtle bug where a replay would overwrite the output.log file of the original run.
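
As a general guard against the kind of overwrite described in the last point, a log file can be opened in Python's exclusive-create mode so that an existing file is never clobbered. This is only a sketch of that technique, not the fix used in this PR; the helper name is hypothetical.

```python
from pathlib import Path


def write_log_without_clobbering(log_path: Path, text: str) -> None:
    """Write text to log_path, raising FileExistsError if the file already exists.

    Mode "x" creates the file exclusively, so an earlier run's log stays intact.
    """
    with open(log_path, "x", encoding="utf-8") as f:
        f.write(text)
```

With this guard, a replay that accidentally targets the original run's output.log fails loudly instead of silently replacing it.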