feat(pairwise evals): add option to randomize order of experiments; print URL for pairwise experiment #672
Conversation
@@ -586,6 +590,13 @@ def evaluate_comparative(
        raise ValueError("max_concurrency must be a positive integer.")
    client = client or langsmith.Client()

    if randomize_order:
I think I'd do this on a run level to average out bias within an experiment (as it stands, this will still have a consistent bias toward one experiment, it's just random which one).
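For illustration, a minimal sketch of what run-level randomization could look like, assuming a hypothetical pairing helper (the names `runs_a`/`runs_b` and the pairing shape are assumptions, not this PR's actual code):

```python
import random

# Hypothetical sketch: flip a coin per run/example rather than once per
# experiment, so neither experiment consistently lands in position 1.
def iter_randomized_pairs(runs_a, runs_b, randomize_order: bool = True):
    for run_a, run_b in zip(runs_a, runs_b):
        pair = [run_a, run_b]
        if randomize_order:
            random.shuffle(pair)  # independent shuffle for each pair
        yield tuple(pair)
```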
    dataset_id = comparative_experiment.reference_dataset_id
    base_url = project_url.split("/projects/p/")[0]
    comparison_url = (
        f"{base_url}/datasets/{dataset_id}/compare?"
Nice - at some point it may be nice to factor this out
Does this work if there are > 2 experiments? Would it be more future-proof if we just added a list of all the IDs?
good point, will do that
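A hedged sketch of what the list-based version might look like; the `selectedSessions` query parameter and the helper name are assumptions here, not something confirmed by this diff:

```python
# Hypothetical helper: accept any number of experiment IDs instead of
# hard-coding two. The "selectedSessions" parameter name is an assumption.
def build_comparison_url(base_url: str, dataset_id: str, experiment_ids: list[str]) -> str:
    joined = ",".join(str(eid) for eid in experiment_ids)
    return f"{base_url}/datasets/{dataset_id}/compare?selectedSessions={joined}"

# Usage (illustrative): build_comparison_url(base_url, dataset_id,
# [exp.id for exp in experiments]) works the same for 2 or more experiments.
```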