python[patch]: evaluate accept list target #1342

Merged · 1 commit · Dec 17, 2024
python/langsmith/evaluation/_arunner.py (1 addition, 1 deletion)
@@ -1,4 +1,4 @@
"""V2 Evaluation Interface."""

Check notice on line 1 in python/langsmith/evaluation/_arunner.py (GitHub Actions / benchmark)

Benchmark results:

- create_5_000_run_trees: 695 ms +- 77 ms (warning: std dev is 11% of the mean; result may be unstable)
- create_10_000_run_trees: 1.41 sec +- 0.21 sec (warning: std dev is 15% of the mean; result may be unstable)
- create_20_000_run_trees: 1.40 sec +- 0.22 sec (warning: std dev is 16% of the mean; result may be unstable)
- dumps_class_nested_py_branch_and_leaf_200x400: 689 us +- 7 us
- dumps_class_nested_py_leaf_50x100: 24.9 ms +- 0.2 ms
- dumps_class_nested_py_leaf_100x200: 103 ms +- 2 ms
- dumps_dataclass_nested_50x100: 25.1 ms +- 0.1 ms
- dumps_pydantic_nested_50x100: 70.5 ms +- 15.3 ms (warning: std dev is 22% of the mean; result may be unstable)
- dumps_pydanticv1_nested_50x100: 197 ms +- 2 ms

For the unstable results, pyperf recommends rerunning with more runs, values, and/or loops, running `python -m pyperf system tune` to reduce system jitter, and using pyperf stats, dump, and hist to analyze results.

Check notice on line 1 in python/langsmith/evaluation/_arunner.py (GitHub Actions / benchmark)

Comparison against main:

| Benchmark | main | changes |
|---|---|---|
| dumps_pydanticv1_nested_50x100 | 218 ms | 197 ms: 1.11x faster |
| create_5_000_run_trees | 724 ms | 695 ms: 1.04x faster |
| dumps_class_nested_py_branch_and_leaf_200x400 | 695 us | 689 us: 1.01x faster |
| dumps_dataclass_nested_50x100 | 25.1 ms | 25.1 ms: 1.00x faster |
| dumps_class_nested_py_leaf_50x100 | 24.9 ms | 24.9 ms: 1.00x faster |
| dumps_class_nested_py_leaf_100x200 | 103 ms | 103 ms: 1.00x slower |
| create_20_000_run_trees | 1.39 sec | 1.40 sec: 1.01x slower |
| create_10_000_run_trees | 1.39 sec | 1.41 sec: 1.02x slower |
| dumps_pydantic_nested_50x100 | 64.6 ms | 70.5 ms: 1.09x slower |
| Geometric mean | (ref) | 1.00x faster |

from __future__ import annotations

@@ -294,7 +294,7 @@
blocking=blocking,
**kwargs,
)
- elif isinstance(target, tuple):
+ elif isinstance(target, (list, tuple)):
msg = (
"Running a comparison of two existing experiments asynchronously is not "
"currently supported. Please use the `evaluate()` method instead and make "
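For context, a minimal standalone sketch of what this one-token change does on the async path (the helper name below is hypothetical; the real check is inline in the async runner's dispatch logic): a list of two existing experiments is now recognized the same way a tuple is, and routed to the "comparisons are not supported asynchronously" error branch instead of falling through.

```python
# Hypothetical helper illustrating the widened isinstance check from the diff.
def _is_pair_of_experiments(target) -> bool:
    # Previously: isinstance(target, tuple)
    return isinstance(target, (list, tuple))


assert _is_pair_of_experiments(("exp-a", "exp-b"))
assert _is_pair_of_experiments(["exp-a", "exp-b"])  # newly caught by this PR
assert not _is_pair_of_experiments("my-target-function-or-experiment")
```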
python/langsmith/evaluation/_runner.py (1 addition, 1 deletion)

@@ -353,7 +353,7 @@ def evaluate(
blocking=blocking,
**kwargs,
)
- elif isinstance(target, tuple):
+ elif isinstance(target, (list, tuple)):
invalid_args = {
"num_repetitions": num_repetitions > 1,
"experiment": bool(experiment),
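On the synchronous path the same widening applies, but here a list or tuple of two existing experiments is a supported comparison target, and the branch goes on to validate that options which only make sense when creating a new experiment are not set. A minimal sketch of that validation, with a hypothetical helper name and an assumed error message (the visible diff only shows the start of the `invalid_args` check):

```python
# Hypothetical standalone version of the validation the (list, tuple) branch
# performs in _runner.py: reject arguments that do not apply when comparing
# two existing experiments.
def _check_comparison_args(num_repetitions: int = 1, experiment=None) -> None:
    invalid_args = {
        "num_repetitions": num_repetitions > 1,
        "experiment": bool(experiment),
    }
    set_args = [name for name, is_set in invalid_args.items() if is_set]
    if set_args:
        raise ValueError(
            "These arguments are not supported when comparing two existing "
            f"experiments: {set_args}"
        )


_check_comparison_args(num_repetitions=1)    # fine
# _check_comparison_args(num_repetitions=3)  # would raise ValueError
```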