Faster ensure_string_array #55183
Also added a change to `map_infer_mask`, which yields this improvement:

| Change | Before [f4f598fb] <main> | After [7a823afa] <faster-ensure-string> | Ratio | Benchmark (Parameter) |
|----------|----------------------------|-------------------------------------------|---------|---------------------------------------------------|
| - | 11.7±0.6ms | 10.6±0.04ms | 0.91 | strings.Methods.time_isalnum('str') |
| - | 7.76±1ms | 7.04±0.03ms | 0.91 | strings.Methods.time_isnumeric('str') |
| - | 28.6±2ms | 26.0±0.5ms | 0.91 | strings.Methods.time_pad('string[pyarrow]') |
| - | 7.37±0.4ms | 6.62±0.05ms | 0.9 | strings.Methods.time_isspace('str') |
| - | 14.1±0.4ms | 12.7±0.2ms | 0.9 | strings.Methods.time_len('str') |
| - | 11.4±0.6ms | 10.3±0.05ms | 0.9 | strings.Methods.time_startswith('string[python]') |
| - | 8.11±0.9ms | 7.11±0.03ms | 0.88 | strings.Methods.time_isdigit('str') |
| - | 8.06±0.7ms | 6.96±0.03ms | 0.86 | strings.Methods.time_isdecimal('str') |
| - | 7.32±0.9ms | 6.30±0.02ms | 0.86 | strings.Methods.time_isdigit('string[python]') |
| - | 14.8±2ms | 12.5±0.1ms | 0.85 | strings.Methods.time_slice('string[python]') |
| - | 7.75±1ms | 6.32±0.03ms | 0.82 | strings.Methods.time_isnumeric('string[python]') |
| - | 15.1±2ms | 11.9±0.09ms | 0.79 | strings.Methods.time_slice('str') |
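The Ratio column is the new mean time divided by the old one, so a value below 1.0 means the benchmark got faster (a `-` in the Change column marks an improvement). For example, 0.79 for `time_slice('str')` means it now takes about 79% of its previous time. A quick check of two rows, using the rounded means shown in the table; asv computes the ratio from unrounded timings, so a hand calculation can differ in the last digit (`time_slice('string[python]')` works out to 0.84 by hand against 0.85 in the table). The helper name `asv_ratio` is illustrative only.

```python
def asv_ratio(before_ms: float, after_ms: float) -> float:
    # Ratio = after / before; below 1.0 means the new code is faster.
    return after_ms / before_ms


# Two rows from the benchmark table (mean times only, uncertainties dropped).
isalnum_str = asv_ratio(11.7, 10.6)  # strings.Methods.time_isalnum('str')
slice_str = asv_ratio(15.1, 11.9)    # strings.Methods.time_slice('str')

print(round(isalnum_str, 2), round(slice_str, 2))  # 0.91 0.79
```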
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Generally, this pattern seems to get slower going from Cython 0.29 to 3.x, though it was pretty inefficient to begin with:

```cython
cdef object arr = ...
cdef Py_ssize_t i, n = len(arr)
for i in range(n):
    # arr is untyped here, so arr[i] is a generic Python item lookup
    ... = arr[i]
```

Declaring `arr` as an array type outside of the loop, rather than as a generic `object`, should help across our code base.
lgtm
Thanks @WillAyd
```cython
for i in range(n):
    val = arr[i]

    if not already_copied:
```
can this check be done outside the loop?
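The answer depends on what `already_copied` guards. Assuming it implements a lazy copy (take a private copy of the output only the first time an element actually needs rewriting), the two options can be sketched in plain Python. Everything here is hypothetical and is not the pandas implementation: the function names, lists standing in for object arrays, and `None` as the only missing-value marker.

```python
from typing import List, Optional


def to_strings_lazy_copy(values: List[object], na_value: Optional[str] = None) -> List[object]:
    # Hypothetical: the copy decision lives inside the loop.
    result = values
    already_copied = False
    for i, val in enumerate(values):
        if isinstance(val, str):
            continue  # nothing to rewrite, so no copy is needed yet
        if not already_copied:
            result = list(values)  # first write: copy once, never mutate the input
            already_copied = True
        result[i] = na_value if val is None else str(val)
    return result


def to_strings_prescan(values: List[object], na_value: Optional[str] = None) -> List[object]:
    # Hypothetical: the check is hoisted into an extra pass before the loop.
    if all(isinstance(val, str) for val in values):
        return values  # all strings already; return the input unchanged
    result = list(values)
    for i, val in enumerate(values):
        if not isinstance(val, str):
            result[i] = na_value if val is None else str(val)
    return result
```

Both versions avoid copying when every element is already a string. The pre-scan moves the check out of the loop at the cost of a second pass over the data; the lazy version keeps a single pass but tests `already_copied` for each element that needs rewriting. Which is faster once compiled with Cython would need a benchmark.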
```cython
for i in range(n):
    val = arr[i]
```
Personally, I'd rather spell this `for i, val in enumerate(arr):`. Probably worth checking whether it gives the same performance.
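In plain Python the two spellings visit the same `(i, val)` pairs, so the choice comes down to readability and to the code Cython generates for each. A minimal sketch with hypothetical helper names, using a list in place of the object array; it only shows that the two loops are behaviorally equivalent, not how they compare in speed once compiled.

```python
from typing import List


def stringify_indexed(arr: List[object]) -> List[str]:
    # Index-based loop, matching the shape of the code under review.
    n = len(arr)
    result = [""] * n
    for i in range(n):
        val = arr[i]
        result[i] = str(val)
    return result


def stringify_enumerate(arr: List[object]) -> List[str]:
    # Suggested spelling: enumerate yields (index, value) pairs directly.
    result = [""] * len(arr)
    for i, val in enumerate(arr):
        result[i] = str(val)
    return result
```

Whether Cython compiles `enumerate` over a typed array as efficiently as the indexed loop is exactly what the reviewer suggests measuring.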
This patch may have induced a performance regression. If it was a necessary behavior change, this may have been expected and everything is okay. Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected for appear as subbullets. Subsequent benchmarks may have skipped some commits. The link below lists the commits that are between the two benchmark runs where the regression was identified.
The benchmarks still run on Cython 0.29.x, but I think this change also prevents a regression when upgrading to 3.x, as noted in #55179.