RangeInclusive iteration performance improvement. #57378

Closed
matthieu-m
Contributor

@matthieu-m matthieu-m commented Jan 6, 2019

The current implementation of Iterator::{next, next_back} for
RangeInclusive leads to sub-optimal performance of loops as LLVM is not
capable of splitting the loop into a first-pass initialization
(computing is_empty) followed by the actual loop. This results in each
iteration performing two conditional jumps, which not only impacts the
performance of unoptimized loops, but also inhibits unrolling and
vectorization.

The proposed implementation switches things around, performing extra
work only on the last iteration of the loop. This results in even
unoptimized loops performing a single conditional jump in all but the
last iteration, matching Range's performance, as well as letting LLVM
unroll and vectorize when it would do so for Range's loop.

As a result, it should make iterating on inclusive ranges as fast as
iterating on exclusive ones; avoiding a papercut performance pitfall.

Unfortunately, it also appears to foil LLVM's Loop Splitting optimization.
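
To make the "extra work only on the last iteration" idea concrete, here is a minimal sketch. It is not the code from this PR: the type `SketchInclusive`, its fields, and the `done` flag are hypothetical names chosen for illustration, and the sketch is specialized to `u64` for brevity.

```rust
/// Hypothetical inclusive range over u64; NOT the standard library type
/// and NOT the implementation proposed in this PR.
pub struct SketchInclusive {
    start: u64,
    end: u64,
    /// Set once `end` itself has been yielded.
    done: bool,
}

impl SketchInclusive {
    pub fn new(start: u64, end: u64) -> Self {
        SketchInclusive { start, end, done: false }
    }
}

impl Iterator for SketchInclusive {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Hot path: a single comparison, taken on every iteration but the last.
        if self.start < self.end {
            let n = self.start;
            self.start += 1; // cannot overflow: start < end <= u64::MAX
            return Some(n);
        }
        // Cold path: reached for the final element, after exhaustion,
        // or immediately when the range is empty (start > end).
        if self.start == self.end && !self.done {
            self.done = true;
            Some(self.end)
        } else {
            None
        }
    }
}
```

With this shape, `SketchInclusive::new(0, 3)` yields 0, 1, 2, 3, and the second comparison only runs once the first one has failed. Whether LLVM then treats the loop like a plain `Range` loop is exactly what the rest of this thread investigates.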

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @rkruppe (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 6, 2019
@rust-highfive
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:3653fa10:start=1546789846190420746,finish=1546789915198335434,duration=69007914688
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@matthieu-m
Contributor Author

matthieu-m commented Jan 6, 2019

The failure seems legitimate: LLVM apparently fails to constant-fold the loop over a RangeInclusive in test/codegen/issue-45222.rs.

I've reproduced the issue on the playground (https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=c6ad080bf6386dab551c2bed1ad6dbfb); so I'll have to fiddle with this to understand what is blocking LLVM.

@ranma42
Contributor

ranma42 commented Jan 6, 2019

This is somewhat surprising, given that

fn foo3c(n: u64) -> u64 {
    let mut count = 0;
    (0..n).for_each(|_| {
        (0..n).chain(::std::iter::once(n)).rev().for_each(|j| {
            count += j;
        })
    });
    count
}

constant-folds just fine 🤔

@matthieu-m
Contributor Author

Alright, let's go nuts: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=c23c205c5f6dcdfeb958c2a6cf83ecdb .

// Doesn't const-fold (NAY)
fn triangle_inc_chain(n: u64) -> u64 {
    let mut count = 0;
    for j in (0..n).chain(::std::iter::once(n)) {
        count += j;
    }
    count
}

// Does const-fold (YAY)
fn triangle_inc_chain_foreach(n: u64) -> u64 {
    let mut count = 0;
    (0..n).chain(::std::iter::once(n)).for_each(|j| count += j);
    count
}

At this point, I'm really wondering what trips up LLVM.

Note: use of explicit for vs for_each doesn't impact the FixedRangeInclusive, that would be too simple.

@rkruppe: I am thinking that this test, as written, is bad. Whether LLVM const-folds or not seems to have no relation to the "tightness" of the generated LLVM IR, or its overall performance. It seems that we would be better served by a check which actually verifies the number of conditional jumps involved in the inner loop, rather than using const-folding as a proxy for performance.

@scottmcm
Member

scottmcm commented Jan 9, 2019

Cross-reference: #56563

@matthieu-m
Contributor Author

matthieu-m commented Jan 12, 2019

Performance discussion should be accompanied by benchmarks, so I put together a number of benchmarks and used criterion to evaluate the relative performance of:

  • exclusive ranges: 0..(n+1).
  • chain: (0..n).chain(::std::iter::once(n)).
  • inclusive ranges: 0..=n.
  • this PR inclusive ranges: inclusive(0, n).

And the results are the following:

| Benchmark | Exclusive | Chain | Inclusive | This PR |
|---|---|---|---|---|
| Sum | 1.1429 ns | 4.7870 ns | 0.97105 ns | 787,530 ns |
| Triangle Foreach | 1.1357 ns | 1.5852 ns | 0.98040 ns | 806,090 ns |
| Triangle Loop | 1.1351 ns | 1,164,100 ns | 0.97786 ns | 835,870 ns |
| Add Mul Foreach | 1.1721 ms | 1.1737 ms | 1.1747 ms | 1.1722 ms |
| Add Mul Loop | 1.2204 ms | 1.1685 ms | 1.1731 ms | 1.1704 ms |
| Pythagorean Triples | 972.10 us | 982.73 us | 1,859.5 us | 1,232.8 us |

(see gist for details of each benchmark, I reported only the black-hole cases: https://gist.github.com/matthieu-m/df8dcfed3e23ca83ea5abf9e7b3ca4d3)

This yields two conclusions:

  • The Rust code in this PR yields on par, or better, assembly in "complex" cases:
    • Add Mul's body: count = (count + 3) * j;
    • Pythagorean triples
  • However, it is opaque to LLVM's closed formula transformation, preventing eliding the loop altogether in "simple" cases:
    • Straightforward sum.
    • Triangle-like sum.

Also, it is notable that LLVM's closed formula transformation kicks in for for_each but not for an explicit loop when using a chained iterator to do the triangle sum by hand. This is indicative of a "hole" in the transformation, which I have yet to understand.

I guess either understanding or fixing this hole is the key to getting an implementation of inclusive ranges which both yields good assembly and lets LLVM perform the closed formula transformation.

In the absence of such understanding/fixing, I would tend to prefer better straightforward assembly at the expense of the closed formula transformation: it is easier for the user to substitute a closed formula rather than re-implement an inclusive range, and I am doubtful that a closed formula exists in many cases.
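
For readers unfamiliar with the term, the "closed formula transformation" replaces a summing loop with an arithmetic expression. A minimal sketch of the two forms, written by hand, follows; the function names are hypothetical and are not taken from the test suite or the benchmark gist.

```rust
/// Explicit loop over an inclusive range: the shape that LLVM may or may
/// not rewrite into a closed formula, depending on the iterator's code.
pub fn triangle_loop(n: u64) -> u64 {
    let mut count = 0;
    for j in 0..=n {
        count += j;
    }
    count
}

/// Hand-written closed formula: 0 + 1 + ... + n = n * (n + 1) / 2.
/// Assumes n is small enough that n * (n + 1) fits in a u64.
pub fn triangle_closed(n: u64) -> u64 {
    n * (n + 1) / 2
}
```

For `n = 100_000` both functions return `5_000_050_000`, the same value as the `ret i64 5000050000` line that FileCheck expects for `check_triangle_inc` in the CI log further down this thread; that the test uses this bound is an assumption on my part, inferred from the matching value.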

I also have to revise my statement about performance: while in the Add Mul example inclusive ranges perform as well as exclusive ones, there is still some overhead remaining in the Pythagorean Triples case. It may simply be the slight overhead of the inner loop magnified by the number of times it is executed, of course, and this PR still significantly improves performance: from a 1.91x to a 1.27x slow-down.


Does anyone have any idea as to what could prevent LLVM from effecting the closed formula transformation?

@matthieu-m
Contributor Author

@kennytm As the author of the current version of RangeInclusive, do you remember if you had to do anything special to get LLVM to apply the closed formula transformation, which I believe to be the key to constant-folding?

@kennytm
Member

kennytm commented Jan 12, 2019

@matthieu-m I haven't investigated what causes LLVM to const-fold a loop; it just happened that the test worked after tweaking the representation and putting #[inline] in as many places as possible 😓.

@matthieu-m
Contributor Author

matthieu-m commented Jan 13, 2019

Updated performance numbers after specializing try_fold to make interior iteration more efficient [1][2].

| Benchmark | Exclusive | Chain | Inclusive | This PR |
|---|---|---|---|---|
| Sum | 1.1385 ns | 4.8235 ns | 0.99411 ns | 1.5881 ns |
| Triangle Foreach | 1.1373 ns | 1.5840 ns | 0.98306 ns | 1.5955 ns |
| Triangle Loop | 1.1296 ns | 1,167,000 ns | 0.96986 ns | 838,240 ns |
| Add Mul Foreach | 1.1620 ms | 1.2057 ms | 1.1737 ms | 1.1665 ms |
| Add Mul Loop | 1.1660 ms | 1.1693 ms | 1.1693 ms | 1.2504 ms |
| Pythagorean Triples | 875.04 us | 907.83 us | 1,844.3 us | 895.42 us |

This reinforces the conclusion that the RangeInclusive code in this PR yields much better code for "complex" loops; notably, it finally catches up to the exclusive Range in the Pythagorean Triples example.

A custom try_fold now also allows it to catch up to Range when using internal iteration in "simple" loops, as we can see using sum or for_each.

Unfortunately, it does nothing to improve the performance of "simple" loops using external iteration, where LLVM just fails to perform Loop Splitting and subsequently to transform the loop into a closed form.

My experiments with Loop Splitting have found it extremely finicky, with very similar cases falling on either side of the divide. This is pretty frustrating 😢

[1] The performance penalty observed is specific to the absence of Loop Splitting by LLVM; however, in interior iteration we can manually split the loop between the loop itself and either a header or trailer, thereby gaining all our due performance without relying on getting lucky during optimizations.

[2] As a more general note, it also means that (a) it is likely beneficial to implement a specialized try_fold on many of the current iterators, such as Chain, if not done already and (b) it is likely beneficial to use internal iteration over external iteration in current iterators, for example in Filter::count, which doesn't.
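
The manual split into "loop plus trailer" described in the first footnote can be sketched as below. This is a simplified illustration, not the `try_fold` specialization from this PR: `fold_inclusive` is a hypothetical free function over `u64`, and it implements a plain fold without the early-exit behaviour that `try_fold` supports.

```rust
/// Folds `f` over the inclusive range start..=end.
/// Hypothetical helper for illustration; not part of the standard library.
pub fn fold_inclusive<B>(start: u64, end: u64, init: B, mut f: impl FnMut(B, u64) -> B) -> B {
    // Header: decide emptiness once, outside the loop.
    if start > end {
        return init;
    }
    let mut acc = init;
    // Hot loop: an ordinary exclusive range, one exit condition per iteration.
    for i in start..end {
        acc = f(acc, i);
    }
    // Trailer: the final element is handled exactly once, after the loop.
    f(acc, end)
}
```

For example, `fold_inclusive(0, 4, 0u64, |a, i| a + i)` returns 10. Because the last element is handled outside the loop, the loop itself never needs a "have we already yielded `end`?" check, and `end == u64::MAX` works without any overflow.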

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:065440f8:start=1547393674353503519,finish=1547393753035679132,duration=78682175613
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:10:57] 
[01:10:57] running 122 tests
[01:11:01] i..ii...iii..iiii.....i..............i..i......F.........i.....i......ii...i..i.ii..............i... 100/122
[01:11:01] failures:
[01:11:01] 
[01:11:01] ---- [codegen] codegen/issue-45222.rs stdout ----
[01:11:01] 
[01:11:01] 
[01:11:01] error: verification with 'FileCheck' failed
[01:11:01] status: exit code: 1
[01:11:01] command: "/usr/lib/llvm-6.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll" "/checkout/src/test/codegen/issue-45222.rs"
[01:11:01] ------------------------------------------
[01:11:01] 
[01:11:01] ------------------------------------------
[01:11:01] stderr:
[01:11:01] ------------------------------------------
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:23:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 500005000000000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:10:23: note: scanning from here
[01:11:01] define i64 @check_foo2() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                       ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:64:23: note: possible intended match here
[01:11:01]  %exitcond.i.1 = icmp eq i64 %7, 100000
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 5000050000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:76:31: note: scanning from here
[01:11:01] define i64 @check_triangle_inc() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                               ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:105:2: note: possible intended match here
[01:11:01]  ret i64 %count.0.i
[01:11:01] /checkout/src/test/codegen/issue-45222.rs:61:12: error: expected string not found in input
[01:11:01]  // CHECK: ret i64 500050000000
[01:11:01]            ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:109:24: note: scanning from here
[01:11:01] define i64 @check_foo3r() unnamed_addr #1 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:11:01]                        ^
[01:11:01] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:163:27: note: possible intended match here
[01:11:01]  %exitcond.i.i.i.1 = icmp eq i64 %14, 10000
[01:11:01] 
[01:11:01] ------------------------------------------
[01:11:01] 
[01:11:01] thread '[codegen] codegen/issue-45222.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3245:9
---
[01:11:01] 
[01:11:01] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:495:22
[01:11:01] 
[01:11:01] 
[01:11:01] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:11:01] 
[01:11:01] 
[01:11:01] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:11:01] Build completed unsuccessfully in 0:11:57
[01:11:01] Makefile:48: recipe for target 'check' failed
[01:11:01] make: *** [check] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0f86a6d7
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 13 16:47:04 UTC 2019

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:265246dc:start=1547398870285545705,finish=1547398940346930657,duration=70061384952
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:09:20] 
[01:09:20] running 122 tests
[01:09:23] i..ii...iii..iiii.....i..............i..i......F.........i.....i......ii...i..i.ii..............i... 100/122
[01:09:24] failures:
[01:09:24] 
[01:09:24] ---- [codegen] codegen/issue-45222.rs stdout ----
[01:09:24] 
[01:09:24] 
[01:09:24] error: verification with 'FileCheck' failed
[01:09:24] status: exit code: 1
[01:09:24] command: "/usr/lib/llvm-6.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll" "/checkout/src/test/codegen/issue-45222.rs"
[01:09:24] ------------------------------------------
[01:09:24] 
[01:09:24] ------------------------------------------
[01:09:24] stderr:
[01:09:24] ------------------------------------------
[01:09:24] /checkout/src/test/codegen/issue-45222.rs:23:12: error: expected string not found in input
[01:09:24]  // CHECK: ret i64 500005000000000
[01:09:24]            ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:10:23: note: scanning from here
[01:09:24] define i64 @check_foo2() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:09:24]                       ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:64:23: note: possible intended match here
[01:09:24]  %exitcond.i.1 = icmp eq i64 %7, 100000
[01:09:24] /checkout/src/test/codegen/issue-45222.rs:41:12: error: expected string not found in input
[01:09:24]  // CHECK: ret i64 5000050000
[01:09:24]            ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:76:31: note: scanning from here
[01:09:24] define i64 @check_triangle_inc() unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
[01:09:24]                               ^
[01:09:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issue-45222/issue-45222.ll:105:2: note: possible intended match here
[01:09:24]  ret i64 %count.0.i
[01:09:24] 
[01:09:24] ------------------------------------------
[01:09:24] 
[01:09:24] thread '[codegen] codegen/issue-45222.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:3245:9
---
[01:09:24] 
[01:09:24] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:495:22
[01:09:24] 
[01:09:24] 
[01:09:24] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:09:24] 
[01:09:24] 
[01:09:24] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:09:24] Build completed unsuccessfully in 0:11:13
[01:09:24] Makefile:48: recipe for target 'check' failed
[01:09:24] make: *** [check] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:11ced63a
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Jan 13 18:11:53 UTC 2019
---
travis_time:end:06760162:start=1547403114594602927,finish=1547403114599052845,duration=4449918
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0234ee99
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|')

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@scottmcm
Member

> (a) it is likely beneficial to implement a specialized try_fold on many of the current iterators, such as Chain, if not done already

Many were, including Chain, Filter, FilterMap, Enumerate, Scan, Fuse, and more, in #45595, which added try_fold. There are probably still more to do, though, like VecDeque's.
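
To illustrate the internal-versus-external distinction that this thread keeps returning to, here is a small sketch that counts even numbers both ways. The function names are hypothetical, and the code is not taken from `Filter::count` or any other standard library implementation.

```rust
/// External iteration: the caller drives the iterator through `next`,
/// so every adaptor in the chain must re-check its state on each call.
pub fn count_even_external<I: Iterator<Item = u64>>(mut iter: I) -> usize {
    let mut count = 0;
    while let Some(x) = iter.next() {
        if x % 2 == 0 {
            count += 1;
        }
    }
    count
}

/// Internal iteration: the iterator drives the closure through `fold`,
/// so an adaptor with a specialized fold can use its own loop structure.
pub fn count_even_internal<I: Iterator<Item = u64>>(iter: I) -> usize {
    iter.fold(0, |count, x| count + (x % 2 == 0) as usize)
}
```

Both functions return the same result for any iterator, for instance 6 for `(0..10).chain(std::iter::once(10))`; the difference lies only in which side owns the loop, which is what determines whether a specialized `fold` or `try_fold` can help.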

@hanna-kruppe
Contributor

Sorry, it doesn't seem like I'll be able to give this PR proper attention in the near future. Please assign someone else.

@scottmcm
Member

r? @Kimundi

(For consistency with #56563 in the same area.)

Specialize Iterator::try_fold and DoubleEndedIterator::try_rfold to
improve code generation in all internal iteration scenarios.

This change brings the performance of internal iteration with
RangeInclusive on par with the performance of iteration with Range:

 - Single conditional jump in hot loop,
 - Unrolling and vectorization,
 - And even Closed Form substitution.

Unfortunately, it only applies to internal iteration. Despite various
attempts at streamlining the implementation of next and next_back,
LLVM has stubbornly refused to optimize external iteration
appropriately, leaving me with a choice between:

 - The current implementation, for which Closed Form substitution is
   performed, but which uses 2 conditional jumps in the hot loop when
   optimization fails.
 - An implementation using an "is_done" boolean, which uses 1
   conditional jump in the hot loop when optimization fails, allowing
   unrolling and vectorization, but for which Closed Form substitution
   fails.

In the absence of any conclusive evidence as to which usecase matters
most, and with no assurance that the lack of Closed Form substitution
is not indicative of other optimizations being foiled, there is no way
to pick one implementation over the other, and thus I defer to the
status quo as far as next and next_back are concerned.
@matthieu-m
Contributor Author

Unfortunately, I have yet to find a way to get LLVM to play nice with external iteration.

I'll open another PR to improve internal iteration, as the force-push seems to have corrupted this one.
