A chance to get an update on this? #1
Comments
Several news articles about Firefox with Rust note that this would lose Firefox support on less widespread platforms (where there is no LLVM) — a problem the Rust frontend for GCC could solve. Is the GCC wiki up to date? https://gcc.gnu.org/wiki/RustFrontEnd
Thanks for your interest. I would love to restart, but it's quite daunting how complex Rust is. It's a lot of work now to catch up. I am not sure I would have anything usable for a long time unless I attract a bunch of developers, which could happen if I rewrite and make the code much cleaner.
I can imagine that … maybe you could get help by asking Mozilla. They should have a vested interest in providing Rust on all platforms supported by GCC.
Maybe this isn't the best place to ask, but looking at the answers, it seems like they don't have much interest in starting this effort. I think it's a good idea to make this happen. I don't have much experience with GCC frontends or Rust internals, but I don't mind studying those awesome technologies even further; also, I can help if we have an issue list x) @redbrain Do you intend to continue this work, even without full support from Mozilla? Best Regards
Thank you for taking it there. Keep in mind that there’s currently one person there who wouldn’t want to do anything with GCC (whitequark), and that’s a maintainer of parts of LLVM, so might not be totally unbiased. Maybe there might actually be companies willing to fund this — similar to the gccgo backend for Go? If you bet the future of your company on Rust, there better be more than just one implementation (or at least there better be a copyleft one which cannot be re-proprietarized by wrapping it into effectively mandatory proprietary code).
I would love to work on this full time. I think Rust has some serious benefits over Go, which I do really like. Having Rust only on LLVM will make stability of the language harder in the long run; having multiple compilers keeps the language honest. Also, GCC for Rust would be amazing for running benchmarks against LLVM. I think embedded systems are lacking for LLVM; GCC has many more backends and is doing well lately. I might have time over the summer to restart the project from scratch. Rust has changed a lot since I started this project, and I think I could finally make a good start on this since the language is more stable than 0.7 was when I first looked at Rust.
I created a topic to collect some feedback from the Rust community. :o)
What's the status on this? I'd be interested in lending a hand if needed.
Hey @wpbirney, thanks for your interest. I would ideally like to restart the project and implement this within the Rust compiler proper as a GCC plugin; there are Rust bindings available for this: https://github.com/swgillespie/gccjit.rs The main problem with implementing this as a fresh front-end to GCC is that the amount of static analysis is too much to implement unless I was full time on this project. So implementing this like the LLVM interface to rustc seems like a good place to start and to get it properly maintained, but I worry that the interface to gccjit isn't feature-rich enough for error handling, and leaks enough memory, that it may not be that great an alternative. I personally can't decide what I would like to do at this point.
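For context, here is a minimal sketch of the libgccjit C API that gccjit.rs wraps: it builds and runs int square(int x) in-process. This is purely illustrative (error handling omitted), not a design proposal; compile with g++ demo.cc -lgccjit.

    #include <libgccjit.h>
    #include <cstdio>

    int main ()
    {
      /* Acquire a JIT compilation context.  */
      gcc_jit_context *ctxt = gcc_jit_context_acquire ();

      /* Build: int square (int x) { return x * x; }  */
      gcc_jit_type *int_type = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
      gcc_jit_param *x = gcc_jit_context_new_param (ctxt, NULL, int_type, "x");
      gcc_jit_function *fn
        = gcc_jit_context_new_function (ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED,
                                        int_type, "square", 1, &x, 0);
      gcc_jit_block *entry = gcc_jit_function_new_block (fn, "entry");
      gcc_jit_rvalue *xr = gcc_jit_param_as_rvalue (x);
      gcc_jit_block_end_with_return (
          entry, NULL,
          gcc_jit_context_new_binary_op (ctxt, NULL, GCC_JIT_BINARY_OP_MULT,
                                         int_type, xr, xr));

      /* Compile in-process and call the result.  */
      gcc_jit_result *result = gcc_jit_context_compile (ctxt);
      typedef int (*square_fn) (int);
      square_fn square = (square_fn) gcc_jit_result_get_code (result, "square");
      std::printf ("%d\n", square (5)); /* prints 25 */

      gcc_jit_result_release (result);
      gcc_jit_context_release (ctxt);
    }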
Is there a chance that you could run a fundraiser to see whether there is sufficient interest that you could actually get funded to do this right? Or ask Mozilla for funding? |
This needs to be done. If crowdfunding happens, I'll take this to Rust meetups here in DC to get the word out. Copyleft FTW! |
@sunnystormy I am really touched by your kind message. I would love to work on this full time; I'm thinking of asking my current employer to sponsor me on this, or reaching out to Mozilla as well. I think there is a lot of value in having a separate Rust compiler, like Go has with GCC-Go.
Maybe you could even have Mozilla and your current employer cooperate on this. |
How about creating a bounty on a bounty platform, like www.bountysource.com? (I've already finished some bounties there and it works.) A description and some analysis of the problem, plus some proposed solutions, would be awesome, helping backers to know the size of the problem and the benefits of the implementation. Mozilla already supports multiple bounties at Bountysource.
poke |
Poke poke :-) I’d join a Patreon right away (with 5€ per month). |
I would also love to help with this. I can't stand the fact that Rust is limited to such a small selection of platforms, and LLVM is making glacial progress at catching up with GCC in terms of targets. I'd be willing to contribute both code and cash to make this happen. I would love to talk to someone such as @redbrain to learn where I could start.
@redbrain The free and open ISA, RISC-V, is properly supported by gcc (and has been for quite some time) but not fully by LLVM. It is clear that we need proper gcc support for Rust in order for it to remain relevant on future hardware platforms. Please let us know how we can help! Thank you! |
The enthusiasm for this project means a lot. I am really torn between writing the full front-end and doing it via a GCC plugin interface for the Rust compiler as a command-line switch. The latter is a much more contained project, but a full front-end is a much better solution for testing GCC vs LLVM. The current code up there is pretty poor, so I really need to start from scratch again with a fresh branch of GCC. Some of the parser can be reused, but it also needs a pre-processor pass.
@redbrain Do a full front-end, please. We want to have a proper alternative to gccgo. : ) |
@redbrain I can help you with the implementation, but I need some directions to not duplicate work. |
When you say a full front-end, do you mean a completely new Rust compiler that takes Rust code and then uses GCC as a back-end? As much as I would love for there to be a second working Rust compiler, I worry that it would lag behind the reference implementation in terms of keeping up with the rapidly evolving language. Plus there's the fact that there is still no Rust specification to determine which aspects of the original compiler need to be replicated faithfully. If this can be pulled off really well, it would be way better than a GCC plugin for the Rust compiler, but if not, I would personally rather see the plugin. |
@Serentty I understand your concern. Truth is, if the original Rust authors had considered GCC support many years ago, we wouldn’t be worrying about this right now. I know it’s a lot of work, but writing a proper Rust front-end for GCC is the right call. Writing a plug-in for the already existing Rust compiler is a band-aid at best, and forces users to depend upon the software that they’re actively trying to move away from. Let’s find people ready and willing to make this happen, and then let’s find ways to fund their efforts! Go team gccrs, go!

Addendum: That being said... @redbrain, do you have any ideas about the resources you’d need to begin a proper GCC implementation? Don’t be afraid to let the community know so we can back you up!
I suppose it really depends on what it is that you want to move away from. For me, it's LLVM, and I'm not bothered as much by having to use the reference compiler. Also, given that that compiler is making efforts to decouple itself from LLVM, I wouldn't consider adding a GCC backend to it to be a band-aid solution. However, I do see how this project has the potential to kill two birds with one stone if it manages to provide an alternative to that compiler as well, if it can pull that off. For that reason, maybe it is best to write a full Rust frontend for GCC. If it actually pans out, I agree that it's much better than the plugin solution.
Is there a good place (IRC, Matrix, or something like that) for people who are interested in collaborating on this to get together and discuss how it could be done? It seems to me like a #gccrs channel on Freenode could be useful for getting this going, but I wouldn't dare create it myself.
I agree, a #gccrs channel on Freenode sounds like a great idea.
Someone actually did end up creating that channel on Freenode it seems. I'll certainly be hanging out there. |
…D_EXPR) using SVE.

Given this input code:

    int sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n) {
      int sum = 0;
      for (int i = 0; i < n; i++) {
        sum += __builtin_abs (x[i] - y[i]);
      }
      return sum;
    }

The resulting SVE code is:

    0000000000000000 <sum_abs>:
       0:   7100005f   cmp     w2, #0x0
       4:   5400026d   b.le    50 <sum_abs+0x50>
       8:   d2800003   mov     x3, #0x0      // #0
       c:   93407c42   sxtw    x2, w2
      10:   2538c002   mov     z2.b, #0
      14:   25221fe0   whilelo p0.b, xzr, x2
      18:   2538c023   mov     z3.b, #1
      1c:   2518e3e1   ptrue   p1.b
      20:   a4034000   ld1b    {z0.b}, p0/z, [x0, x3]
      24:   a4034021   ld1b    {z1.b}, p0/z, [x1, x3]
      28:   0430e3e3   incb    x3
      2c:   0520c021   sel     z1.b, p0, z1.b, z0.b
      30:   25221c60   whilelo p0.b, x3, x2
      34:   040d0420   uabd    z0.b, p1/m, z0.b, z1.b
      38:   44830402   udot    z2.s, z0.b, z3.b
      3c:   54ffff21   b.ne    20 <sum_abs+0x20>  // b.any
      40:   2598e3e0   ptrue   p0.s
      44:   04812042   uaddv   d2, p0, z2.s
      48:   1e260040   fmov    w0, s2
      4c:   d65f03c0   ret
      50:   1e2703e2   fmov    s2, wzr
      54:   1e260040   fmov    w0, s2
      58:   d65f03c0   ret

Notice how udot is used inside a fully masked loop.

gcc/Changelog:

2019-05-07  Alejandro Martinez  <[email protected]>

    * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
    (aarch64_<su>abd<mode>_3): Likewise.
    (*aarch64_<su>abd<mode>_3): New define_insn.
    (<sur>sad<vsi2qi>): New define_expand.
    * config/aarch64/iterators.md: Added MAX_OPP attribute.
    * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
    (build_vect_cond_expr): Likewise.

gcc/testsuite/Changelog:

2019-05-07  Alejandro Martinez  <[email protected]>

    * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
    differences.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@270975 138bc75d-0d04-0410-961f-82ee72b054a4
AFAIK the Rust developers are trying to make their compiler more backend-independent. This is needed for fast debug builds via Cranelift. So in theory, it should become easier to plug libgccjit (or whatever will be used) into MIR.
Too bad that no rustc developer commented here on what they think and whether they'd like to help make this easier.
…ddress

Fix an ICE with the handling of RTL expressions like:

    (subreg:QI (mem/c:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 0 %r0 [orig:67 i ] [67])
                        (const_int 4 [0x4]))
                    (reg/v/f:SI 7 %r7 [orig:59 doacross ] [59]))
                (const_int 40 [0x28])) [1 MEM[(unsigned int *)doacross_63 + 40B + i_106 * 4]+0 S4 A32]) 0)

that causes the compilation of libgomp to fail:

    during RTL pass: reload
    .../libgomp/ordered.c: In function 'GOMP_doacross_wait':
    .../libgomp/ordered.c:507:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
      507 | }
          | ^
    0x10a3462b change_address_1
            .../gcc/emit-rtl.c:2275
    0x10a353a7 adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, int, int, poly_int<1u, long>)
            .../gcc/emit-rtl.c:2409
    0x10ae2993 alter_subreg(rtx_def**, bool)
            .../gcc/final.c:3368
    0x10ae25cf cleanup_subreg_operands(rtx_insn*)
            .../gcc/final.c:3322
    0x110922a3 reload(rtx_insn*, int)
            .../gcc/reload1.c:1232
    0x10de2bf7 do_reload
            .../gcc/ira.c:5812
    0x10de3377 execute
            .../gcc/ira.c:5986

in a `vax-netbsdelf' build, where an attempt is made to change the mode of the contained memory reference to the mode of the containing SUBREG. Such RTL expressions are produced by the VAX shift and rotate patterns (`ashift', `ashiftrt', `rotate', `rotatert') where the count operand always has the QI mode regardless of the mode, either SI or DI, of the datum shifted or rotated.

Such a mode change cannot work where the memory reference uses the indexed addressing mode, where a multiplier is implied that in the VAX ISA depends on the width of the memory access requested and therefore changing the machine mode would change the address calculation as well.

Avoid the attempt then by forcing the reload of any SUBREGs containing a mode-dependent memory reference, also fixing these regressions:

    FAIL: gcc.c-torture/compile/pr46883.c -Os (internal compiler error)
    FAIL: gcc.c-torture/compile/pr46883.c -Os (test for excess errors)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 (internal compiler error)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 (test for excess errors)
    FAIL: gcc.c-torture/execute/20120808-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
    FAIL: gcc.c-torture/execute/20120808-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
    FAIL: gcc.c-torture/execute/20120808-1.c -O3 -g (internal compiler error)
    FAIL: gcc.c-torture/execute/20120808-1.c -O3 -g (test for excess errors)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
    FAIL: gcc.c-torture/execute/20120808-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
    FAIL: gcc.dg/20050629-1.c (internal compiler error)
    FAIL: gcc.dg/20050629-1.c (test for excess errors)
    FAIL: c-c++-common/torture/pr53505.c -Os (internal compiler error)
    FAIL: c-c++-common/torture/pr53505.c -Os (test for excess errors)
    FAIL: gfortran.dg/coarray_failed_images_1.f08 -Os (internal compiler error)
    FAIL: gfortran.dg/coarray_stopped_images_1.f08 -Os (internal compiler error)

With test case #0 included it causes a reload with:

    (insn 15 14 16 4 (set (reg:SI 31)
            (ashift:SI (const_int 1 [0x1])
                (subreg:QI (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ]) 0))) "pr58901-0.c":15:12 94 {ashlsi3}
         (expr_list:REG_DEAD (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
            (nil)))

as follows:

    Reloads for insn # 15
    Reload 0: reload_in (SI) = (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
            ALL_REGS, RELOAD_FOR_INPUT (opnum = 2)
            reload_in_reg: (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
            reload_reg_rtx: (reg:SI 5 %r5)

resulting in:

    (insn 37 14 15 4 (set (reg:SI 5 %r5)
            (mem/c:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 %r1 [orig:25 i ] [25])
                            (const_int 4 [0x4]))
                        (reg/v/f:SI 4 %r4 [orig:29 s ] [29]))
                    (const_int 4 [0x4])) [1 MEM[(int *)s_8(D) + 4B + _5 * 4]+0 S4 A32])) "pr58901-0.c":15:12 12 {movsi_2}
         (nil))
    (insn 15 37 16 4 (set (reg:SI 2 %r2 [31])
            (ashift:SI (const_int 1 [0x1])
                (reg:QI 5 %r5))) "pr58901-0.c":15:12 94 {ashlsi3}
         (nil))

and assembly like:

    .L3:
            movl 4(%r4)[%r1],%r5
            ashl %r5,$1,%r2
            xorl2 %r2,%r0
            incl %r1
            cmpl %r1,%r3
            jneq .L3

produced for the loop, providing optimization has been enabled.

Likewise with test case #1 the reload of:

    (insn 17 16 18 4 (set (reg:SI 34)
            (and:SI (subreg:SI (reg/v:DI 27 [ t ]) 4)
                (const_int 1 [0x1]))) "pr58901-1.c":18:20 77 {*andsi_const_int}
         (expr_list:REG_DEAD (reg/v:DI 27 [ t ])
            (nil)))

is as follows:

    Reloads for insn # 17
    Reload 0: reload_in (DI) = (reg/v:DI 27 [ t ])
            reload_out (SI) = (reg:SI 2 %r2 [34])
            ALL_REGS, RELOAD_OTHER (opnum = 0)
            reload_in_reg: (reg/v:DI 27 [ t ])
            reload_out_reg: (reg:SI 2 %r2 [34])
            reload_reg_rtx: (reg:DI 4 %r4)

resulting in:

    (insn 40 16 17 4 (set (reg:DI 4 %r4)
            (mem/c:DI (plus:SI (mult:SI (reg/v:SI 1 %r1 [orig:26 i ] [26])
                        (const_int 8 [0x8]))
                    (reg/v/f:SI 3 %r3 [orig:30 s ] [30])) [1 MEM[(const struct s *)s_13(D) + _7 * 8]+0 S8 A32])) "pr58901-1.c":18:20 11 {movdi}
         (nil))
    (insn 17 40 41 4 (set (reg:SI 4 %r4)
            (and:SI (reg:SI 5 %r5 [+4 ])
                (const_int 1 [0x1]))) "pr58901-1.c":18:20 77 {*andsi_const_int}
         (nil))

and assembly like:

    .L3:
            movq (%r3)[%r1],%r4
            bicl3 $-2,%r5,%r4
            addl2 %r4,%r0
            jaoblss %r0,%r1,.L3

First posted at: <https://gcc.gnu.org/ml/gcc/2014-06/msg00060.html>.

2020-12-05  Matt Thomas  <[email protected]>
            Maciej W. Rozycki  <[email protected]>

gcc/
    PR target/58901
    * reload.c (push_reload): Also reload the inner expression of a SUBREG for pseudos associated with a mode-dependent memory reference.
    (find_reloads): Force a reload likewise.

2020-12-05  Maciej W. Rozycki  <[email protected]>

gcc/testsuite/
    PR target/58901
    * gcc.c-torture/compile/pr58901-0.c: New test.
    * gcc.c-torture/compile/pr58901-1.c: New test.
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'

    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
This test relies on -mfloat-abi=hard to pass (otherwise test_mov_imm_[12] directly build the 1.0 fp16 representation via movw r0, #15360 rather than using vmov.f16 s0, #1.0e+0 as expected by scan-assembler-times).

Adding the arm_hard_ok check makes the test unsupported e.g. on arm-linux-gnueabi instead of reporting a failure.

2021-03-20  Christophe Lyon  <[email protected]>

gcc/testsuite/
    * gcc.target/arm/armv8_2-fp16-scalar-2.c: Add arm_hard_ok.
As suggested in the PR, the following patch adds two new clrsb expansion possibilities if the target doesn't have clrsb_optab for the requested nor wider modes, but does have clz_optab for the requested mode.

One expansion is

    clrsb (op0) expands as clz (op0 ^ (((stype)op0) >> (prec-1))) - 1

which is usable if CLZ_DEFINED_VALUE_AT_ZERO is 2 with value of prec, because the clz argument can be 0 and clrsb should give prec-1 in that case. The other expansion is

    clz (((op0 << 1) ^ (((stype)op0) >> (prec-1))) | 1)

where the clz argument is never 0, but it is one operation longer.

E.g. on x86_64-linux with -O2 -mno-lzcnt, this results for

    int foo (int x) { return __builtin_clrsb (x); }

in

    -       subq    $8, %rsp
    -       movslq  %edi, %rdi
    -       call    __clrsbdi2
    -       addq    $8, %rsp
    -       subl    $32, %eax
    +       leal    (%rdi,%rdi), %eax
    +       sarl    $31, %edi
    +       xorl    %edi, %eax
    +       orl     $1, %eax
    +       bsrl    %eax, %eax
    +       xorl    $31, %eax

and with -O2 -mlzcnt:

    +       movl    %edi, %eax
    +       sarl    $31, %eax
    +       xorl    %edi, %eax
    +       lzcntl  %eax, %eax
    +       subl    $1, %eax

On armv7hl-linux-gnueabi with -O2:

    -       push    {r4, lr}
    -       bl      __clrsbsi2
    -       pop     {r4, pc}
    +       @ link register save eliminated.
    +       eor     r0, r0, r0, asr #31
    +       clz     r0, r0
    +       sub     r0, r0, #1
    +       bx      lr

As it (at least usually) will make code larger, it is disabled for -Os or cold instructions.

2021-08-19  Jakub Jelinek  <[email protected]>

    PR middle-end/101950
    * optabs.c (expand_clrsb_using_clz): New function.
    (expand_unop): Use it as another clrsb expansion fallback.
    * gcc.target/i386/pr101950-1.c: New test.
    * gcc.target/i386/pr101950-2.c: New test.
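To make the two expansion identities above concrete, here is a small self-checking sketch, assuming 32-bit int and the GCC built-ins; the clz1 fallback models CLZ_DEFINED_VALUE_AT_ZERO yielding the precision:

    #include <cassert>

    int main ()
    {
      for (long long i = -100000; i <= 100000; ++i)
        {
          int x = (int) i;
          /* Arithmetic right shift replicates the sign bit (GCC guarantees
             this behavior for signed >>).  */
          unsigned sign = (unsigned) (x >> 31);
          /* Expansion 1: clz (op0 ^ ((stype)op0 >> (prec-1))) - 1,
             modelling CLZ (0) == prec == 32.  */
          unsigned v1 = (unsigned) x ^ sign;
          int clz1 = v1 ? __builtin_clz (v1) : 32;
          assert (__builtin_clrsb (x) == clz1 - 1);
          /* Expansion 2: the "| 1" keeps the clz argument nonzero.  */
          unsigned v2 = (((unsigned) x << 1) ^ sign) | 1u;
          assert (__builtin_clrsb (x) == __builtin_clz (v2));
        }
    }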
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars).

This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after the optimize/target pragma), and we run into:

    #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
    #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
    #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
    #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
    #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
    #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
    #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
    #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
    #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
    #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
    #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
    #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600

an assertion that LC_RENAME doesn't happen first.

I think the right fix is to emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU; they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma.

2021-12-30  Jakub Jelinek  <[email protected]>

    PR c++/103012

gcc/
    * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION.
    * config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
    * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
    * config/s390/s390-c.c (s390_pragma_target_parse): Likewise.

gcc/c-family/
    * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION.

gcc/testsuite/
    PR c++/103012
    * g++.dg/cpp/pr103012.C: New test.
    * g++.target/i386/pr103012.C: New test.
This patch implements C++23 P2255R2, which adds two new type traits to detect reference binding to a temporary. They can be used to detect code like

    std::tuple<const std::string&> t("meow");

which is incorrect because it always creates a dangling reference, because the std::string temporary is created inside the selected constructor of std::tuple, and not outside it.

There are two new compiler builtins, __reference_constructs_from_temporary and __reference_converts_from_temporary. The former is used to simulate direct-initialization context, the latter copy-initialization. But I had a hard time finding a test where there's actually a difference. Under DR 2267, both of these are invalid:

    struct A { } a;
    struct B { explicit B(const A&); };
    const B &b1{a};
    const B &b2(a);

so I had to peruse [over.match.ref], and eventually realized that the difference can be seen here:

    struct G {
      operator int();            // #1
      explicit operator int&&(); // #2
    };

    int&& r1(G{});  // use #2 (no temporary)
    int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already have the conv_binds_ref_to_prvalue function. The main function here is ref_xes_from_temporary. I've changed the return type of ref_conv_binds_directly to tristate, because previously the function didn't distinguish between an invalid conversion and one that binds to a prvalue. Since it no longer returns a bool, I removed the _p suffix.

The patch also adds the relevant class and variable templates to <type_traits>.

    PR c++/104477

gcc/c-family/ChangeLog:
    * c-common.cc (c_common_reswords): Add __reference_constructs_from_temporary and __reference_converts_from_temporary.
    * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:
    * call.cc (ref_conv_binds_directly_p): Rename to ...
    (ref_conv_binds_directly): ... this. Add a new bool parameter. Change the return type to tristate.
    * constraint.cc (diagnose_trait_expr): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
    * cp-tree.h: Include "tristate.h".
    (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
    (ref_conv_binds_directly_p): Rename to ...
    (ref_conv_binds_directly): ... this.
    (ref_xes_from_temporary): Declare.
    * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
    * method.cc (ref_xes_from_temporary): New.
    * parser.cc (cp_parser_primary_expression): Handle RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.
    (cp_parser_trait_expr): Likewise.
    (warn_for_range_copy): Adjust to call ref_conv_binds_directly.
    * semantics.cc (trait_expr_value): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
    (finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:
    * include/std/type_traits (reference_constructs_from_temporary, reference_converts_from_temporary): New class templates.
    (reference_constructs_from_temporary_v, reference_converts_from_temporary_v): New variable templates.
    (__cpp_lib_reference_from_temporary): Define for C++23.
    * include/std/version (__cpp_lib_reference_from_temporary): Define for C++23.
    * testsuite/20_util/variable_templates_for_traits.cc: Test reference_constructs_from_temporary_v and reference_converts_from_temporary_v.
    * testsuite/20_util/reference_from_temporary/value.cc: New test.
    * testsuite/20_util/reference_from_temporary/value2.cc: New test.
    * testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:
    * g++.dg/ext/reference_constructs_from_temporary1.C: New test.
    * g++.dg/ext/reference_converts_from_temporary1.C: New test.
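For readers unfamiliar with the new traits, a minimal usage sketch (a hypothetical example, assuming a C++23 library that defines __cpp_lib_reference_from_temporary):

    #include <string>
    #include <type_traits>

    // Binding const std::string& to a const char* materializes a std::string
    // temporary inside the initialization: exactly the dangling-tuple case
    // described above.
    static_assert(std::reference_constructs_from_temporary_v<
                      const std::string&, const char*>);
    // Binding a reference directly to an lvalue of the same type does not.
    static_assert(!std::reference_constructs_from_temporary_v<
                      const std::string&, std::string&>);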
In my previous patches I've been extending our std::move warnings, but this tweak actually dials it down a little bit. As reported in bug 89780, it's questionable to warn about expressions in templates that were type-dependent, but aren't anymore because we're instantiating the template. As in,

    template <typename T>
    Dest withMove() {
      T x;
      return std::move(x);
    }

    template Dest withMove<Dest>();   // #1
    template Dest withMove<Source>(); // #2

Saying that the std::move is pessimizing for #1 is not incorrect, but it's not useful, because removing the std::move would then pessimize #2. So the user can't really win. At the same time, disabling the warning just because we're in a template would be going too far, I still want to warn for

    template <typename>
    Dest withMove() {
      Dest x;
      return std::move(x);
    }

because the std::move therein will be pessimizing for any instantiation. So I'm using the suppress_warning machinery to that effect.

Problem: I had to add a new group to nowarn_spec_t, otherwise suppressing the -Wpessimizing-move warning would disable a whole bunch of other warnings, which we really don't want.

    PR c++/89780

gcc/cp/ChangeLog:
    * pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Maybe suppress -Wpessimizing-move.
    * typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings if they are suppressed.
    (check_return_expr): Disable -Wpessimizing-move when returning a dependent expression.

gcc/ChangeLog:
    * diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle OPT_Wpessimizing_move and OPT_Wredundant_move.
    * diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator.

gcc/testsuite/ChangeLog:
    * g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning.
    * g++.dg/cpp0x/Wredundant-move2.C: Likewise.
To improve compile times, the C++ library could use compiler built-ins rather than implementing std::is_convertible (and _nothrow) as class templates. This patch adds the built-ins. We already have __is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called __is_convertible_to. I did not add it, but it would be trivial to do so.

I noticed that our __is_assignable doesn't implement the "Access checks are performed as if from a context unrelated to either type" requirement, therefore std::is_assignable / __is_assignable give two different results here:

    class S {
      operator int();
      friend void g(); // #1
    };

    void
    g ()
    {
      // #1 doesn't matter
      static_assert(std::is_assignable<int&, S>::value, "");
      static_assert(__is_assignable(int&, S), "");
    }

This is not a problem if __is_assignable is not meant to be used by the users.

This patch doesn't make libstdc++ use the new built-ins, but I had to rename a class, otherwise its name would clash with the new built-in.

    PR c++/106784

gcc/c-family/ChangeLog:
    * c-common.cc (c_common_reswords): Add __is_convertible and __is_nothrow_convertible.
    * c-common.h (enum rid): Add RID_IS_CONVERTIBLE and RID_IS_NOTHROW_CONVERTIBLE.

gcc/cp/ChangeLog:
    * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
    * cp-objcp-common.cc (names_builtin_p): Handle RID_IS_CONVERTIBLE RID_IS_NOTHROW_CONVERTIBLE.
    * cp-tree.h (enum cp_trait_kind): Add CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
    (is_convertible): Declare.
    (is_nothrow_convertible): Likewise.
    * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
    * method.cc (is_convertible): New.
    (is_nothrow_convertible): Likewise.
    * parser.cc (cp_parser_primary_expression): Handle RID_IS_CONVERTIBLE and RID_IS_NOTHROW_CONVERTIBLE.
    (cp_parser_trait_expr): Likewise.
    * semantics.cc (trait_expr_value): Handle CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
    (finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:
    * include/std/type_traits: Rename __is_nothrow_convertible to __is_nothrow_convertible_lib.
    * testsuite/20_util/is_nothrow_convertible/value_ext.cc: Likewise.

gcc/testsuite/ChangeLog:
    * g++.dg/ext/has-builtin-1.C: Enhance to test __is_convertible and __is_nothrow_convertible.
    * g++.dg/ext/is_convertible1.C: New test.
    * g++.dg/ext/is_convertible2.C: New test.
    * g++.dg/ext/is_nothrow_convertible1.C: New test.
    * g++.dg/ext/is_nothrow_convertible2.C: New test.
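A usage sketch of the new built-ins, as named in the patch above (hypothetical examples; the built-ins take the two types directly and need no library header):

    // Compiles with a GCC that has this patch.
    static_assert(__is_convertible(int, double), "");
    static_assert(__is_nothrow_convertible(int, double), "");
    static_assert(!__is_convertible(void*, int), "");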
In plenty of image and video processing code it's common to modify pixel values by a widening operation and then scale them back into range by dividing by 255. This patch adds a named function to allow us to emit an optimized sequence when doing an unsigned division that is equivalent to:

    x = y / (2 ^ (bitsize (y)/2) - 1)

For SVE2 this means we generate for:

    void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
    {
      for (int i = 0; i < (n & -16); i+=1)
        pixel[i] = (pixel[i] * level) / 0xff;
    }

the following:

            mov     z3.b, #1
    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            addhnb  z1.b, z0.h, z3.h
            addhnb  z0.b, z0.h, z1.h
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3

instead of:

    .L3:
            ld1b    z0.h, p1/z, [x0, x3]
            mul     z0.h, p0/m, z0.h, z1.h
            umulh   z0.h, p0/m, z0.h, z2.h
            lsr     z0.h, z0.h, #7
            st1b    z0.h, p1, [x0, x3]
            inch    x3
            whilelo p1.h, w3, w2
            b.any   .L3

which results in significantly faster code.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
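For readers who want to check the arithmetic, here is a scalar model of the trick (a sketch; the pair of addhnb instructions computes a high-half-narrowing sum equivalent to the identity below, which this program verifies exhaustively for all 16-bit inputs):

    #include <cassert>
    #include <cstdint>

    int main ()
    {
      for (uint32_t x = 0; x <= 0xFFFF; ++x)
        {
          /* First addhnb: t = high half of (x + 0x0101).
             Second addhnb: result = high half of (x + t).  */
          uint32_t t = (x + 0x101) >> 8;
          assert (((x + t) >> 8) == x / 255);
        }
    }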
A friend declaration can only have constraints if it is defined. If multiple instantiations of a class template define the same friend function signature, it's an error, but that shouldn't happen if it's constrained to only be declared in one instantiation.

Currently we don't mangle requirements, so the foos all mangle the same and actually instantiating #1 will break, but for now we can test that they're considered distinct.

gcc/cp/ChangeLog:
    * pt.cc (tsubst_friend_function): Check satisfaction.

gcc/testsuite/ChangeLog:
    * g++.dg/cpp2a/concepts-friend11.C: New test.
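A sketch of the shape being tested (hypothetical code, not the actual testcase): the same friend signature defined by two instantiations would clash, but a constraint satisfied in only one instantiation keeps it declared just once:

    #include <concepts>

    template <class T>
    struct A {
      // Constrained hidden friend: only injected for instantiations
      // where the constraint is satisfied.
      friend void f () requires std::same_as<T, int> { }
    };

    A<int>  a;  // declares f()
    A<char> b;  // constraint unsatisfied: no conflicting redefinition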
While looking at PR 105549, which is about fixing the ABI break introduced in GCC 9.1 in parameter alignment with bit-fields, we noticed that the GCC 9.1 warning is not emitted in all the cases where it should be. This patch fixes that, and the next patch in the series fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI break starting with GCC 13.1. This way, patch #1 can be back-ported to release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether we're expanding the start of a function, so that aarch64_layout_arg can emit warnings for callees as well as callers. This removes the need for aarch64_function_arg_boundary to warn (with its incomplete information). However, in the first patch there are still cases where we emit warnings where we should not; this is fixed in patch #2, where we can distinguish between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks like an oversight of a previous commit in this area which changed 'abi_break' from a boolean to an integer.

We also take the opportunity to fix the comment above aarch64_function_arg_alignment, since the value of the abi_break parameter was changed in a previous commit, no longer matching the description.

2022-11-28  Christophe Lyon  <[email protected]>
            Richard Sandiford  <[email protected]>

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix comment.
    (aarch64_layout_arg): Factorize warning conditions.
    (aarch64_function_arg_boundary): Fix typo.
    * function.cc (currently_expanding_function_start): New variable.
    (expand_function_start): Handle currently_expanding_function_start.
    * function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning.h: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning.h: New test.
This recognises the patterns of the form:

    while (n & 1) { n >>= 1 }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that (n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0). These preconditions arise following the loop copy-header pass, and the assumptions returned by number_of_iterations_exit_assumptions then prevent final value replacement from using the niter result. I'm not sure what is the best way to fix this - one approach could be to modify simplify_using_initial_conditions to handle this sort of case, but it seems that it basically wants the information that ranger could give anyway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine that the niter->assumptions are always true. When building with -march=armv8.4-a+sve -S -O3, we get this codegen:

    foo (unsigned int b)
    {
      int c = 0;

      if (b == 0)
        return PREC;

      while (!(b & (1 << (PREC - 1))))
        {
          b <<= 1;
          c++;
        }

      return c;
    }

    foo:
    .LFB0:
            .cfi_startproc
            cmp     w0, 0
            cbz     w0, .L6
            blt     .L7
            lsl     w1, w0, 1
            clz     w2, w1
            cmp     w2, 14
            bls     .L8
            mov     x0, 0
            cntw    x3
            add     w1, w2, 1
            index   z1.s, #0, #1
            whilelo p0.s, wzr, w1
    .L4:
            add     x0, x0, x3
            mov     p1.b, p0.b
            mov     z0.d, z1.d
            whilelo p0.s, w0, w1
            incw    z1.s
            b.any   .L4
            add     z0.s, z0.s, #1
            lastb   w0, p1, z0.s
            ret
            .p2align 2,,3
    .L8:
            mov     w0, 0
            b       .L3
            .p2align 2,,3
    .L13:
            lsl     w1, w1, 1
    .L3:
            add     w0, w0, 1
            tbz     w1, #31, .L13
            ret
            .p2align 2,,3
    .L6:
            mov     w0, 32
            ret
            .p2align 2,,3
    .L7:
            mov     w0, 0
            ret
            .cfi_endproc

In essence, the vectoriser uses the niter information to determine exactly how many iterations of the loop it needs to run. It then uses SVE whilelo instructions to run this number of iterations. The original loop counter is also vectorised, despite only being used in the final iteration, and then the final value of this counter is used as the return value (which is the same as the number of iterations it computed in the first place).

This vectorisation is obviously bad, and I think it exposes a latent bug in the vectoriser, rather than being an issue caused by this specific patch.

gcc/ChangeLog:
    * tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
    (number_of_iterations_bitcount): Add call to the above.
    (number_of_iterations_exit_assumptions): Add EQ_EXPR case for c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:
    * gcc.dg/tree-ssa/cltz-max.c: New test.
    * gcc.dg/tree-ssa/clz-char.c: New test.
    * gcc.dg/tree-ssa/clz-int.c: New test.
    * gcc.dg/tree-ssa/clz-long-long.c: New test.
    * gcc.dg/tree-ssa/clz-long.c: New test.
    * gcc.dg/tree-ssa/ctz-char.c: New test.
    * gcc.dg/tree-ssa/ctz-int.c: New test.
    * gcc.dg/tree-ssa/ctz-long-long.c: New test.
    * gcc.dg/tree-ssa/ctz-long.c: New test.
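For reference, the closed form the niter machinery recovers from foo's loop (a sketch, assuming 32-bit unsigned and a nonzero input): the shift-until-top-bit-set loop computes exactly __builtin_clz, which is what makes final value replacement possible at all.

    #include <cassert>

    int clz_loop (unsigned b)   // precondition: b != 0
    {
      int c = 0;
      while (!(b & (1u << 31))) // loop until the top bit is set
        {
          b <<= 1;
          c++;
        }
      return c;
    }

    int main ()
    {
      for (unsigned b = 1; b < (1u << 16); ++b)
        assert (clz_loop (b) == __builtin_clz (b));
    }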
Here the ahead-of-time overload set pruning in finish_call_expr is unintentionally returning a CALL_EXPR whose (pruned) callee is wrapped in an ADDR_EXPR, despite the original callee not being wrapped in an ADDR_EXPR. This ends up causing a bogus declaration mismatch error in the below testcase because the call to min in #1 gets expressed as a CALL_EXPR of ADDR_EXPR of FUNCTION_DECL, whereas the level-lowered call to min in #2 gets expressed instead as a CALL_EXPR of FUNCTION_DECL.

This patch fixes this by stripping the spurious ADDR_EXPR appropriately. Thus the first call to min now also gets expressed as a CALL_EXPR of FUNCTION_DECL, matching the behavior before r12-6075-g2decd2cabe5a4f.

    PR c++/107461

gcc/cp/ChangeLog:
    * semantics.cc (finish_call_expr): Strip ADDR_EXPR from the selected callee during overload set pruning.

gcc/testsuite/ChangeLog:
    * g++.dg/template/call9.C: New test.
After r13-5684-g59e0376f607805 the (pruned) callee of a non-dependent CALL_EXPR is a bare FUNCTION_DECL rather than ADDR_EXPR of FUNCTION_DECL. This innocent change revealed that cp_tree_equal doesn't first check dependence of a CALL_EXPR before treating a FUNCTION_DECL callee as a dependent name, which leads to us incorrectly accepting the first two testcases below and rejecting the third:

 * In the first testcase, cp_tree_equal incorrectly returns true for the two non-dependent CALL_EXPRs f(0) and f(0) (whose CALL_EXPR_FN are different FUNCTION_DECLs) which causes us to treat #2 as a redeclaration of #1.
 * Same issue in the second testcase, for f<int*>() and f<char>().
 * In the third testcase, cp_tree_equal incorrectly returns true for f<int>() and f<void(*)(int)>() which causes us to conflate the two dependent specializations A<decltype(f<int>()(U()))> and A<decltype(f<void(*)(int)>()(U()))>.

This patch fixes this by making called_fns_equal treat two callees as dependent names only if the overall CALL_EXPRs are dependent, via a new convenience function call_expr_dependent_name that is like dependent_name but also checks dependence of the overall CALL_EXPR.

    PR c++/107461

gcc/cp/ChangeLog:
    * cp-tree.h (call_expr_dependent_name): Declare.
    * pt.cc (iterative_hash_template_arg) <case CALL_EXPR>: Use call_expr_dependent_name instead of dependent_name.
    * tree.cc (call_expr_dependent_name): Define.
    (called_fns_equal): Adjust to take two CALL_EXPRs instead of CALL_EXPR_FNs thereof. Use call_expr_dependent_name instead of dependent_name.
    (cp_tree_equal) <case CALL_EXPR>: Adjust call to called_fns_equal.

gcc/testsuite/ChangeLog:
    * g++.dg/cpp0x/overload5.C: New test.
    * g++.dg/cpp0x/overload5a.C: New test.
    * g++.dg/cpp0x/overload6.C: New test.
-Wmismatched-tags warns about the (harmless) struct/class mismatch. For, e.g.,

    template<typename T> struct A { };
    class A<int> a;

it works by adding A<T> to the class2loc hash table while parsing the class-head and then, while parsing the elaborate type-specifier, we add A<int>. At the end of c_parse_file we go through the table and warn about the class-key mismatches. In this PR we crash though; we have

    template<typename T> struct A {
      template<typename U> struct W { };
    };
    struct A<int>::W<int> w; // #1

where while parsing A and #1 we've stashed

    A<T>
    A<T>::W<U>
    A<int>::W<int>

into class2loc. Then in class_decl_loc_t::diag_mismatched_tags TYPE is A<int>::W<int>, and specialization_of gets us A<int>::W<U>, which is not in class2loc, so we crash on gcc_assert (cdlguide). But it's OK not to have found A<int>::W<U>, we should just look one "level" up, that is, A<T>::W<U>.

It's important to handle class specializations, so e.g.

    template<>
    struct A<char> {
      template<typename U>
      class W { };
    };

where W's class-key is different than in the primary template above, so we should warn depending on whether we're looking into A<char> or into a different instantiation.

    PR c++/106259

gcc/cp/ChangeLog:
    * parser.cc (class_decl_loc_t::diag_mismatched_tags): If the first lookup of SPEC didn't find anything, try to look for most_general_template.

gcc/testsuite/ChangeLog:
    * g++.dg/warn/Wmismatched-tags-11.C: New test.
This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend.

For the new test case:

    const int table[2] = {1, 2};
    int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Failed to match this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

    foo:    AND     #0xff, R12
            RLAM.A #4, R12 { RRAM.A #4, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

With this patch, we now see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Successfully matched this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
    allowing combination of insns 2 and 7
    original costs 4 + 8 = 12
    replacement cost 8

    foo:    MOV.B   R12, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

2023-10-26  Roger Sayle  <[email protected]>
            Richard Biener  <[email protected]>

gcc/ChangeLog
    PR rtl-optimization/91865
    * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
    PR rtl-optimization/91865
    * gcc.target/msp430/pr91865.c: New test case.
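The RTL identity being restored is just the value-level fact that two chained zero-extensions equal a single one; a scalar model (a sketch, not the msp430 testcase):

    #include <cassert>
    #include <cstdint>

    int main ()
    {
      for (int i = 0; i < 256; ++i)
        {
          uint8_t q = (uint8_t) i;
          // zero_extend:PSI (zero_extend:HI (reg:QI)) has the same value
          // as the canonical single zero_extend:PSI (reg:QI).
          assert ((uint32_t) (uint16_t) q == (uint32_t) q);
        }
    }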
Since the last import from upstream libsanitizer, the output has changed and now looks more like this:

    READ of size 6 at 0x7ff7beb2a144 thread T0
        #0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) sanitizer_common_interceptors.inc:813
        #1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840
        #2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f)

so let's adjust the pattern accordingly.

gcc/testsuite/ChangeLog:
    * c-c++-common/asan/memcmp-1.c: Adjust pattern on darwin.
During partial ordering, we want to look through dependent alias template specializations within template arguments and otherwise treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108 and r11-7011-g6e0a231a4aa240). To that end template_args_equal was given a partial_order flag that controls this behavior. This flag does the right thing when a dependent alias template specialization appears as template argument of the partial specialization, e.g. in

    template<class T, class...> using first_t = T;
    template<class T> struct traits;
    template<class T> struct traits<first_t<T, T&>> { };       // #1
    template<class T> struct traits<first_t<const T, T&>> { }; // #2

we correctly consider #2 to be more specialized than #1. But if the alias specialization appears as a nested template argument of another class template specialization, e.g. in

    template<class T> struct traits<A<first_t<T, T&>>> { };       // #1
    template<class T> struct traits<A<first_t<const T, T&>>> { }; // #2

then we incorrectly consider #1 and #2 to be unordered. This is because

 1. we don't propagate the flag to recursive template_args_equal calls
 2. we don't use structural equality for class template specializations written in terms of dependent alias template specializations

This patch fixes the first issue by turning the partial_order flag into a global. This patch fixes the second issue by making us propagate structural equality appropriately when building a class template specialization. In passing this patch also improves hashing of specializations that use structural equality.

    PR c++/90679

gcc/cp/ChangeLog:
    * cp-tree.h (comp_template_args): Remove partial_order parameter.
    (template_args_equal): Likewise.
    * pt.cc (comparing_for_partial_ordering): New global flag.
    (iterative_hash_template_arg) <case tcc_type>: Hash the template and arguments for specializations that use structural equality.
    (template_args_equal): Remove partial order parameter and use comparing_for_partial_ordering instead.
    (comp_template_args): Likewise.
    (comp_template_args_porder): Set comparing_for_partial_ordering instead. Make static.
    (any_template_arguments_need_structural_equality_p): Return true for an argument that's a dependent alias template specialization or a class template specialization that itself needs structural equality.
    * tree.cc (cp_tree_equal) <case TREE_VEC>: Adjust call to comp_template_args.

gcc/testsuite/ChangeLog:
    * g++.dg/cpp0x/alias-decl-75a.C: New test.
    * g++.dg/cpp0x/alias-decl-75b.C: New test.
Hi All,

This patch adds initial support for early break vectorization in GCC. In other words, it implements support for vectorization of loops with multiple exits.

The support is added for any target that implements a vector cbranch optab; this includes both fully masked and non-masked targets. Depending on the operation, the vectorizer may also require support for boolean mask reductions using Inclusive OR/Bitwise AND. This is however only checked when the comparison would produce multiple statements.

This also fully decouples the vectorizer's notion of exit from the existing loop infrastructure's exit. Before this patch the vectorizer always picked the natural loop latch-connected exit as the main exit. After this patch the vectorizer is free to choose any exit it deems appropriate as the main exit. This means that even if the main exit is not countable (i.e. the termination condition could not be determined) we might still be able to vectorize should one of the other exits be countable. In such situations the loop is reflowed, which enables vectorization of many other loop forms.

Concretely the kind of loops supported are of the form:

    for (int i = 0; i < N; i++)
    {
       <statements1>
       if (<condition>)
         {
           ...
           <action>;
         }
       <statements2>
    }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.

Since this is an initial version for GCC 14 it has the following limitations and features:

 - Only fixed sized iterations and buffers are supported. That is to say any vectors loaded or stored must be to statically allocated arrays with known sizes. N must also be known. This limitation is because our primary target for this optimization is SVE. For VLA SVE we can't easily do cross-page iteration checks. The result is likely to also not be beneficial. For that reason we punt support for variable buffers till we have First-Faulting support in GCC 15.
 - Any stores in <statements1> should not be to the same objects as in <condition>. Loads are fine as long as they don't have the possibility to alias. More concretely, we block RAW dependencies when the intermediate value can't be separated from the store, or the store itself can't be moved.
 - Prologue peeling, alignment peeling and loop versioning are supported.
 - Fully masked loops, unmasked loops and partially masked loops are supported.
 - Any number of loop early exits are supported.
 - No support for epilogue vectorization. The only epilogue supported is the scalar final one. Peeling code supports it but the code motion code cannot find instructions to make the move in the epilog.
 - Early breaks are only supported for inner loop vectorization.

With the help of IPA and LTO this still gets hit quite often. During bootstrap it hit rather frequently. Additionally TSVC s332, s481 and s482 all pass now since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside the vector loop itself but instead supports adding checks such that if we know that we have to exit in the current iteration then we branch to scalar code to actually do the final VF iterations, which handles all the code in <action>.

For the scalar loop we know that whatever exit you take you have to perform at most VF iterations. For vector code we only care about the state of fully performed iterations and reset the scalar code to the (partially) remaining loop. That is to say, the first vector loop executes so long as the early exit isn't needed. Once the exit is taken, the scalar code will perform at most VF extra iterations. The exact number depends on peeling and iteration start and which exit was taken (natural or early). For this scalar loop, all early exits are treated the same.

When we vectorize we move any statement not related to the early break itself, and that would be incorrect to execute before the break (i.e. has side effects), to after the break. If this is not possible we decline to vectorize. The analysis and code motion also takes into account that it doesn't introduce a RAW dependency after the move of the stores.

This means that we check at the start of iterations whether we are going to exit or not. During the analysis phase we check whether we are allowed to do this moving of statements. Also note that we only move the scalar statements, but only do so after peeling but just before we start transforming statements.

With this the vector flow no longer necessarily needs to match that of the scalar code. In addition most of the infrastructure is in place to support general control flow safely, however we are punting this to GCC 15.

Codegen:

for e.g.

    unsigned vect_a[N];
    unsigned vect_b[N];

    unsigned test4(unsigned x)
    {
      unsigned ret = 0;
      for (int i = 0; i < N; i++)
      {
        vect_b[i] = x + i;
        if (vect_a[i] > x)
          break;
        vect_a[i] = x;
      }
      return ret;
    }

We generate for Adv. SIMD:

    test4:
            adrp    x2, .LC0
            adrp    x3, .LANCHOR0
            dup     v2.4s, w0
            add     x3, x3, :lo12:.LANCHOR0
            movi    v4.4s, 0x4
            add     x4, x3, 3216
            ldr     q1, [x2, #:lo12:.LC0]
            mov     x1, 0
            mov     w2, 0
            .p2align 3,,7
    .L3:
            ldr     q0, [x3, x1]
            add     v3.4s, v1.4s, v2.4s
            add     v1.4s, v1.4s, v4.4s
            cmhi    v0.4s, v0.4s, v2.4s
            umaxp   v0.4s, v0.4s, v0.4s
            fmov    x5, d0
            cbnz    x5, .L6
            add     w2, w2, 1
            str     q3, [x1, x4]
            str     q2, [x3, x1]
            add     x1, x1, 16
            cmp     w2, 200
            bne     .L3
            mov     w7, 3
    .L2:
            lsl     w2, w2, 2
            add     x5, x3, 3216
            add     w6, w2, w0
            sxtw    x4, w2
            ldr     w1, [x3, x4, lsl 2]
            str     w6, [x5, x4, lsl 2]
            cmp     w0, w1
            bcc     .L4
            add     w1, w2, 1
            str     w0, [x3, x4, lsl 2]
            add     w6, w1, w0
            sxtw    x1, w1
            ldr     w4, [x3, x1, lsl 2]
            str     w6, [x5, x1, lsl 2]
            cmp     w0, w4
            bcc     .L4
            add     w4, w2, 2
            str     w0, [x3, x1, lsl 2]
            sxtw    x1, w4
            add     w6, w1, w0
            ldr     w4, [x3, x1, lsl 2]
            str     w6, [x5, x1, lsl 2]
            cmp     w0, w4
            bcc     .L4
            str     w0, [x3, x1, lsl 2]
            add     w2, w2, 3
            cmp     w7, 3
            beq     .L4
            sxtw    x1, w2
            add     w2, w2, w0
            ldr     w4, [x3, x1, lsl 2]
            str     w2, [x5, x1, lsl 2]
            cmp     w0, w4
            bcc     .L4
            str     w0, [x3, x1, lsl 2]
    .L4:
            mov     w0, 0
            ret
            .p2align 2,,3
    .L6:
            mov     w7, 4
            b       .L2

and for SVE:

    test4:
            adrp    x2, .LANCHOR0
            add     x2, x2, :lo12:.LANCHOR0
            add     x5, x2, 3216
            mov     x3, 0
            mov     w1, 0
            cntw    x4
            mov     z1.s, w0
            index   z0.s, #0, #1
            ptrue   p1.b, all
            ptrue   p0.s, all
            .p2align 3,,7
    .L3:
            ld1w    z2.s, p1/z, [x2, x3, lsl 2]
            add     z3.s, z0.s, z1.s
            cmplo   p2.s, p0/z, z1.s, z2.s
            b.any   .L2
            st1w    z3.s, p1, [x5, x3, lsl 2]
            add     w1, w1, 1
            st1w    z1.s, p1, [x2, x3, lsl 2]
            add     x3, x3, x4
            incw    z0.s
            cmp     w3, 803
            bls     .L3
    .L5:
            mov     w0, 0
            ret
            .p2align 2,,3
    .L2:
            cntw    x5
            mul     w1, w1, w5
            cbz     w5, .L5
            sxtw    x1, w1
            sub     w5, w5, #1
            add     x5, x5, x1
            add     x6, x2, 3216
            b       .L6
            .p2align 2,,3
    .L14:
            str     w0, [x2, x1, lsl 2]
            cmp     x1, x5
            beq     .L5
            mov     x1, x4
    .L6:
            ldr     w3, [x2, x1, lsl 2]
            add     w4, w0, w1
            str     w4, [x6, x1, lsl 2]
            add     x4, x1, 1
            cmp     w0, w3
            bcs     .L14
            mov     w0, 0
            ret

On the workloads this work is based on we see between 2-3x performance uplift using this patch.

Follow up plan:
 - Boolean vectorization has several shortcomings. I've filed PR110223 with the bigger ones that cause vectorization to fail with this patch.
 - SLP support. This is planned for GCC 15 as for the majority of cases build SLP itself fails. This means I'll need to spend time in making this more robust first. Additionally it requires:
     * Adding support for vectorizing CFG (gconds)
     * Support for CFG to differ between vector and scalar loops.
   Both of which would be disruptive to the tree and I suspect I'll be handling fallouts from this patch for a while. So I plan to work on the surrounding building blocks first for the remainder of the year.

Additionally it also contains reduced cases from issues found running over various codebases.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Also regtested with:
 -march=armv8.3-a+sve
 -march=armv8.3-a+nosve
 -march=armv9-a
 -mcpu=neoverse-v1
 -mcpu=neoverse-n2

Bootstrapped Regtested x86_64-pc-linux-gnu and no issues.
Bootstrap and Regtest on arm-none-linux-gnueabihf and no issues.

gcc/ChangeLog:
    * tree-if-conv.cc (idx_within_array_bound): Expose.
    * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
    (vect_analyze_data_ref_dependences): Use it.
    * tree-vect-loop-manip.cc (vect_iv_increment_position): New.
    (vect_set_loop_controls_directly, vect_set_loop_condition_partial_vectors, vect_set_loop_condition_partial_vectors_avx512, vect_set_loop_condition_normal): Support multiple exits.
    (slpeel_tree_duplicate_loop_to_edge_cfg): Support LCSSA peeling for multiple exits.
    (slpeel_can_duplicate_loop_p): Change vectorizer from looking at BB count and instead look at loop shape.
    (vect_update_ivs_after_vectorizer): Drop asserts.
    (vect_gen_vector_loop_niters_mult_vf): Support peeled vector iterations.
    (vect_do_peeling): Support multiple exits.
    (vect_loop_versioning): Likewise.
    * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialise early_breaks.
    (vect_analyze_loop_form): Support loop flows with more than single BB loop body.
    (vect_create_loop_vinfo): Support niters analysis for multiple exits.
    (vect_analyze_loop): Likewise.
    (vect_get_vect_def): New.
    (vect_create_epilog_for_reduction): Support early exit reductions.
    (vectorizable_live_operation_1): New.
    (find_connected_edge): New.
    (vectorizable_live_operation): Support early exit live operations.
    (move_early_exit_stmts): New.
    (vect_transform_loop): Use it.
    * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond.
    (vect_recog_bitfield_ref_pattern): Support gconds and bools.
    (vect_recog_gcond_pattern): New.
    (possible_vector_mask_operation_p): Support gcond masks.
    (vect_determine_mask_precision): Likewise.
    (vect_mark_pattern_stmts): Set gcond def type.
    (can_vectorize_live_stmts): Force early break inductions to be live.
    * tree-vect-stmts.cc (vect_stmt_relevant_p): Add relevancy analysis for early breaks.
    (vect_mark_stmts_to_be_vectorized): Process gcond usage.
    (perm_mask_for_reverse): Expose.
    (vectorizable_comparison_1): New.
    (vectorizable_early_exit): New.
    (vect_analyze_stmt): Support early break and gcond.
    (vect_transform_stmt): Likewise.
    (vect_is_simple_use): Likewise.
    (vect_get_vector_types_for_stmt): Likewise.
    * tree-vectorizer.cc (pass_vectorize::execute): Update exits for value numbering.
    * tree-vectorizer.h (enum vect_def_type): Add vect_condition_def.
    (LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_EARLY_BRK_STORES, LOOP_VINFO_EARLY_BREAKS_VECT_PEELED, LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New.
    (is_loop_header_bb_p): Drop assert.
    (class loop): Add early_breaks, early_break_stores, early_break_dest_bb, early_break_vuses.
    (vect_iv_increment_position, perm_mask_for_reverse, ref_within_array_bound): New.
    (slpeel_tree_duplicate_loop_to_edge_cfg): Update for early breaks.
This patch adjusts the costs so that we treat REG and SUBREG expressions the same for costing.

This was motivated by bt_skip_func and bt_find_func in xz. It results in nearly a 5% improvement in the dynamic instruction count for input #2, and smaller, but definitely visible, improvements pretty much across the board. The exceptions are perlbench input #1 and exchange2, which showed very small regressions.

In the bt_find_func and bt_skip_func cases we have something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]

> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be a 3->2 split. But there's a much better approach if we look at fwprop...

        (set (reg:DI 142 [ _1 ])
            (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
                (reg/v:DI 139 [ b ])))
        change not profitable (cost 4 -> cost 8)

That propagated form should have the same cost as a regular DImode addition when the ZBA extension is enabled. But it ends up costing more because the clause that costs this variant isn't prepared to handle a SUBREG. As a result the RTL above gets too high a cost and fwprop gives up.

One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing code. I ultimately decided against that and instead check whether the operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS. But for the sake of consistency, I changed the other instances in riscv_rtx_costs as well. For those other cases we're talking about improvements in the .000001% range.

While we are in stage4, this only touches cost modeling, which we've generally agreed is still appropriate (though we were mostly talking about vector). So I'm going to extend that general agreement ever so slightly and include scalar cost modeling :-)

gcc/
        * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
        similarly.

gcc/testsuite/
        * gcc.target/riscv/reg_subreg_costs.c: New test.

Co-authored-by: Jivan Hakobyan <[email protected]>
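For reference, here is a plausible reconstruction of the zz.c fragment behind insns 10-13 above; only x, a, b and c appear in the RTL, so the function name, return type, and combining operation are hypothetical:

        /* One zero-extended 32-bit value with two DImode uses: exactly the
           shape fwprop wants to forward into both additions once a SUBREG
           operand is costed like a REG.  */
        unsigned long
        test (unsigned int a, unsigned long b, unsigned long c)
        {
          unsigned long x = a;        /* zero_extend:DI (subreg:SI ...)  */
          unsigned long t1 = x + b;   /* first use of (reg 136)          */
          unsigned long t2 = x + c;   /* second use of (reg 136)         */
          return t1 ^ t2;             /* hypothetical combining use      */
        }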
We evaluate constexpr functions on the original, pre-genericization bodies. That means that the function body we're evaluating will not have gone through cp_genericize_r's "Map block scope extern declarations to visible declarations with the same name and type in outer scopes if any". Here:

        constexpr bool bar() { return true; } // #1

        constexpr bool foo()
        {
          constexpr bool bar(void); // #2
          return bar();
        }

it means that we:
 1) register_constexpr_fundef (#1)
 2) cp_genericize (#1)
    nothing interesting happens
 3) register_constexpr_fundef (foo)
    does copy_fn, so we have two copies of the BIND_EXPR
 4) cp_genericize (foo)
    this remaps #2 to #1, but only on one copy of the BIND_EXPR
 5) retrieve_constexpr_fundef (foo)
    we find it, no problem
 6) retrieve_constexpr_fundef (#2)
    and here #2 isn't found in constexpr_fundef_table, because we're
    working on the BIND_EXPR copy where #2 wasn't mapped to #1, so we
    fail.  We've only registered #1.

It should work to use DECL_LOCAL_DECL_ALIAS (which used to be extern_decl_map). We evaluate constexpr functions on pre-cp_fold bodies to avoid diagnostic problems, but the remapping I'm proposing should not interfere with diagnostics.

This is not a problem for a global scope redeclaration; there we go through duplicate_decls, which keeps the DECL_UID:

        DECL_UID (olddecl) = olddecl_uid;

and DECL_UID is what constexpr_fundef_hasher::hash uses.

        PR c++/111132

gcc/cp/ChangeLog:

        * constexpr.cc (get_function_named_in_call): Use
        cp_get_fndecl_from_callee.
        * cvt.cc (cp_get_fndecl_from_callee): If there's a
        DECL_LOCAL_DECL_ALIAS, use it.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/constexpr-redeclaration3.C: New test.
        * g++.dg/cpp0x/constexpr-redeclaration4.C: New test.
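A usage sketch of the fixed behaviour, modelled on the example above; this is a hypothetical test, not the contents of constexpr-redeclaration3.C or constexpr-redeclaration4.C:

        // With the fix, the block-scope redeclaration at #2 resolves
        // through DECL_LOCAL_DECL_ALIAS to #1, so constexpr evaluation
        // of foo() finds the registered definition.
        constexpr bool bar () { return true; }   // #1

        constexpr bool foo ()
        {
          constexpr bool bar (void);             // #2, aliases #1
          return bar ();
        }

        // Before the fix this failed: retrieve_constexpr_fundef (#2)
        // found nothing, since only #1 had been registered.
        static_assert (foo (), "block-scope redeclaration evaluated");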
aarch64-sve.md had a pattern that combined:

        cmpeq   pb.T, pa/z, zc.T, #0
        mov     zd.T, pb/z, #1

into:

        cnot    zd.T, pa/m, zc.T

But this is only valid if pa.T is a ptrue. In other cases, the original would set inactive elements of zd.T to 0, whereas the combined form would copy elements from zc.T.

gcc/
        PR target/114603
        * config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
        with...
        (@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be a
        ptrue.
        (*cnot<mode>): Require operand 1 to be a ptrue.
        * config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
        Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
        with a ptrue.  Represent other _x operations as fully-defined _m
        operations.

gcc/testsuite/
        PR target/114603
        * gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.
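To see the distinction at the source level, here is a sketch using the ACLE intrinsics (a hypothetical illustration compiled with SVE enabled, not the contents of cnot_1.c):

        #include <arm_sve.h>

        /* The two cases svcnot_impl::expand now distinguishes, per the
           ChangeLog above.  With a known ptrue the _x form can use the
           aarch64_ptrue_cnot pattern directly; with an arbitrary
           predicate the fold is unsafe, so it is represented as a
           fully-defined _m operation instead.  */
        svint32_t cnot_ptrue (svint32_t z)
        {
          return svcnot_s32_x (svptrue_b32 (), z);  /* plain CNOT is OK   */
        }

        svint32_t cnot_pred (svbool_t pg, svint32_t z)
        {
          return svcnot_s32_x (pg, z);              /* expanded as _m op  */
        }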
…. [PR114741]

In PR114741 we see a codegen regression when SVE is enabled, where the simple testcase:

        void foo (unsigned v, unsigned *p)
        {
          *p = v & 1;
        }

generates:

        foo:
                fmov    s31, w0
                and     z31.s, z31.s, #1
                str     s31, [x1]
                ret

instead of:

        foo:
                and     w0, w0, 1
                str     w0, [x1]
                ret

This impacts not just code size but also performance. It is caused by the use of the ^ constraint modifier in the pattern <optab><mode>3.

The documentation states that this modifier should only affect alternative costing, in that a particular alternative is to be preferred unless a non-pseudo reload is needed. The pattern was trying to convey that whenever both r and w are possible, r should be preferred unless a reload is needed, because if a reload is needed we can construct the constants more flexibly on the SIMD side.

We were using this to simplify the implementation and to handle generic cases such as:

        double negabs (double x)
        {
          unsigned long long y;
          memcpy (&y, &x, sizeof (double));
          y = y | (1UL << 63);
          memcpy (&x, &y, sizeof (double));
          return x;
        }

which don't go through an expander.

However, the implementation of ^ in the register allocator does not follow the documentation, in that it also has an effect during coloring. During initial register class selection it applies a penalty to a class, similar to how ? does. In this example the penalty makes the use of GP regs expensive enough that the allocator no longer considers them:

        r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
        ;;    3--> b  0: i  9 r106=r105&0x1
            :cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)
             PR_LO_REGS+0(0)PR_HI_REGS+0(0):model 4

which is not the expected behavior. For GCC 14 this is a conservative fix:

 1. We remove the ^ modifier from the logical optabs.
 2. In order not to regress copysign, we move the copysign expansion to use the SIMD variant directly. Since copysign only supports floating-point modes this is fine, and it no longer relies on the register allocator to select the right alternative.

This once again regresses the general case, but that case wasn't optimized in earlier GCCs either, so it's not a regression in GCC 14. The change gives strictly better codegen than earlier GCCs and still optimizes the important cases.

gcc/ChangeLog:

        PR target/114741
        * config/aarch64/aarch64.md (<optab><mode>3): Remove ^ from alt 2.
        (copysign<GPF:mode>3): Use SIMD version of IOR directly.

gcc/testsuite/ChangeLog:

        PR target/114741
        * gcc.target/aarch64/fneg-abs_2.c: Update codegen.
        * gcc.target/aarch64/fneg-abs_4.c: xfail for now.
        * gcc.target/aarch64/pr114741.c: New test.
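As a quick sanity check of both halves of the fix, a sketch of what one would expect at -O2 on aarch64 with SVE enabled; the function names are hypothetical and the expected instructions are assumptions based on the description above, not one of the new tests:

        /* Point 1: the masking stays on the GP side now that ^ no longer
           penalizes GENERAL_REGS during coloring.  */
        void keep_gp (unsigned v, unsigned *p)
        {
          *p = v & 1;                            /* expect: and w0, w0, 1 */
        }

        /* Point 2: copysign expands straight to the SIMD IOR, without
           relying on register-allocator alternative selection.  */
        double keep_simd (double mag, double sgn)
        {
          return __builtin_copysign (mag, sgn);  /* expect SIMD orr/bit   */
        }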
Hi,
Is there a chance to see an update on the GCC Rust frontend? I’m uneasy seeing Rust tied solely to LLVM.
Best wishes,
Arne