-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump tracing crate dependency #75391
Conversation
Testing for potential speed improvements as per #74726 (comment) @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit b466292 with merge b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8... |
☀️ Try build successful - checks-actions, checks-azure |
Queued b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8 with parent 441fd22, future comparison URL. |
Finished benchmarking try commit (b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
huh... this regressed rustdoc? It seems to have had no effect on any of the other tests |
cc @rust-lang/rustdoc are rustdoc tests on the perfbot reliable? I've never seen them before |
They're new (got added a month ago iirc), and I don't know how reliable they are cc @jyn514 |
that's pretty strange! i have no idea if 1% is meaningful in these tests though.
i'm assuming these are run with a release build, which statically disables all tracing, right? i didn't expect a performance improvement in that case --- i expected it in the case where tracing was not statically disabled. the hope, specifically, was that performance would be improved enough that it wouldn't be necessary to disable tracing at compile time, so that it can be enabled in release builds just by setting |
Rustdoc perf is extremely variable so I wouldn't block on a 1% regression. In particular until we stop loading all extern crates on startup I don't think it makes sense to have the rest of the compiler be more careful than we are ourselves: https://github.com/rust-lang/rust/blob/master/src/librustdoc/core.rs#L423 |
oh heh, oops @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 7d63e11 with merge dfcb3809f32e44b06f33c0781af9bf5bb42b20a3... |
☀️ Try build successful - checks-actions, checks-azure |
Queued dfcb3809f32e44b06f33c0781af9bf5bb42b20a3 with parent 840dbe7, future comparison URL. |
Finished benchmarking try commit (dfcb3809f32e44b06f33c0781af9bf5bb42b20a3): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@Mark-Simulacrum we're losing up to 10% on perf if we enable @hawkw since one of the things we do if no |
trying with just allowing @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 9697841 with merge 05ccae7f970661c4e9dd14263a7232a881077a8f... |
☀️ Try build successful - checks-actions, checks-azure |
Queued 05ccae7f970661c4e9dd14263a7232a881077a8f with parent 4745cbe, future comparison URL. |
Is it possible to do this for |
Having |
Finished benchmarking try commit (05ccae7f970661c4e9dd14263a7232a881077a8f): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Even |
Yea I don't think we can do anything here without really digging into all the sites where logging is used |
Here is a cachegrind diff on wf-projection-stress Check/Full between the debug! try and the parent. My guess is that the extra load is causing a vectorization or similar failure
|
The trace diff looks basically identical FWIW, all of the cost seems to be in the debug! calls here at least for wf-projection-stress. |
Could those debug! calls be turned into |
Yeah, so this is the old vs. new disassembly (presumably also the old one is inlined all over the place, but still present in un-inlined form) for the type folding: 000000000325f230 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty>:
325f230: mov eax,DWORD PTR [rsi+0x18]
325f233: test DWORD PTR [rdi],eax
325f235: setne al
325f238: ret 00000000035dfe30 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty>:
35dfe30: push rbp
35dfe31: push r15
35dfe33: push r14
35dfe35: push r13
35dfe37: push r12
35dfe39: push rbx
35dfe3a: sub rsp,0xf8
35dfe41: mov rbp,rdi
35dfe44: mov QWORD PTR [rsp+0x8],rsi
35dfe49: mov rax,QWORD PTR [rip+0x14ffeb8] # 4adfd08 <tracing_core::metadata::MAX_LEVEL@@Base-0x1d770>
35dfe50: mov rax,QWORD PTR [rax]
35dfe53: add rax,0xfffffffffffffffe
35dfe57: cmp rax,0x4
35dfe5b: /-------------- jb 35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
35dfe61: | lea rbx,[rip+0x151bae8] # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
35dfe68: | mov QWORD PTR [rsp+0xa0],rbx
35dfe70: | mov rax,QWORD PTR [rip+0x151bae9] # 4afb960 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x10>
35dfe77: | cmp rax,0x3
35dfe7b: | /-- jne 35dfea4 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x74>
35dfe7d: | | mov QWORD PTR [rsp+0x10],rbx
35dfe82: | | mov rax,QWORD PTR [rip+0x151bac7] # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
35dfe89: | | test rax,0xfffffffffffffffd
35dfe8f: | /--------|-- je 35dfeea <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0xba>
35dfe91: | | /--|-> lea rdi,[rsp+0x10]
35dfe96: | | | | call 33a9060 <tracing_core::dispatcher::get_default>
35dfe9b: | | | | test al,al
35dfe9d: | | /--|--|-- jne 35dfef4 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0xc4>
35dfe9f: +--|--|--|--|-- jmp 35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
35dfea4: | | | | \-> lea rax,[rsp+0xa0]
35dfeac: | | | | mov QWORD PTR [rsp+0x70],rax
35dfeb1: | | | | lea rax,[rsp+0x70]
35dfeb6: | | | | mov QWORD PTR [rsp+0x10],rax
35dfebb: | | | | lea rdi,[rip+0x151ba9e] # 4afb960 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x10>
35dfec2: | | | | lea rcx,[rip+0x1482d9f] # 4a62c68 <anon.265d2ff7bf1138c8a9d5475f9b93bdc0.107.llvm.161654895804039685>
35dfec9: | | | | lea rdx,[rsp+0x10]
35dfece: | | | | xor esi,esi
35dfed0: | | | | call QWORD PTR [rip+0x14fec42] # 4adeb18 <std::sync::once::Once::call_inner>
35dfed6: | | | | mov QWORD PTR [rsp+0x10],rbx
35dfedb: | | | | mov rax,QWORD PTR [rip+0x151ba6e] # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
35dfee2: | | | | test rax,0xfffffffffffffffd
35dfee8: | | | \----- jne 35dfe91 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x61>
35dfeea: | \--|-------> cmp rax,0x2
35dfeee: +-----|-------- jne 35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
35dfef4: | \-------> mov r14,QWORD PTR [rip+0x151ba5d] # 4afb958 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x8>
35dfefb: | mov rbx,QWORD PTR [rip+0x15030ce] # 4ae2fd0 <tracing_core::metadata::Metadata::fields@@Base+0x1129fa0>
35dff02: | mov rdi,r14
35dff05: | call rbx
35dff07: | lea r13,[rsp+0xa0]
35dff0f: | mov rdi,r13
35dff12: | mov rsi,rax
35dff15: | call QWORD PTR [rip+0x15060e5] # 4ae6000 <tracing_core::field::FieldSet::iter@@Base+0x112d6d0>
35dff1b: | mov rdi,r14
35dff1e: | call rbx
35dff20: | mov r15,rax
35dff23: | lea r12,[rsp+0x10]
35dff28: | mov rdi,r12
35dff2b: | mov rsi,r13
35dff2e: | call QWORD PTR [rip+0x1506e6c] # 4ae6da0 <<tracing_core::field::Iter as core::iter::traits::iterator::Iterator>::next@@Base+0x112e340>
35dff34: | cmp QWORD PTR [rsp+0x18],0x0
35dff3a: | /-- je 35e005f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x22f>
35dff40: | | mov rax,QWORD PTR [rsp+0x30]
35dff45: | | mov QWORD PTR [rsp+0xf0],rax
35dff4d: | | movups xmm0,XMMWORD PTR [rsp+0x10]
35dff52: | | movups xmm1,XMMWORD PTR [rsp+0x20]
35dff57: | | movaps XMMWORD PTR [rsp+0xe0],xmm1
35dff5f: | | movaps XMMWORD PTR [rsp+0xd0],xmm0
35dff67: | | mov rax,QWORD PTR [rsp+0x8]
35dff6c: | | add rax,0x18
35dff70: | | lea rcx,[rsp+0x8]
35dff75: | | mov QWORD PTR [rsp+0x10],rcx
35dff7a: | | mov rcx,QWORD PTR [rip+0x14fe0cf] # 4ade050 <rustc_middle::ty::structural_impls::<impl core::fmt::Debug for &rustc_middle::ty::TyS>::fmt@@Base+0x162b9a0>
35dff81: | | mov QWORD PTR [rsp+0x18],rcx
35dff86: | | mov QWORD PTR [rsp+0x20],rax
35dff8b: | | mov rax,QWORD PTR [rip+0x14ff136] # 4adf0c8 <<rustc_middle::ty::TypeFlags as core::fmt::Debug>::fmt@@Base+0x15ebd28>
35dff92: | | mov QWORD PTR [rsp+0x28],rax
35dff97: | | mov QWORD PTR [rsp+0x30],rbp
35dff9c: | | mov QWORD PTR [rsp+0x38],rax
35dffa1: | | lea rax,[rip+0x14c1120] # 4aa10c8 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.1061.llvm.5764158716530063693+0x78>
35dffa8: | | mov QWORD PTR [rsp+0x70],rax
35dffad: | | mov QWORD PTR [rsp+0x78],0x3
35dffb6: | | mov QWORD PTR [rsp+0x80],0x0
35dffc2: | | mov QWORD PTR [rsp+0x90],r12
35dffca: | | mov QWORD PTR [rsp+0x98],0x3
35dffd6: | | lea rax,[rsp+0xd0]
35dffde: | | mov QWORD PTR [rsp+0x58],rax
35dffe3: | | lea rax,[rsp+0x70]
35dffe8: | | mov QWORD PTR [rsp+0x60],rax
35dffed: | | lea rax,[rip+0x14bf86c] # 4a9f860 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.741.llvm.5764158716530063693+0xf8>
35dfff4: | | mov QWORD PTR [rsp+0x68],rax
35dfff9: | | lea rax,[rsp+0x58]
35dfffe: | | mov QWORD PTR [rsp+0x40],rax
35e0003: | | mov QWORD PTR [rsp+0x48],0x1
35e000c: | | mov QWORD PTR [rsp+0x50],r15
35e0011: | | lea rax,[rsp+0x40]
35e0016: | | mov QWORD PTR [rsp+0xa0],rax
35e001e: | | mov QWORD PTR [rsp+0xa8],r14
35e0026: | | mov QWORD PTR [rsp+0xb0],0x1
35e0032: | | lea rdi,[rsp+0xa0]
35e003a: | | call 33a9200 <tracing_core::dispatcher::get_default>
35e003f: \-----------|-> mov rax,QWORD PTR [rsp+0x8]
35e0044: | mov eax,DWORD PTR [rax+0x18]
35e0047: | test DWORD PTR [rbp+0x0],eax
35e004a: | setne al
35e004d: | add rsp,0xf8
35e0054: | pop rbx
35e0055: | pop r12
35e0057: | pop r13
35e0059: | pop r14
35e005b: | pop r15
35e005d: | pop rbp
35e005e: | ret
35e005f: \-> lea rdi,[rip+0x602226] # 3be228c <str.0+0x13c>
35e0066: lea rdx,[rip+0x14c1043] # 4aa10b0 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.1061.llvm.5764158716530063693+0x60>
35e006d: mov esi,0x22
35e0072: call QWORD PTR [rip+0x150a790] # 4aea808 <core::option::expect_failed>
35e0078: ud2 |
@Mark-Simulacrum oof, that is a pretty hairy disassembly. I suspect there are additional upstream changes we can make as a follow-on from tokio-rs/tracing#869 to simplify the per-callsite generated code some more, which should hopefully make it a little friendlier to rustc optimization. As a side note, what disassembler are you using? That output is quite nice :) |
Just regular objdump (GNU binutils version 2.34), with --visualize-jumps --no-show-raw-insn. |
Ah, |
r? @ghost
cc @hawkw