Skip to content

Commit

Permalink
Tokio console and log filter reloading (#2868)
Browse files Browse the repository at this point in the history
# Description
#2815 reverted 2 changes
(#2791,
#2792) because we again saw
issues with logs that were suddenly no longer emitted.
This problem also happened the last time we merged support for
`tokio-console` but I hoped that:
1. the issue was resolved in one of the related dependencies somehow
2. making the feature opt-in with a CLI flag would allow normal
operations when disabled

However, even with the feature disabled we saw that logs were again
missing. I tried to debug the underlying issue and it appears that for
some reason a log like `::tracing::event!(target: "sqlx::query",
tracing::Level::DEBUG, summary);` would not get logged with the original
implementation (not a huge deal since we weren't using them) but these
logs would also cause the NEXT log that would normally be emitted to be
skipped.
This can be seen in this
[code](5ba4666)
which reproduced the error.

Unfortunately `sqlx` is using `tracing::event!()` when executing an SQL
query which means with the described problem it could happen randomly
that some logs are missing. Luckily there are also scenarios where some
SQL query always gets executed before a log we issue ourselves (e.g.
`saving fee policies` log in the autopilot run loop would always be
skipped) which made it easier to find the underlying cause.

# Changes
1. Add `tokio-console` and `filter reloading` feature again
2. Fiddled around with how the tracing subscriber gets initialized until
the test that reproduced the issue no longer skipped any logs. This was
not straight forward since initializing the tokio console and our
regular log filters requires you to jump through a few type system
hoops.
3. removed CLI flag since this did not have the originally intended
effect anyway

# How to review
The only important change happend in
41c47ae.
The other stuff is just the revert commit.


## How to test
Run test that reproduced the issue and check that the targeted log does
not get skipped.
This doesn't necessarily mean that all is well now. To get more
confidence on the change I would just run the CI and compare the number
of logs with CI runs that don't change the tracing setup. If both have
roughly the same number of logs (I assume some variance will always be
there) the change should be good.
  • Loading branch information
MartinquaXD authored Aug 8, 2024
1 parent 76b843c commit 95ecc4e
Show file tree
Hide file tree
Showing 10 changed files with 363 additions and 21 deletions.
2 changes: 2 additions & 0 deletions .cargo/config.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
[build]
rustflags = ["--cfg", "tokio_unstable"]
147 changes: 147 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ tempfile = "3.10.1"
time = { version = "0.3.36", features = ["macros"] }
thiserror = "1.0.61"
toml = "0.8.14"
tokio = "1.38.0"
tokio = { version = "1.38.0", features = ["tracing"] }
tokio-stream = { version = "0.1.15", features = ["sync"] }
tracing = "0.1.40"
tracing-subscriber = "0.3.18"
Expand Down
21 changes: 21 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,27 @@ ANVIL_IP_ADDR=0.0.0.0 anvil \
--timestamp 1577836800
```

### Profiling

The most important binaries support [tokio-console](https://github.com/tokio-rs/console) to allow you a could look inside the tokio runtime.

Simply enable the feature by passing `--enable-tokio-console true` when running a binary and then in another shell, run

```
cargo install --locked tokio-console
tokio-console
```


### Changing Log Filters

It's possible to change the tracing log filter while the process is running. This can be useful to debug an error that requires more verbose logs but which might no longer appear after restarting the system.

Each process opens a UNIX socket at `/tmp/log_filter_override_<program_name>_<pid>.sock`. To change the log filter connect to it with `nc -U <path>` and enter a new log filter.
You can also reset the log filter to the filter the program was initially started with by entering `reset`.

See [here](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives) for documentation on the supported log filter format.

## Running the Services Locally

### Prerequisites
Expand Down
3 changes: 2 additions & 1 deletion crates/observe/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ edition = "2021"
license = "MIT OR Apache-2.0"

[dependencies]
atty = "0.2.14"
atty = "0.2"
console-subscriber = "0.3.0"
futures = { workspace = true }
once_cell = { workspace = true }
pin-project-lite = "0.2.14"
Expand Down
4 changes: 4 additions & 0 deletions crates/observe/build.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
fn main() {
// Make build system aware of custom config flags to avoid clippy warnings
println!("cargo::rustc-check-cfg=cfg(tokio_unstable)");
}
3 changes: 3 additions & 0 deletions crates/observe/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,6 @@ pub mod metrics;
pub mod panic_hook;
pub mod request_id;
pub mod tracing;

#[cfg(unix)]
mod tracing_reload_handler;
86 changes: 69 additions & 17 deletions crates/observe/src/tracing.rs
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
use {
crate::tracing_reload_handler::spawn_reload_handler,
std::{panic::PanicInfo, sync::Once},
time::macros::format_description,
tracing::level_filters::LevelFilter,
tracing_subscriber::fmt::{time::UtcTime, writer::MakeWriterExt as _},
tracing_subscriber::{
fmt::{time::UtcTime, writer::MakeWriterExt as _},
prelude::*,
util::SubscriberInitExt,
EnvFilter,
Layer,
},
};

/// Initializes tracing setup that is shared between the binaries.
Expand All @@ -28,22 +35,67 @@ pub fn initialize_reentrant(env_filter: &str) {
}

fn set_tracing_subscriber(env_filter: &str, stderr_threshold: LevelFilter) {
// This is what kibana uses to separate multi line log messages.
let subscriber_builder = tracing_subscriber::fmt::fmt()
.with_timer(UtcTime::new(format_description!(
"[year]-[month]-[day]T[hour]:[minute]:[second].[subsecond digits:3]Z"
)))
.with_env_filter(env_filter)
.with_ansi(atty::is(atty::Stream::Stdout));
match stderr_threshold.into_level() {
Some(threshold) => subscriber_builder
.with_writer(
std::io::stderr
.with_max_level(threshold)
.or_else(std::io::stdout),
)
.init(),
None => subscriber_builder.init(),
let initial_filter = env_filter.to_string();

// The `tracing` APIs are heavily generic to enable zero overhead. Unfortunately
// this leads to very annoying type constraints which can only be satisfied
// by literally copy and pasting the code so the compiler doesn't try to
// infer types that satisfy both the tokio-console and the regular case.
// It's tempting to resolve this mess by first configuring the `fmt_layer` and
// only then the `console_subscriber`. However, this setup was the only way
// I found that:
// 1. actually makes `tokio-console` work
// 2. prints logs if `tokio-console` is disabled
// 3. does NOT skip the next log following a `tracing::event!()`. These calls
// happen for example under the hood in `sqlx`. I don't understand what's
// actually causing that but at this point I'm just happy if all the features
// work correctly.
macro_rules! fmt_layer {
($env_filter:expr, $stderr_threshold:expr) => {{
tracing_subscriber::fmt::layer()
.with_writer(
std::io::stdout
.with_min_level(
$stderr_threshold
.into_level()
.unwrap_or(tracing::Level::ERROR),
)
.or_else(std::io::stderr),
)
.with_timer(UtcTime::new(format_description!(
"[year]-[month]-[day]T[hour]:[minute]:[second].[subsecond digits:3]Z"
)))
.with_ansi(atty::is(atty::Stream::Stdout))
.with_filter($env_filter)
}};
}

if cfg!(tokio_unstable) {
let (env_filter, reload_handle) =
tracing_subscriber::reload::Layer::new(EnvFilter::new(&initial_filter));

tracing_subscriber::registry()
.with(console_subscriber::spawn())
.with(fmt_layer!(env_filter, stderr_threshold))
.init();

if cfg!(unix) {
spawn_reload_handler(initial_filter, reload_handle);
}
} else {
let (env_filter, reload_handle) =
tracing_subscriber::reload::Layer::new(EnvFilter::new(&initial_filter));

tracing_subscriber::registry()
// Without this the subscriber ignores the next log after an `tracing::event!()` which
// `sqlx` uses under the hood.
.with(tracing::level_filters::LevelFilter::TRACE)
.with(fmt_layer!(env_filter, stderr_threshold))
.init();

if cfg!(unix) {
spawn_reload_handler(initial_filter, reload_handle);
}
}
}

Expand Down
Loading

0 comments on commit 95ecc4e

Please sign in to comment.