perf: kafka source performance regression due to tracing #12959
cc @st1page
For q0, we need to check the CPU perf flame graph to find out which part consumes the CPU.
In summary, the RisingWave runtime CPU percentage decreases (68.58% -> 62.54%) and the rdkafka runtime CPU percentage increases (14.03% -> 17.04%) 🤔
No, they are balanced 🤔 nightly-1009's Grafana: https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&from=1696869340000&to=1696870362000&var-namespace=nexmark-bs-0-14-affinity-daily-20231009
The flamegraph can be downloaded or directly examined from the Buildkite URL.
Just confirmed that the second drop is caused by the rdkafka bump; we might take a look at https://github.com/confluentinc/librdkafka/blob/master/CHANGELOG.md. 387d251's performance is better.
Why does upgrading rdkafka lower CPU usage but also lower throughput?
The first regression is #12659, cc @BugenZhao. The two data points are before/after #12659 (CPU).
But why does the flamegraph show that?
I tried to revert #12659 on latest main (https://github.com/risingwavelabs/risingwave/tree/xxchan/revert-12659) in order to see rdkafka's impact on its own, but it seems this also cancelled out the impact of the rdkafka bump. 🤡
I cannot observe anything useful from the flame graphs provided above; perhaps the files for 10/08 and 10/09 are incorrect? However, in #13073 I can clearly find this code taking CPU time: risingwave/src/connector/src/parser/mod.rs, lines 540 to 544 (at 1ca0a80).
Why it takes so much CPU time: see the fmt layer's event handling at https://docs.rs/tracing-subscriber/0.3.17/src/tracing_subscriber/fmt/fmt_layer.rs.html#797-812
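For context, here is a minimal sketch (not RisingWave code, just my assumption about the mechanism) of why a per-message tracing event is expensive under the fmt layer: every event that passes the filters goes through the layer's `on_event`, which formats the fields into a buffer before writing, so the cost scales with message throughput even when the output itself is cheap.

```rust
use tracing_subscriber::prelude::*;

fn main() {
    // A plain fmt layer with its output discarded, to isolate the formatting
    // cost from actual I/O. No level filter is attached, so every event is enabled.
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer().with_writer(std::io::sink))
        .init();

    // Stand-in for a source/parser hot loop: one event per "message".
    // Each call reaches the fmt layer's `on_event`, which formats the fields
    // eagerly, so this loop spends most of its time inside the subscriber.
    for offset in 0..1_000_000u64 {
        tracing::debug!(offset, "handled one message");
    }
}
```

If the parser emits an event (or enters an instrumented span) per record, this formatting work shows up directly in the flamegraph under the subscriber.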
So what about #11232?
According to the discussion above, I'm not sure whether we need to investigate rdkafka further. 🤡 Let's try to fix tracing first and see.
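One possible direction, sketched below under assumptions (the target name `risingwave_connector::parser` is hypothetical, and this is not necessarily the actual fix): attach a per-layer `Targets` filter so hot-path parser events are dropped before they ever reach the fmt layer's formatting, while other logs keep their usual level.

```rust
use tracing::level_filters::LevelFilter;
use tracing_subscriber::{filter::Targets, prelude::*};

fn main() {
    // Keep INFO as the default, but require WARN for the (hypothetical)
    // parser target so its per-message events are filtered out before
    // the fmt layer does any formatting.
    let fmt_filter = Targets::new()
        .with_default(LevelFilter::INFO)
        .with_target("risingwave_connector::parser", LevelFilter::WARN);

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer().with_filter(fmt_filter))
        .init();

    // Dropped by the filter: never formatted.
    tracing::debug!(target: "risingwave_connector::parser", "per-message noise");
    // Still formatted and printed.
    tracing::info!("regular log line");
}
```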
To verify whether it is the environment's impact, we re-ran a `nightly-20231001` image as the last data point; its performance is good. The first drop happens between `nightly-20231008` and `nightly-20231009` (commits between 1008 and 1009), and the second drop happens between `nightly-20231012` and `nightly-20231015`. We notice that q0 is a stateless query that does no computation and no state access, so it is likely to be some issue related to `parsing`, I guess?