
perf: throughput of stateless query with no computation drops 20% over time #14815

Closed
lmatz opened this issue Jan 26, 2024 · 15 comments
Assignees: BugenZhao, xxchan
Labels: found-by-nexmark-perf-test, help wanted (Issues that need help from contributors), type/perf
Milestone: release-1.7

Comments

@lmatz
Contributor

lmatz commented Jan 26, 2024

@github-actions github-actions bot added this to the release-1.7 milestone Jan 26, 2024
@StrikeW
Contributor

StrikeW commented Jan 26, 2024

Is it run against the nightly image? It may be related to this PR: #14524

@lmatz
Contributor Author

lmatz commented Jan 26, 2024

Is it run against the nightly image? It may be related to this PR: #14524

This one seems to be fine, judging by the two most recent data points
(although we cannot rule out #14524 becoming a bottleneck if the other aspects recover to what they used to be).

The drop in throughput mostly happened a while ago.

@lmatz
Contributor Author

lmatz commented Feb 3, 2024

Nexmark q0, q3, q7-rewrite, and q13 are high-throughput queries (>= 900K/s) that are limited by this performance degradation. cc: @fuyufjh
They used to perform better than the same queries run on other systems.

@xxchan
Member

xxchan commented Feb 6, 2024

From the commit history around some of the spikes, I couldn't find any useful insights. 😟 It's also mysterious that after 01/06 the curve is quite smooth.

@lmatz
Contributor Author

lmatz commented Feb 7, 2024

@lmatz lmatz added the help wanted Issues that need help from contributors label Feb 8, 2024
@tabVersion
Contributor

This may relate to https://github.com/risingwavelabs/risingwave/pull/13707/files#r1495536048
It changes the parse path to unify parsing of the key part for all connectors. 🤔

@lmatz
Contributor Author

lmatz commented Feb 21, 2024

FYI:
The flame graph comes from the weekly generation pipeline:
https://buildkite.com/risingwavelabs/cpu-flamegraph-weekly-cron

@lmatz
Contributor Author

lmatz commented Feb 21, 2024

Also, I executed q0 with the old image nightly-20230927 (which reached 1.27M/s on 2023-09-27) three times to rule out possible impact from the testing environment.

As we can see from http://metabase.risingwave-cloud.xyz/question/36-nexmark-q0-blackhole-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-169?start_date=2023-08-28

[screenshot: SCR-20240221-nad]

The latest three executions of nightly-20230927 stably achieve 1.1M/s,

so we can conclude that the gap between the current nightly's 970K/s and 1.1M/s is attributable to changes in the kernel.

But the gap between 1.1M/s and 1.27M/s still needs to be investigated.

@tabVersion
Contributor

This may relate to #13707 (files). It changes the parse path to unify parsing of the key part for all connectors. 🤔

A more detailed explanation:

We used a dedicated JSON parser for FORMAT PLAIN ENCODE JSON before #13707:

(ProtocolProperties::Plain, EncodingProperties::Json(_)) => {
    JsonParser::new(parser_config.specific, rw_columns, source_ctx).map(Self::Json)
}

It serves as a shortcut: it builds a JSON parser directly, which does not need to implement the Access trait.
pub async fn parse_inner(
    &self,
    mut payload: Vec<u8>,
    mut writer: SourceStreamChunkRowWriter<'_>,
) -> ConnectorResult<()> {
    let value = simd_json::to_borrowed_value(&mut payload[self.payload_start_idx..])
        .context("failed to parse json payload")?;
    let values = if let simd_json::BorrowedValue::Array(arr) = value {
        Either::Left(arr.into_iter())
    } else {
        Either::Right(std::iter::once(value))
    };
    let mut errors = Vec::new();
    for value in values {
        let accessor = JsonAccess::new(value);
        match apply_row_accessor_on_stream_chunk_writer(accessor, &mut writer) {
            Ok(_) => {}
            Err(err) => errors.push(err),
        }
    }
    if errors.is_empty() {
        Ok(())
    } else {
        // TODO(error-handling): multiple errors
        bail!(
            "failed to parse {} row(s) in a single json message: {}",
            errors.len(),
            errors.iter().format(", ")
        );
    }
}


Other parsers behave like the following:

(ProtocolProperties::Plain, _) => {
    let parser =
        PlainParser::new(parser_config.specific, rw_columns, source_ctx).await?;
    Ok(Self::Plain(parser))
}

which requires building a &[&str] path to locate each field, leading to some serialization overhead:
fn access_field(&self, name: &str, type_expected: &DataType) -> super::AccessResult {
    // access value firstly
    match self.access(&["value", name], Some(type_expected)) {
        Err(AccessError::Undefined { .. }) => (), // fallthrough
        other => return other,
    };
    match self.access(&["key", name], Some(type_expected)) {
        Err(AccessError::Undefined { .. }) => (), // fallthrough
        other => return other,
    };
    if let Some(key_as_column_name) = &self.key_as_column_name
        && name == key_as_column_name
    {
        return self.access(&["key"], Some(type_expected));
    }
    Ok(None)
}

After #13707, I deprecated the dedicated plain JSON parser and unified all FORMAT PLAIN parsing, resulting in the perf issue.
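
A simplified sketch of the path-based access described above (using serde_json instead of the simd_json borrowed values in the real code, and not the actual RisingWave types): each field access walks a slice of keys into the parsed document, so the unified parser's two-segment ["value", <field>] path performs one extra map lookup and builds one extra temporary slice per column compared to the old flat lookup.

```rust
use serde_json::Value;

/// Walk a path of keys into a parsed JSON document, like a much-simplified JsonAccess.
fn access<'a>(root: &'a Value, path: &[&str]) -> Option<&'a Value> {
    path.iter().try_fold(root, |node, key| node.get(*key))
}

fn main() {
    let doc: Value =
        serde_json::from_str(r#"{"value": {"auction": 1, "bidder": 2}}"#).unwrap();
    // Old dedicated JsonParser: flat, single-segment lookup on the payload value.
    let flat = access(&doc["value"], &["auction"]);
    // Unified PlainParser-style lookup: a ["value", <field>] path is rebuilt per field.
    let nested = access(&doc, &["value", "auction"]);
    assert_eq!(flat, nested);
}
```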

@fuyufjh
Member

fuyufjh commented Feb 29, 2024

I see. I agree that migrating from JsonParser to PlainParser is necessary for a better code structure.

I am investigating the performance gap between JsonParser (faster) and PlainParser (slower). Interestingly, both JsonParser and PlainParser call JsonAccess once to parse the payload.

let accessor = JsonAccess::new(value);
match apply_row_accessor_on_stream_chunk_writer(accessor, &mut writer) {

row_op = row_op.with_value(self.payload_builder.generate_accessor(data).await?);

Thus, the only explanation is that PlainParser has additional overhead beyond parsing JSON. I think we need to conduct some micro-benchmarks and CPU profiling to investigate this issue.
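
As a starting point, a minimal criterion sketch of the kind of micro-benchmark suggested here (a hypothetical setup, not the benchmark referenced later in #16526): it only compares the flat vs nested field-access shapes from the previous comment, since benchmarking the real JsonParser/PlainParser needs the full connector scaffolding.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use serde_json::Value;

/// Walk a path of keys into a parsed JSON document (same shape as the earlier sketch).
fn access<'a>(root: &'a Value, path: &[&str]) -> Option<&'a Value> {
    path.iter().try_fold(root, |node, key| node.get(*key))
}

fn bench_access(c: &mut Criterion) {
    let doc: Value =
        serde_json::from_str(r#"{"value": {"auction": 1, "bidder": 2, "price": 3}}"#).unwrap();
    let fields = ["auction", "bidder", "price"];

    // Flat lookup: resolve the payload object once, then access each field directly.
    c.bench_function("flat_access", |b| {
        let value = &doc["value"];
        b.iter(|| {
            for f in fields {
                black_box(access(value, &[f]));
            }
        })
    });

    // Nested lookup: rebuild a ["value", <field>] path for every field access.
    c.bench_function("nested_access", |b| {
        b.iter(|| {
            for f in fields {
                black_box(access(&doc, &["value", f]));
            }
        })
    });
}

criterion_group!(benches, bench_access);
criterion_main!(benches);
```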

@lmatz
Contributor Author

lmatz commented Apr 29, 2024

Please check the micro-benchmark results in #16526

and the Slack thread (because GitHub automatically converts SVGs to PNGs):
https://risingwave-labs.slack.com/archives/C03CPDQCNE4/p1708405575586669

In short, we compared the latest implementations of json_parser and plain_parser (a wrapper around json_parser) and found a ~10% performance gap.

If json_parser itself has not changed over this period, then the performance gap in the parser can only explain about half of the total 20% performance drop.

@lmatz
Contributor Author

lmatz commented May 19, 2024

#16526 (comment)

Is it possible to refactor the code path into a chunk-based one?

We have a delicate vectorized execution engine to amortize constant overhead, but the source connector/parser is not part of it.
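
A rough sketch of what a chunk-based parse path could look like (hypothetical names and simplified types, not the actual RisingWave API): the parser takes a whole batch of payloads and appends directly into per-column builders, so per-message dispatch and writer setup are paid once per chunk instead of once per row.

```rust
use anyhow::{Context, Result};
use serde_json::Value;

/// Simplified stand-in for a columnar StreamChunk: one nullable i64 column per field.
pub struct Chunk {
    pub columns: Vec<Vec<Option<i64>>>,
}

/// Hypothetical chunk-based JSON parser.
pub struct JsonChunkParser {
    fields: Vec<String>,
}

impl JsonChunkParser {
    pub fn new(fields: Vec<String>) -> Self {
        Self { fields }
    }

    /// Parse a whole batch of JSON payloads into a single chunk.
    pub fn parse_batch(&self, payloads: &[&[u8]]) -> Result<Chunk> {
        let mut columns = vec![Vec::with_capacity(payloads.len()); self.fields.len()];
        for payload in payloads {
            let doc: Value = serde_json::from_slice(payload).context("invalid json payload")?;
            for (col, field) in columns.iter_mut().zip(&self.fields) {
                col.push(doc.get(field.as_str()).and_then(Value::as_i64));
            }
        }
        Ok(Chunk { columns })
    }
}

fn main() -> Result<()> {
    let parser = JsonChunkParser::new(vec!["auction".into(), "bidder".into(), "price".into()]);
    let a = br#"{"auction": 1, "bidder": 10, "price": 100}"#;
    let b = br#"{"auction": 2, "bidder": 20, "price": 200}"#;
    let payloads: Vec<&[u8]> = vec![a.as_slice(), b.as_slice()];
    let chunk = parser.parse_batch(&payloads)?;
    assert_eq!(chunk.columns[0], vec![Some(1i64), Some(2)]);
    Ok(())
}
```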

@lmatz
Contributor Author

lmatz commented Jun 17, 2024

link #17196 and #12959

@fuyufjh fuyufjh assigned BugenZhao and xxchan and unassigned fuyufjh and tabVersion Jul 10, 2024
@fuyufjh
Member

fuyufjh commented Jul 10, 2024

link #17196 and #12959

Yes, the subsequent investigation was handed over to @xxchan and @BugenZhao. Any updates?

@fuyufjh fuyufjh mentioned this issue Aug 12, 2024
@fuyufjh
Member

fuyufjh commented Aug 29, 2024

@fuyufjh fuyufjh closed this as completed Aug 29, 2024