
perf: throughput of stateless query with no computation drops 20% over time #14815

Closed
lmatz opened this issue Jan 26, 2024 · 15 comments
Assignees: BugenZhao, xxchan
Labels: found-by-nexmark-perf-test, help wanted (Issues that need help from contributors), type/perf
Milestone: release-1.7

Comments

@lmatz
Contributor

lmatz commented Jan 26, 2024

@github-actions github-actions bot added this to the release-1.7 milestone Jan 26, 2024
@StrikeW
Contributor

StrikeW commented Jan 26, 2024

Is it run against the nightly image? It may be related to this PR: #14524

@lmatz
Contributor Author

lmatz commented Jan 26, 2024

Is it run against the nightly image? It may be related to this PR: #14524

This one seems to be fine, judging by the two most recent data points
(although we cannot rule out #14524 becoming a bottleneck if the other aspects recover to what they used to be).

The drop in throughput mostly happened a while ago.

@lmatz
Contributor Author

lmatz commented Feb 3, 2024

Nexmark q0, q3, q7-rewrite, and q13 are high-throughput queries (>= 900K/s) that are limited by this performance degradation. cc: @fuyufjh
They used to perform better than the same queries run on other systems.

@xxchan
Member

xxchan commented Feb 6, 2024

From the commit history around some of the spikes, I couldn't find any useful insights. 😟 It's also mysterious that after 01/06 the curve is quite smooth.

@lmatz
Contributor Author

lmatz commented Feb 7, 2024

@lmatz lmatz added the help wanted Issues that need help from contributors label Feb 8, 2024
@tabVersion
Contributor

This may relate to https://github.com/risingwavelabs/risingwave/pull/13707/files#r1495536048
It changes the parse path to unify parsing of the key part for all connectors. 🤔

@lmatz
Contributor Author

lmatz commented Feb 21, 2024

FYI:
The flame graph comes from the weekly generation pipeline:
https://buildkite.com/risingwavelabs/cpu-flamegraph-weekly-cron

@lmatz
Contributor Author

lmatz commented Feb 21, 2024

Also, I executed q0 with the old image nightly-20230927 (which reached 1.27M/s on 2023-09-27) three times to rule out possible impact from the testing environment.

As we can see from http://metabase.risingwave-cloud.xyz/question/36-nexmark-q0-blackhole-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-169?start_date=2023-08-28

[screenshot: SCR-20240221-nad]

The latest three executions of nightly-20230927 stably achieve 1.1M/s,

so we can conclude that the gap between the current nightly's 970K/s and 1.1M/s is attributable to changes in the kernel.

But the gap between 1.1M/s and 1.27M/s still needs to be investigated.

@tabVersion
Contributor

This may relate to #13707 (files). It changes the parse path to unify parsing of the key part for all connectors. 🤔

A more detailed explanation:

We used a dedicated JSON parser for FORMAT PLAIN ENCODE JSON before #13707:

(ProtocolProperties::Plain, EncodingProperties::Json(_)) => {
    JsonParser::new(parser_config.specific, rw_columns, source_ctx).map(Self::Json)
}

It serves as a shortcut: it builds a JSON parser directly, which does not need to implement the Access trait.
pub async fn parse_inner(
    &self,
    mut payload: Vec<u8>,
    mut writer: SourceStreamChunkRowWriter<'_>,
) -> ConnectorResult<()> {
    let value = simd_json::to_borrowed_value(&mut payload[self.payload_start_idx..])
        .context("failed to parse json payload")?;
    let values = if let simd_json::BorrowedValue::Array(arr) = value {
        Either::Left(arr.into_iter())
    } else {
        Either::Right(std::iter::once(value))
    };
    let mut errors = Vec::new();
    for value in values {
        let accessor = JsonAccess::new(value);
        match apply_row_accessor_on_stream_chunk_writer(accessor, &mut writer) {
            Ok(_) => {}
            Err(err) => errors.push(err),
        }
    }
    if errors.is_empty() {
        Ok(())
    } else {
        // TODO(error-handling): multiple errors
        bail!(
            "failed to parse {} row(s) in a single json message: {}",
            errors.len(),
            errors.iter().format(", ")
        );
    }
}


Other parsers behave like the following:

(ProtocolProperties::Plain, _) => {
    let parser =
        PlainParser::new(parser_config.specific, rw_columns, source_ctx).await?;
    Ok(Self::Plain(parser))
}

which requires building a &[&str] path to locate each field, leading to some serialization overhead:
fn access_field(&self, name: &str, type_expected: &DataType) -> super::AccessResult {
    // access value firstly
    match self.access(&["value", name], Some(type_expected)) {
        Err(AccessError::Undefined { .. }) => (), // fallthrough
        other => return other,
    };
    match self.access(&["key", name], Some(type_expected)) {
        Err(AccessError::Undefined { .. }) => (), // fallthrough
        other => return other,
    };
    if let Some(key_as_column_name) = &self.key_as_column_name
        && name == key_as_column_name
    {
        return self.access(&["key"], Some(type_expected));
    }
    Ok(None)
}

After #13707, I deprecated the dedicated plain JSON parser and unified all FORMAT PLAIN parsing, resulting in the perf issue.
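
A simplified sketch of the path-based access described above (using serde_json instead of the simd_json borrowed values in the real code, and not the actual RisingWave types): each field access walks a slice of keys into the parsed document, so the unified parser's two-segment ["value", <field>] path performs one extra map lookup and builds one extra temporary slice per column compared to the old flat lookup.

```rust
use serde_json::Value;

/// Walk a path of keys into a parsed JSON document, like a much-simplified JsonAccess.
fn access<'a>(root: &'a Value, path: &[&str]) -> Option<&'a Value> {
    path.iter().try_fold(root, |node, key| node.get(*key))
}

fn main() {
    let doc: Value =
        serde_json::from_str(r#"{"value": {"auction": 1, "bidder": 2}}"#).unwrap();
    // Old dedicated JsonParser: flat, single-segment lookup on the payload value.
    let flat = access(&doc["value"], &["auction"]);
    // Unified PlainParser-style lookup: a ["value", <field>] path is rebuilt per field.
    let nested = access(&doc, &["value", "auction"]);
    assert_eq!(flat, nested);
}
```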

@fuyufjh
Member

fuyufjh commented Feb 29, 2024

I see. I agree that migrating from JsonParser to PlainParser is necessary for a better code structure.

I am investigating the performance gap between JsonParser (faster) and PlainParser (slower). Interestingly, both JsonParser and PlainParser call JsonAccess once to parse the payload.

let accessor = JsonAccess::new(value);
match apply_row_accessor_on_stream_chunk_writer(accessor, &mut writer) {

row_op = row_op.with_value(self.payload_builder.generate_accessor(data).await?);

Thus, the only explanation is that PlainParser has additional overhead beyond parsing JSON. I think we need to conduct some micro-benchmarks and CPU profiling to investigate this issue.
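
As a starting point, a minimal criterion sketch of the kind of micro-benchmark suggested here (a hypothetical setup, not the benchmark referenced later in #16526): it only compares the flat vs nested field-access shapes from the previous comment, since benchmarking the real JsonParser/PlainParser needs the full connector scaffolding.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use serde_json::Value;

/// Walk a path of keys into a parsed JSON document (same shape as the earlier sketch).
fn access<'a>(root: &'a Value, path: &[&str]) -> Option<&'a Value> {
    path.iter().try_fold(root, |node, key| node.get(*key))
}

fn bench_access(c: &mut Criterion) {
    let doc: Value =
        serde_json::from_str(r#"{"value": {"auction": 1, "bidder": 2, "price": 3}}"#).unwrap();
    let fields = ["auction", "bidder", "price"];

    // Flat lookup: resolve the payload object once, then access each field directly.
    c.bench_function("flat_access", |b| {
        let value = &doc["value"];
        b.iter(|| {
            for f in fields {
                black_box(access(value, &[f]));
            }
        })
    });

    // Nested lookup: rebuild a ["value", <field>] path for every field access.
    c.bench_function("nested_access", |b| {
        b.iter(|| {
            for f in fields {
                black_box(access(&doc, &["value", f]));
            }
        })
    });
}

criterion_group!(benches, bench_access);
criterion_main!(benches);
```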

@lmatz
Contributor Author

lmatz commented Apr 29, 2024

Please check the micro-benchmark results in #16526

and the Slack thread (because GitHub automatically converts SVGs to PNGs):
https://risingwave-labs.slack.com/archives/C03CPDQCNE4/p1708405575586669

In short, we compared the latest implementations of json_parser and plain_parser (a wrapper around json_parser) and found a ~10% performance gap.

If json_parser itself has not changed over this period, then the performance gap in the parser can only explain about half of the total 20% performance drop.

@lmatz
Contributor Author

lmatz commented May 19, 2024

#16526 (comment)

Is it possible to refactor the code path into a chunk-based one?

We have a delicate vectorized execution engine to amortize constant overhead, but the source connector/parser is not part of it.
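
A rough sketch of what a chunk-based parse path could look like (hypothetical names and simplified types, not the actual RisingWave API): the parser takes a whole batch of payloads and appends directly into per-column builders, so per-message dispatch and writer setup are paid once per chunk instead of once per row.

```rust
use anyhow::{Context, Result};
use serde_json::Value;

/// Simplified stand-in for a columnar StreamChunk: one nullable i64 column per field.
pub struct Chunk {
    pub columns: Vec<Vec<Option<i64>>>,
}

/// Hypothetical chunk-based JSON parser.
pub struct JsonChunkParser {
    fields: Vec<String>,
}

impl JsonChunkParser {
    pub fn new(fields: Vec<String>) -> Self {
        Self { fields }
    }

    /// Parse a whole batch of JSON payloads into a single chunk.
    pub fn parse_batch(&self, payloads: &[&[u8]]) -> Result<Chunk> {
        let mut columns = vec![Vec::with_capacity(payloads.len()); self.fields.len()];
        for payload in payloads {
            let doc: Value = serde_json::from_slice(payload).context("invalid json payload")?;
            for (col, field) in columns.iter_mut().zip(&self.fields) {
                col.push(doc.get(field.as_str()).and_then(Value::as_i64));
            }
        }
        Ok(Chunk { columns })
    }
}

fn main() -> Result<()> {
    let parser = JsonChunkParser::new(vec!["auction".into(), "bidder".into(), "price".into()]);
    let a = br#"{"auction": 1, "bidder": 10, "price": 100}"#;
    let b = br#"{"auction": 2, "bidder": 20, "price": 200}"#;
    let payloads: Vec<&[u8]> = vec![a.as_slice(), b.as_slice()];
    let chunk = parser.parse_batch(&payloads)?;
    assert_eq!(chunk.columns[0], vec![Some(1i64), Some(2)]);
    Ok(())
}
```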

@lmatz
Contributor Author

lmatz commented Jun 17, 2024

link #17196 and #12959

@fuyufjh fuyufjh assigned BugenZhao and xxchan and unassigned fuyufjh and tabVersion Jul 10, 2024
@fuyufjh
Member

fuyufjh commented Jul 10, 2024

link #17196 and #12959

Yes, the subsequent investigation was handed over to @xxchan and @BugenZhao. Any updates?

@fuyufjh fuyufjh mentioned this issue Aug 12, 2024
@fuyufjh
Member

fuyufjh commented Aug 29, 2024

@fuyufjh fuyufjh closed this as completed Aug 29, 2024