
nightly-20240220 nexmark perf degradation #15169

Closed
cyliu0 opened this issue Feb 21, 2024 · 12 comments

@cyliu0 (Collaborator) commented Feb 21, 2024

Describe the bug

+---------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| BENCHMARK NAME                                                | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
+---------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| nexmark-q17-blackhole-medium-1cn                              |        21038 | Negative   | -28.64%             | -14.31%                     |
| nexmark-q5-rewrite-blackhole-medium-1cn                       |        21047 | Negative   | -31.09%             | -19.14%                     |
| nexmark-q19-blackhole-medium-1cn                              |        21053 | Negative   | -23.29%             | -11.83%                     |
| nexmark-q105-blackhole-medium-1cn                             |        21088 | Negative   | -30.68%             | -12.06%                     |
+---------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+

http://metabase.risingwave-cloud.xyz/dashboard/241-nexmark-blackhole-1cn-anti-affinity-rw-avg-source-throughput?start_date=2024-02-01&namespace=daily

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

nightly-20240220

@cyliu0 added the type/bug and type/perf labels on Feb 21, 2024
@github-actions bot added this to the release-1.7 milestone on Feb 21, 2024
@lmatz (Contributor) commented Feb 21, 2024

I suppose #14409 (OpenDAL) is not enabled in the daily test yet; see https://github.com/risingwavelabs/kube-bench/pull/368

Then the only candidate PR (assuming the regression is indeed caused by kernel changes) is #15133. Do you think it could lead to such a big regression? @zwang28

As for the other two, #14946 affects the CDC source only, and #15143 is purely a renaming.

We might as well also check for changes in the testing environment.

@st1page (Contributor) commented Feb 21, 2024

These queries are all CPU intensive (more than 750/800 CPU usage) and I don't see any I/O performance difference. We can generate a CPU flame graph later if we can't find the issue on the PR side.
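
For reference, a minimal sketch of capturing such a CPU flame graph on the compute-node host with perf and Brendan Gregg's FlameGraph scripts; the PID and sampling duration are placeholders, not values from this benchmark:

# Sample on-CPU stacks of the compute node for 60 seconds at 99 Hz.
perf record -F 99 -g -p <compute-node-pid> -- sleep 60

# Fold the stacks and render an SVG flame graph; stackcollapse-perf.pl and
# flamegraph.pl come from https://github.com/brendangregg/FlameGraph.
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu-flamegraph.svg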

@zwang28 (Contributor) commented Feb 21, 2024

> Then the only candidate PR (assuming the regression is indeed caused by kernel changes) is #15133. Do you think it could lead to such a big regression?

No, this PR only affects OpenDAL.

@st1page (Contributor) commented Feb 21, 2024

@st1page (Contributor) commented Feb 21, 2024

check if it can be reproduced

BENCH_NAMESPACE="sts0220"
BENCH_TESTBED="medium-1cn"
CI="true"
ENABLE_BLACKHOLE="true"
LABELS="medium-1cn-anti-affinity"
NEXMARK_QUERY="q5-rewrite,q17,q19,q105"
RW_VERSION="nightly-20240220"

https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3106
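
For completeness, a hedged sketch of how a run with the same parameters could be triggered through the Buildkite REST API. The org/pipeline slugs come from the build URL above; the token, branch, commit, and message are placeholders, and it is an assumption that the pipeline reads these values from build-level environment variables:

# Trigger a nexmark-benchmark build with the parameters above passed as
# build-level environment variables (branch/commit are placeholders).
curl -s -X POST \
  -H "Authorization: Bearer $BUILDKITE_API_TOKEN" \
  "https://api.buildkite.com/v2/organizations/risingwave-test/pipelines/nexmark-benchmark/builds" \
  -d '{
    "commit": "HEAD",
    "branch": "main",
    "message": "Reproduce nightly-20240220 nexmark regression",
    "env": {
      "BENCH_NAMESPACE": "sts0220",
      "BENCH_TESTBED": "medium-1cn",
      "CI": "true",
      "ENABLE_BLACKHOLE": "true",
      "LABELS": "medium-1cn-anti-affinity",
      "NEXMARK_QUERY": "q5-rewrite,q17,q19,q105",
      "RW_VERSION": "nightly-20240220"
    }
  }'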

@st1page (Contributor) commented Feb 21, 2024

The rerun result came back 😇

@lmatz (Contributor) commented Feb 22, 2024

@Li0k (Contributor) commented Feb 22, 2024

q105

[image]

Base level compaction is always skipped.

[image]

I found that a task hangs for a long time without being canceled, which is why the task can't be picked.

[image]

Fix PR: #15194

@lmatz (Contributor) commented Feb 23, 2024

I suppose #15194 fixed the problem in q105, but not the compaction task failure shown in q17 and q19, right? @Li0k

If so, I will keep the issue open.
