2024-02-07 nexmark performance degradation #15054

Closed
lmatz opened this issue Feb 8, 2024 · 10 comments

Comments

@lmatz lmatz added the type/perf label Feb 8, 2024
@github-actions github-actions bot added this to the release-1.7 milestone Feb 8, 2024
@lmatz lmatz added the type/bug Something isn't working label Feb 8, 2024
@st1page
Contributor

st1page commented Feb 8, 2024

@st1page st1page self-assigned this Feb 8, 2024
@st1page
Contributor

st1page commented Feb 8, 2024

It looks like a slight difference in the block cache makes the join's performance worse (q4). The cache ops metric in the picture is the cache miss ops of table 1032, which is the join's right table. The executor cache miss rate stays at 100% the whole time, so every lookup falls through to the storage layer, and block cache misses can therefore affect performance significantly.
[Image: cache miss ops for table 1032, the join's right table]
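To see why a 100% executor cache miss rate makes the join so sensitive to the block cache, here is a toy two-level cost model. It is only a sketch: the function `expected_lookup_us` and all latency values are hypothetical illustrations, not RisingWave internals or measurements from this benchmark.

```python
# Toy cost model (hypothetical): executor cache -> block cache -> remote object storage.
# The default latencies are illustrative assumptions, not measured values.


def expected_lookup_us(
    executor_miss_rate: float,
    block_cache_miss_rate: float,
    executor_hit_us: float = 0.1,
    block_cache_hit_us: float = 5.0,
    remote_fetch_us: float = 500.0,
) -> float:
    """Expected cost (microseconds) of one join-state lookup."""
    # Cost paid once a lookup falls through the executor cache to the storage layer.
    storage_cost = (
        (1.0 - block_cache_miss_rate) * block_cache_hit_us
        + block_cache_miss_rate * remote_fetch_us
    )
    # Only executor cache misses pay the storage-layer cost.
    return (1.0 - executor_miss_rate) * executor_hit_us + executor_miss_rate * storage_cost


if __name__ == "__main__":
    # Compare a 1% vs 2% block cache miss rate at two executor miss rates.
    for exec_miss in (1.0, 0.5):
        low = expected_lookup_us(exec_miss, 0.01)
        high = expected_lookup_us(exec_miss, 0.02)
        print(f"executor miss {exec_miss:.0%}: {low:.2f} us -> {high:.2f} us (+{high - low:.2f})")
```

Under these made-up numbers, raising the block cache miss rate from 1% to 2% adds roughly 5 us per lookup when the executor cache never hits, twice the penalty seen at a 50% executor hit rate, because every lookup pays the storage-layer cost.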

@lmatz
Contributor Author

lmatz commented Feb 8, 2024

This reminds me that the join actor's match duration per second often increases a lot when scaling up (#14448).

@st1page
Contributor

st1page commented Feb 9, 2024

It looks like today's (20240208) result has recovered, but I do not know why: https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20240208

@lmatz
Contributor Author

lmatz commented Feb 9, 2024

Is it possible that compaction is not deterministic and affects the block cache miss rate?

@lmatz
Contributor Author

lmatz commented Feb 23, 2024

Could it be the same problem as in #15169, i.e. the failed compaction task? @st1page

@st1page
Copy link
Contributor

st1page commented Feb 23, 2024


I quickly checked q4 and q5: the metrics show no failed compaction tasks, and there is no significant difference in the SST count panel. So I think not.

@st1page
Copy link
Contributor

st1page commented Feb 23, 2024

Since we have not seen this degradation again, can we close the issue?

@lmatz lmatz closed this as completed Feb 23, 2024
@lmatz
Contributor Author

lmatz commented Feb 23, 2024

I'm closing it now, but the reason for the fluctuation is still unknown.
