perf(PG CDC): investigate the CDC backfill performance #15173

Closed
st1page opened this issue Feb 21, 2024 · 2 comments

@st1page

st1page commented Feb 21, 2024

While investigating an issue with a user, we found that RW can take more than 10s to scan a single chunk from the upstream PG. There are several possible causes:

  • the table is large (180 million rows)
  • their upstream PG is AWS Aurora, which has a different isolation implementation
  • the workload on the upstream PG may also play a part

We need to investigate this case and set some benchmarks for CDC backfill in different situations.
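
One way to start on such a benchmark is to time every chunk read and flag the ones that cross a threshold. The following is a minimal sketch, not RisingWave code: `time_chunks`, `fetch_chunk`, and `slow_threshold_s` are hypothetical names, and the 10s default only mirrors the figure reported above.

```python
import time
from typing import Callable, List, Tuple


def time_chunks(
    fetch_chunk: Callable[[int], list],
    num_chunks: int,
    slow_threshold_s: float = 10.0,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[int, int, float, bool]]:
    """Time each call to fetch_chunk (a hypothetical reader of one chunk).

    Returns one tuple per chunk: (chunk index, row count, seconds, is_slow).
    The clock is injectable so the harness can be checked without waiting.
    """
    results = []
    for i in range(num_chunks):
        start = clock()
        rows = fetch_chunk(i)
        elapsed = clock() - start
        results.append((i, len(rows), elapsed, elapsed > slow_threshold_s))
    return results
```

Running the same harness against an idle upstream and against one under a continuous write workload would give comparable per-chunk numbers for each situation.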

We may also need to document what our CDC backfill does:

  • which SQL statements it issues
  • which isolation level those statements run under
  • under which upstream DB workloads the backfill can perform poorly

That way users can understand its performance more clearly.

@github-actions github-actions bot added this to the release-1.7 milestone Feb 21, 2024
@StrikeW

StrikeW commented Mar 5, 2024

At this stage, I would not attribute the performance degradation to a specific vendor, e.g. AWS. The scenario is a new case that we didn't cover in self-testing or the automation pipeline: while historical data is being backfilled, the upstream DB also has a continuous insert/update workload. I think we should also consider adding this case as a performance scenario. cc @cyliu0 @lmatz
https://risingwave-labs.slack.com/archives/C064SBT0ASF/p1709632731824109

@StrikeW

StrikeW commented May 8, 2024

We have implemented an optimization for CDC backfill (#16349) and added LIMIT to the select query (#15684). I think we can close this issue and reopen it in the future if needed.
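
For readers unfamiliar with why a LIMIT helps, here is a rough sketch of a bounded, primary-key-ordered chunk scan. It is an illustration only and does not reproduce the queries in #15684 or #16349: `scan_in_chunks` is a hypothetical helper, and it runs against an in-memory SQLite database as a stand-in for the upstream PG.

```python
import sqlite3
from typing import Iterator, List


def scan_in_chunks(
    conn: sqlite3.Connection, table: str, pk: str, chunk_size: int
) -> Iterator[List[tuple]]:
    """Yield chunks of rows, each read by one bounded query ordered by pk.

    Assumes pk is the first column of the table and that table/pk are
    trusted identifiers (they are interpolated into the SQL text).
    """
    last_pk = None
    while True:
        if last_pk is None:
            sql = f"SELECT * FROM {table} ORDER BY {pk} LIMIT ?"
            params = (chunk_size,)
        else:
            # Resume strictly after the last key seen in the previous chunk.
            sql = f"SELECT * FROM {table} WHERE {pk} > ? ORDER BY {pk} LIMIT ?"
            params = (last_pk, chunk_size)
        rows = conn.execute(sql, params).fetchall()
        if not rows:
            return
        yield rows
        last_pk = rows[-1][0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(10)])
    for chunk in scan_in_chunks(conn, "t", "id", 4):
        print(len(chunk), chunk[0][0], chunk[-1][0])
```

With the LIMIT in place, each query returns at most `chunk_size` rows no matter how large the table is, so the cost of one chunk read no longer depends on how many rows remain.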

@StrikeW StrikeW closed this as completed May 8, 2024