Tracking: Nexmark queries optimization #7289

Open
37 of 54 tasks
lmatz opened this issue Jan 10, 2023 · 2 comments
Labels
help wanted Issues that need help from contributors priority/high type/perf type/tracking Tracking issue.

Comments


lmatz commented Jan 10, 2023

Validation:

  1. Compare the executable plans of RW and the other systems. Make sure we are running the same or a similar plan (unless the difference comes from a non-trivial plan-level optimization).
  2. When benchmarking, make sure the test runs with a unified nexmark source instead of 3 separate sources; details are in feat(nexmark source): generate events in a single source #6747. In short, with separate sources the join misses far more matches than normal, which leads to an ill-structured workload.
  3. Also make sure that the query uses create sink XXX with ( connector = 'blackhole' ) instead of create materialized view, as the other systems don't have a materialized view. Note that some plans shown in the sub-issues are still in the form of a materialized view.

Optimization Tasks

Watermark

We notice that at https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/ddl_gen.sql#L37, Nexmark's source table specifies its watermark as FOR dateTime AS dateTime - INTERVAL '4' SECOND. Since a watermark helps clean the state table and can thus improve the performance of state access, we also track watermark progress here:
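
As a side note on why the watermark can shrink state: the sketch below is a toy Python model, not RisingWave or Flink code. It assumes the watermark is the largest event time seen so far minus 4 seconds, that rows older than the watermark can be evicted, and that late rows are dropped on arrival. The class name `WatermarkedState` and its methods are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Mirrors the 4-second delay in the Nexmark DDL; everything else is a toy assumption.
WATERMARK_DELAY = timedelta(seconds=4)


class WatermarkedState:
    """Hypothetical state table keyed by event time, cleaned by a watermark."""

    def __init__(self, delay=WATERMARK_DELAY):
        self.delay = delay
        self.max_event_time = None
        self.rows = {}  # event_time -> list of payloads

    @property
    def watermark(self):
        # Watermark = max event time seen so far minus the configured delay.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def insert(self, event_time, payload):
        """Store a row; return False if it arrives behind the watermark."""
        wm = self.watermark
        if wm is not None and event_time < wm:
            return False  # assumption: late rows are simply dropped
        self.rows.setdefault(event_time, []).append(payload)
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._evict()
        return True

    def _evict(self):
        # Rows strictly older than the watermark can no longer be matched, so drop them.
        wm = self.watermark
        for ts in [t for t in self.rows if t < wm]:
            del self.rows[ts]


base = datetime(2023, 1, 10, 0, 0, 0)
state = WatermarkedState()
state.insert(base, "bid-1")
state.insert(base + timedelta(seconds=2), "bid-2")
state.insert(base + timedelta(seconds=10), "bid-3")  # watermark advances to base + 6s
late_accepted = state.insert(base + timedelta(seconds=1), "bid-late")
```

After the third insert the watermark passes the first two rows, so only one entry remains in `state.rows`, and the late row is rejected. Without a watermark, all rows would stay in state indefinitely.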

The queries

Queries written by @BugenZhao to cover stream operators that the standard Nexmark queries do not cover:

@lmatz lmatz added this to the next-release-0.1.17 milestone Jan 10, 2023
@fuyufjh fuyufjh pinned this issue Jan 10, 2023
@fuyufjh fuyufjh removed this from the release-0.1.16 milestone Jan 30, 2023

lmatz commented Mar 13, 2023

q15, q16, and q17 have similar query patterns, so they may suffer from the same problem.
[Screenshot: SCR-20230313-n1d]


lmatz commented Apr 4, 2023

Q6 requires AVG OVER. Flink returns an error when running this query.
Q11 requires session_start.
Q12 requires proc_time.
Q13 requires proc_time.
Q14 requires count_char UDF.
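
For reference, the count_char function that Q14 needs can be sketched in a few lines of Python. This is only an illustration of the expected behavior (count how often a single character occurs in a string); the signature and the NULL handling are assumptions, not the Nexmark or RisingWave implementation.

```python
from typing import Optional


def count_char(s: Optional[str], c: str) -> Optional[int]:
    """Count occurrences of the single character `c` in `s`.

    Assumption: a NULL (None) input string yields NULL, as is common for SQL scalar functions.
    """
    if s is None:
        return None
    if len(c) != 1:
        raise ValueError("c must be exactly one character")
    return s.count(c)


example = count_char("abcabc", "c")
```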

As of 4/4, Q19 is supported but not enabled on the performance dashboard.

@lmatz lmatz added the help wanted Issues that need help from contributors label Apr 20, 2023
@lmatz lmatz unpinned this issue May 24, 2023