Tracking: Nexmark queries optimization #7289
Labels: help wanted (issues that need help from contributors), priority/high, type/perf, type/tracking (tracking issue)
Validation:

We use `create sink XXX with ( connector = 'blackhole' )` instead of `create materialized view`, as the other system doesn't have a materialized view. Some plans shown in the sub-issues are still in the form of `materialized view`, though.

Optimization Tasks
- `exchange` before `blackhole` sink #7377
- `to_char` #7924
- `may_exist` in Hash Join #7938
- `read_prefix_len_hint` in distinct agg table #8541
- `SerializedKeySerializer::append` #8683
- `BitmapIter` #8848
- `compact` in aggregation executor #9150
- `DataChunkBuilder` #9301

Watermark
We notice that at https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/ddl_gen.sql#L37, Nexmark's source table specifies its watermark as `FOR dateTime AS dateTime - INTERVAL '4' SECOND`. As a watermark helps clean the state table and thus potentially improves the performance of state access, we also track the progress of watermark here:

- The queries [...] `blackhole` sink.
- `Group By`. After this, all aggregations with `Group By` (non-simple) will no longer choose 2-phase aggregation.

Queries made up by @BugenZhao to cover stream operators that are not covered by the standard Nexmark:
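As an aside on the Validation and Watermark notes above, the setup can be sketched in SQL. This is only an illustration and is not taken from the sub-issues. The source name `bid`, the sink and view names, the snake_case column `date_time` (the Flink DDL uses `dateTime`), and the `nexmark` connector options are all assumptions, and some versions may require additional source or sink options. The query is the standard Nexmark q1 currency conversion.

```sql
-- Hypothetical Nexmark bid source carrying the 4-second watermark delay
-- quoted above. Connector options are assumptions; adjust to your version.
CREATE SOURCE bid (
    auction   BIGINT,
    bidder    BIGINT,
    price     BIGINT,
    channel   VARCHAR,
    url       VARCHAR,
    date_time TIMESTAMP,
    extra     VARCHAR,
    WATERMARK FOR date_time AS date_time - INTERVAL '4' SECOND
) WITH (
    connector = 'nexmark',
    nexmark.table.type = 'Bid'
);

-- Validation form: results are written to a blackhole sink and discarded.
CREATE SINK q1_sink AS
SELECT auction, bidder, 0.908 * price AS price, date_time
FROM bid
WITH ( connector = 'blackhole' );

-- Form still shown in some sub-issue plans.
CREATE MATERIALIZED VIEW q1_mv AS
SELECT auction, bidder, 0.908 * price AS price, date_time
FROM bid;
```

The sink form keeps the comparison aligned with a system that has no materialized views, while the query itself is the same in both statements.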