feat(streaming): introduce streaming AsOf JOIN executor #18242
proto/stream_plan.proto
// Left deduped input pk indices. The pk of the left_table and
// left_degree_table is [left_join_key | left_deduped_input_pk_indices]
// and is expected to be the shortest key which starts with
// the join key and satisfies unique constrain.
repeated uint32 left_deduped_input_pk_indices = 9;
// Right deduped input pk indices. The pk of the right_table and
// right_degree_table is [right_join_key | right_deduped_input_pk_indices]
// and is expected to be the shortest key which starts with
// the join key and satisfies unique constrain.
Offline discussion: the pk of the state table will be `join key | inequality key | stream key`, and the inequality key will be the first field in `deduped_input_pk_indices`. The inequality key will be used for further optimization, but currently the executor implementation does not handle it specially.
Will update the comment
Re "inequality key will be the first field in `deduped_input_pk_indices`": this is not necessary.
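To make the key layout discussed in this thread concrete, here is a minimal sketch of why ordering state rows by `join key | inequality key | stream key` makes a bounded scan cheap. It is not the state table implementation: `StateKey`, `scan_prefix_le`, the integer column types, and the use of an in-memory `BTreeMap` are all assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical composite key: (join key, inequality key, stream key).
/// Tuples compare lexicographically, so rows sharing a join key are
/// contiguous and sorted by inequality key within that group.
type StateKey = (u32, i64, u64);

/// Return the stream keys of all rows with `join_key` whose inequality
/// key is `<= upper`, in key order. This is one contiguous range scan.
fn scan_prefix_le(
    table: &BTreeMap<StateKey, String>,
    join_key: u32,
    upper: i64,
) -> Vec<u64> {
    let start = (join_key, i64::MIN, u64::MIN);
    let end = (join_key, upper, u64::MAX);
    table
        .range(start..=end)
        .map(|((_, _, stream_key), _)| *stream_key)
        .collect()
}
```

Because the inequality key sits directly after the join key, the scan never touches rows of other join keys or rows beyond the bound.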
use crate::executor::prelude::*;

/// Evict the cache every n rows.
const EVICT_EVERY_N_ROWS: u32 = 16;
Let the hash join and AsOf join refer to the same variable, or use different names.
They should be different variables so that the two eviction frequencies can be controlled separately. Since each one is only used inside its own file, using the same name is totally okay.
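As a small illustration of the point above, two private constants with the same name can coexist when each is scoped to its own module. The module names, the `should_evict` helper, and the hash join value of 32 are hypothetical and chosen only to show that the two frequencies can differ.

```rust
mod hash_join {
    /// Hypothetical value, for illustration only.
    const EVICT_EVERY_N_ROWS: u32 = 32;

    /// True on every `EVICT_EVERY_N_ROWS`-th row (never for row 0).
    pub fn should_evict(rows_seen: u32) -> bool {
        rows_seen > 0 && rows_seen % EVICT_EVERY_N_ROWS == 0
    }
}

mod asof_join {
    /// Same name as in `hash_join`, but private to this module.
    const EVICT_EVERY_N_ROWS: u32 = 16;

    /// True on every `EVICT_EVERY_N_ROWS`-th row (never for row 0).
    pub fn should_evict(rows_seen: u32) -> bool {
        rows_seen > 0 && rows_seen % EVICT_EVERY_N_ROWS == 0
    }
}
```

Neither constant is visible outside its module, so there is no name clash and each executor can be tuned independently.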
impl<K: HashKey, S: StateStore, const T: AsOfJoinTypePrimitive> AsOfJoinExecutor<K, S, T> {
    #[allow(clippy::too_many_arguments)]
    pub fn new(
Please add a `JoinBuilder` and implement `JoinBuilder::build_asof_join` and `build_hash_join` to extract some common logic.
There are a lot of subtle differences between building asof join and hash join. Extracting common logic costs more than it earns.
side_r,
asof_desc,
actual_output_data_types,
// inequality_watermarks,
Create an issue for it?
} else {
    // Row which violates null-safe bitmap will never be matched so we need not
    // store.
    // Noop here because we only support left outer AsOf join.
Have we asserted this somewhere?
In the planner I'm working on.
Maybe we should still add one in the executor part, such as in `from_proto`, to prevent confusion.
Will use `JoinType` instead of `AsOfJoinType` in the proto in the planner PR, so I can do the check then.
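For reference, the kind of executor-side check suggested in this thread could look roughly like the sketch below. Everything in it is hypothetical: the `JoinType` enum is a local stand-in rather than the proto type, `check_asof_join_type` is an invented helper name, and treating inner and left outer as the accepted variants is an assumption, not something this PR confirms.

```rust
/// Hypothetical stand-in for the join type carried in the plan node.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    LeftOuter,
    RightOuter,
    FullOuter,
}

/// Reject join types the AsOf executor is assumed not to handle, so a
/// bad plan fails when the executor is built instead of misbehaving later.
fn check_asof_join_type(join_type: JoinType) -> Result<(), String> {
    match join_type {
        JoinType::Inner | JoinType::LeftOuter => Ok(()),
        other => Err(format!(
            "AsOf join does not support join type {:?}",
            other
        )),
    }
}
```

A check like this would run once while building the executor from the plan node, which keeps the "Noop here" branch in the executor body safe by construction.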
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Introduce streaming AsOf join executor. https://duckdb.org/docs/guides/sql_features/asof_join.html
When a record comes from the left side, it will look up the first matched record on the right side.
When a record comes from the right side, we first find which previously matched records are affected, then emit delete/insert for those records and insert/delete for the newly matched result.
We use a secondary index in `JoinEntryState` to do the range scan. #17765
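The behaviour described above can be sketched with ordered sets standing in for the per-join-key state. This is a simplified illustration, not the executor's code: it assumes a single join key, distinct integer inequality keys, the condition `left >= right` with the closest earlier right row winning, and left outer semantics. `Change`, `lookup`, and `insert_right` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Output change for one left row; `right: None` means no match.
#[derive(Debug, PartialEq)]
enum Change {
    Delete { left: i64, right: Option<i64> },
    Insert { left: i64, right: Option<i64> },
}

/// Left-side arrival: the match is the largest right key `<= left_key`.
fn lookup(right: &BTreeSet<i64>, left_key: i64) -> Option<i64> {
    right.range(..=left_key).next_back().copied()
}

/// Right-side arrival: left rows in `[new_right, next right key)` switch
/// their match to `new_right`; each emits a delete of the old pairing
/// followed by an insert of the new one.
fn insert_right(
    left: &BTreeSet<i64>,
    right: &mut BTreeSet<i64>,
    new_right: i64,
) -> Vec<Change> {
    if right.contains(&new_right) {
        return Vec::new(); // distinct keys assumed; nothing changes
    }
    // Every affected left row was matched to this row before the insert.
    let old_match = lookup(right, new_right);
    // The next existing right key bounds the affected left range.
    let upper = right.range(new_right..).next().copied();
    let affected: Vec<i64> = match upper {
        Some(u) => left.range(new_right..u).copied().collect(),
        None => left.range(new_right..).copied().collect(),
    };
    right.insert(new_right);
    let mut out = Vec::new();
    for l in affected {
        out.push(Change::Delete { left: l, right: old_match });
        out.push(Change::Insert { left: l, right: Some(new_right) });
    }
    out
}
```

Both operations reduce to range scans over keys sorted by the inequality column, which is the role the secondary index plays in this description.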
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.