ESQL: Compute output of LookupJoinExec dynamically #117763
LookupJoinExec should not assume its output but instead compute it from

- Its input fields from the left
- The fields added from the lookup index

Currently, LookupJoinExec's output is determined when the logical plan is mapped to a physical one, and thereafter the output cannot be changed anymore.

This makes it impossible to have late materialization of fields from the left hand side via field extractions, because we are forced to extract *all* fields before the LookupJoinExec, otherwise we do not achieve the prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will add from the lookup index instead of tracking the whole output (that was only correct for the logical plan).
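The idea in the description can be illustrated with a minimal Java sketch. Everything here (`Attr`, `PlanNode`, `Source`, `LookupJoinSketch`, `withChild`) is a hypothetical stand-in and not the actual ES|QL classes; the sketch also ignores details such as name shadowing between left and added fields. It only shows that a node which stores just its added fields can derive its output from whatever its child currently produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the actual ES|QL classes.
record Attr(String name) {}

interface PlanNode {
    List<Attr> output();
}

// A leaf node whose output is whatever fields have been extracted so far.
record Source(List<Attr> output) implements PlanNode {}

// Sketch of a join node that stores only the fields it adds from the lookup index.
record LookupJoinSketch(PlanNode child, List<Attr> addedFields) implements PlanNode {
    @Override
    public List<Attr> output() {
        // Recomputed on every call: left-hand input first, then the fields added by the lookup.
        List<Attr> result = new ArrayList<>(child.output());
        result.addAll(addedFields);
        return result;
    }

    // Swapping the child (for example after a field extraction is inserted or moved)
    // needs no extra bookkeeping, because no full output list is stored.
    LookupJoinSketch withChild(PlanNode newChild) {
        return new LookupJoinSketch(newChild, addedFields);
    }
}

class LookupJoinSketchDemo {
    public static void main(String[] args) {
        PlanNode narrow = new Source(List.of(new Attr("id")));
        LookupJoinSketch join = new LookupJoinSketch(narrow, List.of(new Attr("lang_name")));
        System.out.println(join.output());

        // A child that extracts one more field changes the join's output automatically.
        PlanNode wide = new Source(List.of(new Attr("id"), new Attr("salary")));
        System.out.println(join.withChild(wide).output());
    }
}
```

With a stored output list, the second call would still report the output fixed at mapping time; deriving it from the child is what leaves room for extracting left-hand fields later.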
@@ -62,34 +60,4 @@ public final PhysicalPlan apply(PhysicalPlan plan) {

    protected abstract PhysicalPlan rule(SubPlan plan);
}

public abstract static class OptimizerExpressionRule<E extends Expression> extends Rule<PhysicalPlan, PhysicalPlan> {
Unrelated, but I noticed this became unused.
} else if (RIGHT.equals(joinType)) {
    List<Attribute> leftOutputWithoutMatchFields = leftOutput.stream()
        .filter(attr -> matchFieldNames.contains(attr.name()) == false)
        .toList();
    output = mergeOutputAttributes(leftOutputWithoutMatchFields, rightOutput);
Unused, and it's a hassle to keep this in sync with the LEFT case.
List<Layout.ChannelAndType> matchFields = new ArrayList<>(join.leftFields().size());
for (Attribute m : join.leftFields()) {
For now, left fields and match fields are the same. In fact, the JoinConfig's matchFields is currently fully redundant.
Pinging @elastic/es-analytical-engine (Team:Analytics)
LGTM. I think this code also clarifies a bit the difference between matchField and leftField, and I like that matchFields is removed from the LookupJoinExec.
this.leftFields = in.readNamedWriteableCollectionAsList(Attribute.class);
this.rightFields = in.readNamedWriteableCollectionAsList(Attribute.class);
this.output = in.readNamedWriteableCollectionAsList(Attribute.class);
this.addedFields = in.readNamedWriteableCollectionAsList(Attribute.class);
I presume we're not bothering about transport version because this is a snapshot only feature right now?
We could avoid breaking bwc with a new transport version, but the previous version of LOOKUP JOIN isn't exactly fully working either, so I don't think we need to bother.
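For context, the alternative mentioned here (not taken in this PR) would be to gate the new wire format on a version check. The following is a rough, simplified sketch of that pattern using plain `java.io` streams; it does not use Elasticsearch's transport version API, and `ADDED_FIELDS_VERSION`, `encode`, and `decodeAddedFields` are hypothetical names. It also assumes, purely for illustration, that older senders wrote the full output as a list of names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class VersionGatedReadSketch {
    // Hypothetical version number at which senders start writing only the added fields.
    static final int ADDED_FIELDS_VERSION = 2;

    // Writes a list of field names as a length-prefixed sequence of strings.
    static byte[] encode(List<String> names) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(names.size());
            for (String name : names) {
                out.writeUTF(name);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the name list and interprets it according to the sender's version.
    static List<String> decodeAddedFields(byte[] payload, int senderVersion, List<String> leftOutput) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            int size = in.readInt();
            List<String> names = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                names.add(in.readUTF());
            }
            if (senderVersion >= ADDED_FIELDS_VERSION) {
                // New format: the payload already holds only the added fields.
                return names;
            }
            // Assumed old format: the payload holds the full output, so drop the left-hand fields.
            names.removeAll(leftOutput);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the feature is snapshot-only and the previous format was not fully working anyway, skipping this extra branch keeps the serialization code simpler.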
💔 Backport failed
You can use sqren/backport to manually backport by running
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
LookupJoinExec should not assume its output but instead compute it from

- Its input fields from the left
- The fields added from the lookup index

Currently, LookupJoinExec's output is determined when the logical plan is mapped to a physical one, and thereafter the output cannot be changed anymore.

This makes it impossible to have late materialization of fields from the left hand side via field extractions, because we are forced to extract *all* fields before the LookupJoinExec, otherwise we do not achieve the prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will add from the lookup index instead of tracking the whole output (that was only correct for the logical plan).

**Note:** While this PR is a refactoring for the current functionality, it should unblock @craigtaverner's ongoing work related to field extractions and getting multiple LOOKUP JOIN queries to work correctly without adding hacks.

(cherry picked from commit 64107e0)

# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Note: While this PR is a refactoring for the current functionality, it should unblock @craigtaverner's ongoing work related to field extractions and getting multiple LOOKUP JOIN queries to work correctly without adding hacks.