Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Compute output of LookupJoinExec dynamically #117763

Merged

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Nov 29, 2024

LookupJoinExec should not assume its output but instead compute it from

  • Its input fields from the left
  • The fields added from the lookup index

Currently, LookupJoinExec's output is determined when the logical plan is mapped to a physical one, and thereafter the output cannot be changed anymore.
This makes it impossible to have late materialization of fields from the left hand side via field extractions, because we are forced to extract all fields before the LookupJoinExec, otherwise we do not achieve the prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will add from the lookup index instead of tracking the whole output (that was only correct for the logical plan).

Note: While this PR is a refactoring for the current functionality, it should unblock @craigtaverner 's ongoing work related to field extractions and getting multiple LOOKUP JOIN queries to work correctly without adding hacks.

LookupJoinExec should not assume its output but instead compute it from
- Its input fields from the left
- The fields added from the lookup index

Currently, LookupJoinExec's output is determined when the logical plan
is mapped to a physical one, and thereafter the output cannot be changed
anymore.
This makes it impossible to have late materialization of fields from the
left hand side via field extractions, because we are forced to extract
*all* fields before the LookupJoinExec, otherwise we do not achieve the
prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will
add from the lookup index instead of tracking the whole output (that was
only correct for the logical plan).
@@ -62,34 +60,4 @@ public final PhysicalPlan apply(PhysicalPlan plan) {

protected abstract PhysicalPlan rule(SubPlan plan);
}

public abstract static class OptimizerExpressionRule<E extends Expression> extends Rule<PhysicalPlan, PhysicalPlan> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but I noticed this became unused.

@elasticsearchmachine elasticsearchmachine added v9.0.0 needs:triage Requires assignment of a team area label labels Nov 29, 2024
Comment on lines -118 to -122
} else if (RIGHT.equals(joinType)) {
List<Attribute> leftOutputWithoutMatchFields = leftOutput.stream()
.filter(attr -> matchFieldNames.contains(attr.name()) == false)
.toList();
output = mergeOutputAttributes(leftOutputWithoutMatchFields, rightOutput);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused, and it's a hassle to keep this in sync with the LEFT case.

Comment on lines +586 to +587
List<Layout.ChannelAndType> matchFields = new ArrayList<>(join.leftFields().size());
for (Attribute m : join.leftFields()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, left fields and match fields are the same. In fact, the JoinConfig's matchFields is currently fully redundant.

@alex-spies alex-spies added >non-issue auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.18.0 and removed needs:triage Requires assignment of a team area label labels Nov 29, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 29, 2024
Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think this code also clatifies a bit the difference between matchField and leftField, and I like that matchFields is removed from the LookupJoinExec

this.leftFields = in.readNamedWriteableCollectionAsList(Attribute.class);
this.rightFields = in.readNamedWriteableCollectionAsList(Attribute.class);
this.output = in.readNamedWriteableCollectionAsList(Attribute.class);
this.addedFields = in.readNamedWriteableCollectionAsList(Attribute.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume we're not bothering about transport version because this is a snapshot only feature right now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid breaking bwc with a new transport version, but the previous version of LOOKUP JOIN isn't exactly fully working either, so I don't think we need to bother.

@alex-spies alex-spies added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 29, 2024
@elasticsearchmachine elasticsearchmachine merged commit 64107e0 into elastic:main Nov 29, 2024
16 checks passed
@alex-spies alex-spies deleted the fix-lookupjoinexec-output branch November 29, 2024 16:34
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 117763

@alex-spies
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Nov 29, 2024
LookupJoinExec should not assume its output but instead compute it from
- Its input fields from the left - The fields added from the lookup
index

Currently, LookupJoinExec's output is determined when the logical plan
is mapped to a physical one, and thereafter the output cannot be changed
anymore. This makes it impossible to have late materialization of fields
from the left hand side via field extractions, because we are forced to
extract *all* fields before the LookupJoinExec, otherwise we do not
achieve the prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will add
from the lookup index instead of tracking the whole output (that was
only correct for the logical plan).

**Note:** While this PR is a refactoring for the current functionality,
it should unblock @craigtaverner 's ongoing work related to field
extractions and getting multiple LOOKUP JOIN queries to work correctly
without adding hacks.

(cherry picked from commit 64107e0)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
elasticsearchmachine pushed a commit that referenced this pull request Dec 2, 2024
LookupJoinExec should not assume its output but instead compute it from
- Its input fields from the left - The fields added from the lookup
index

Currently, LookupJoinExec's output is determined when the logical plan
is mapped to a physical one, and thereafter the output cannot be changed
anymore. This makes it impossible to have late materialization of fields
from the left hand side via field extractions, because we are forced to
extract *all* fields before the LookupJoinExec, otherwise we do not
achieve the prescribed output.

Avoid that by tracking only which fields the LookupJoinExec will add
from the lookup index instead of tracking the whole output (that was
only correct for the logical plan).

**Note:** While this PR is a refactoring for the current functionality,
it should unblock @craigtaverner 's ongoing work related to field
extractions and getting multiple LOOKUP JOIN queries to work correctly
without adding hacks.

(cherry picked from commit 64107e0)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants