[SPARK-44748][SQL] Query execution for the PARTITION BY clause in UDTF TABLE arguments

### What changes were proposed in this pull request?

This PR implements query execution support for the PARTITION BY and ORDER BY clauses for UDTF TABLE arguments.

* The query planning support was added in [1], [2], and [3]. After those changes, the planner added a projection to compute the PARTITION BY expressions, plus a repartition operator, plus a sort operator.
* In this PR, the Python executor receives the indexes of these expressions within the input table's rows, and compares the values of the projected partitioning expressions between consecutive rows.
* When the values change, this marks the boundary between partitions, so we call the UDTF instance's `terminate` method, then destroy the instance and create a new one for the next partition.

[1] #42100
[2] #42174
[3] #42351

Example:

```
# Make a test UDTF to yield an output row with the same value
# consumed from the last input row in the input table or partition.
class TestUDTF:
    def eval(self, row: Row):
        self._last = row['input']
        self._partition_col = row['partition_col']

    def terminate(self):
        yield self._partition_col, self._last

func = udtf(TestUDTF, returnType='partition_col: int, last: int')
self.spark.udtf.register('test_udtf', func)
self.spark.sql('''
    WITH t AS (
      SELECT id AS partition_col, 1 AS input FROM range(0, 2)
      UNION ALL
      SELECT id AS partition_col, 2 AS input FROM range(0, 2)
    )
    SELECT * FROM test_udtf(TABLE(t) PARTITION BY partition_col ORDER BY input)
''').collect()

> [Row(partition_col=0, last=2), Row(partition_col=1, last=2)]
```

### Why are the changes needed?

This brings full end-to-end execution for the PARTITION BY and/or ORDER BY clauses for UDTF TABLE arguments.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds end-to-end testing in `test_udtf.py`.

Closes #42420 from dtenedor/inspect-partition-by.

Authored-by: Daniel Tenedorio <[email protected]>
Signed-off-by: Takuya UESHIN <[email protected]>
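
For readers who want a concrete picture of the partition-boundary logic described in the commit message, here is a minimal standalone sketch in plain Python. It is not the code from this commit: `run_partitioned`, `_finish`, and `LastValueUDTF` are hypothetical names made up for illustration, rows are modeled as tuples rather than `Row` objects, and the input is assumed to arrive already grouped and sorted, as it would after the planner's repartition and sort operators.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Sequence, Tuple


def _finish(udtf: Any) -> Iterator[Any]:
    """Call `terminate` on the instance if it defines one, yielding its rows."""
    terminate = getattr(udtf, "terminate", None)
    if terminate is not None:
        result = terminate()
        if result is not None:
            yield from result


def run_partitioned(
    make_udtf: Callable[[], Any],
    rows: Iterable[Sequence[Any]],
    partition_indexes: List[int],
) -> Iterator[Any]:
    """Hypothetical driver: one UDTF instance per run of equal partition keys.

    Assumes rows with the same partitioning values are adjacent in `rows`.
    """
    udtf: Optional[Any] = None
    prev_key: Optional[Tuple[Any, ...]] = None
    for row in rows:
        # Read the projected partitioning values at the given indexes.
        key = tuple(row[i] for i in partition_indexes)
        if udtf is not None and key != prev_key:
            # The values changed: this is a partition boundary, so finish
            # the current instance and drop it.
            yield from _finish(udtf)
            udtf = None
        if udtf is None:
            udtf = make_udtf()
        prev_key = key
        result = udtf.eval(row)
        if result is not None:
            yield from result
    if udtf is not None:
        # Finish the final partition.
        yield from _finish(udtf)


class LastValueUDTF:
    """Mirrors the TestUDTF example, with rows as (partition_col, input) tuples."""

    def eval(self, row: Sequence[Any]) -> None:
        self._partition_col = row[0]
        self._last = row[1]

    def terminate(self) -> Iterator[Tuple[Any, Any]]:
        yield self._partition_col, self._last


rows = [(0, 1), (0, 2), (1, 1), (1, 2)]  # already grouped by column 0, sorted by column 1
print(list(run_partitioned(LastValueUDTF, rows, [0])))  # prints [(0, 2), (1, 2)]
```

Comparing only consecutive rows keeps the driver streaming: it never buffers a whole partition, which is why it relies on the upstream repartition and sort to place equal keys next to each other.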
Showing 7 changed files with 313 additions and 37 deletions.