Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ROW_NUMBER() over #18775

Open
kennknowles opened this issue Jun 3, 2022 · 1 comment
Open

Support ROW_NUMBER() over #18775

kennknowles opened this issue Jun 3, 2022 · 1 comment

Comments

@kennknowles
Copy link
Member

Design and implement ROW_NUMBER() OVER window. It is supported by Calcite and we should look at feasibility of supporting it in Beam SQL

StackOverflow Post

 

Imported from Jira BEAM-4509. Original Jira may contain additional context.
Reported by: kedin.

@kennknowles
Copy link
Member Author

Just adding some commentary here to clarify the scope.

There are two parts:

  • Generic support for analytic functions (which SQL also calls "windowing" but it is not related to Beam windowing). I don't remember if we have that. In general it will be a stateful DoFn that is partitioned according to the PARTITION BY clause and is otherwise not generally parallelizable (the whole point of window functions is really that they are serial). We can probably reference other OSS projects for the general translation.
  • Specific support for ROW_NUMBER which is probably very easy. But it is probably not very useful unless we have the ability to sort.

Take this example:

SELECT
  ROW_NUMBER() OVER (ORDER BY lastname) + 1000 as participant_label,
  firstname,
  lastname,
  country
FROM athletes
WHERE sport = 'Marathon';

To evaluate this, you need to sort the whole thing by lastname then do a single-threaded stateful DoFn to track the row number. Usually you want to have a PARTITION BY so that it can be parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants