You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Just adding some commentary here to clarify the scope.
There are two parts:
Generic support for analytic functions (which SQL also calls "windowing" but it is not related to Beam windowing). I don't remember if we have that. In general it will be a stateful DoFn that is partitioned according to the PARTITION BY clause and is otherwise not generally parallelizable (the whole point of window functions is really that they are serial). We can probably reference other OSS projects for the general translation.
Specific support for ROW_NUMBER which is probably very easy. But it is probably not very useful unless we have the ability to sort.
Take this example:
SELECT
ROW_NUMBER() OVER (ORDER BY lastname) +1000as participant_label,
firstname,
lastname,
country
FROM athletes
WHERE sport ='Marathon';
To evaluate this, you need to sort the whole thing by lastname then do a single-threaded stateful DoFn to track the row number. Usually you want to have a PARTITION BY so that it can be parallel.
Design and implement ROW_NUMBER() OVER window. It is supported by Calcite and we should look at feasibility of supporting it in Beam SQL
StackOverflow Post
Imported from Jira BEAM-4509. Original Jira may contain additional context.
Reported by: kedin.
The text was updated successfully, but these errors were encountered: