What happened?
I am unsure if this is the correct place to raise this issue. I am acting on the understanding that the Dataflow Runner source code is located in this repository. However, the problem might be more closely related to the behavior of the Google Cloud Dataflow runtime. If that's the case, I would appreciate it if someone could assist me in addressing this issue directly with Google Cloud.
I have an unbounded pipeline (Python SDK) that consumes data from Google Cloud Pub/Sub subscriptions and is deployed to Google Cloud Dataflow. This pipeline has three branches, which are eventually merged using the Flatten() transform.
As illustrated in the image below, the pipeline has started successfully and is processing elements. However, two of the branches are actually empty and therefore are not providing any input to the Flatten/merge step. This behavior is intentional.
With the current configuration, the Flatten/Merge step is not yielding any output, despite one of the branches continuously providing input to it. The details of the output collection can be observed in the image below:
I would expect the Flatten/Merge step to yield at least the elements of the branch `Map BE events to Commands`.
I've also tried configuring windowing in each branch just before the Flatten/Merge step, setting it to fixed windows of 60 seconds with the following code. However, this didn't alter the behavior at all. I thought the main issue was the empty PCollection branches and that windowing would compel Flatten() to merge windows from each branch.
```python
import time
from typing import TypeVar

import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

T = TypeVar('T')

def with_ts(value: T) -> TimestampedValue:
    return TimestampedValue(value, int(time.time()))

...

    | "Map WEB events to commands" >> beam.Map(lambda x: with_ts(web_event_to_commands(x)))
    | "Window WEB commands" >> beam.WindowInto(FixedWindows(60)))
```
I believe that, according to the official documentation of Flatten() (https://beam.apache.org/documentation/transforms/python/other/flatten/ and https://beam.apache.org/documentation/programming-guide/#flatten), the current behaviour is not expected: Flatten() should yield the output elements of each input PCollection independently of the other input PCollections and of the windowing strategy (provided the strategy is the same for every input PCollection). Thanks in advance for any help in case this is not a bug but a mistake on my side.

When running the very same pipeline with the direct runner on localhost, the Flatten/Merge step behaves as expected, so the behaviour of the direct runner and the Dataflow runner is not consistent with regard to the Flatten() transform.

It seems, however, that this is only a visual bug in the Google Cloud UI, and the Flatten/Merge step is working correctly. beam.LogElements() was not displaying any logs in Dataflow for me (possibly a misconfiguration of log levels), and at the same time the PCollection details shown in the screenshots I provided display wrong metadata for the Flatten/Merge step and every step downstream of it. After switching the last step in my pipeline to a different output collection, I can verify that the elements are processed as intended.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components