INFO: Template successfully created.
Exception in thread "main" java.lang.UnsupportedOperationException:
The result of template creation should not be used.
at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:37)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:524)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:506)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:295)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:224)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:183)
at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:176)
This is a real error. If a template was created, the job is complete. Instead of crashing while trying to access the job ID, as though DataflowPipelineJob doesn't know it made a template, it should return successfully. Or perhaps there is another design choice, but simply crashing does not make sense. Probably DataflowRunner should not return a DataflowPipelineJob at all in this case.
Imported from Jira BEAM-9337. Original Jira may contain additional context.
Reported by: kenn.
We recently ran into this error when trying to do some post-batch cleanup in some of our GCP Dataflow jobs when using a template, and had to revert those changes. Not being able to use waitUntilFinish() to defer cleanup of intermediate resources and do some post-pipeline work makes these pipelines harder to manage.
Running a pipeline, then running another pipeline or even just some bit of non-pipeline code, when the first is finished, is not possible with this behavior.
Why this is an important and valid use case:
Many pipelines end in writing some output somewhere as a terminal state, so you cannot chain another step past it. However, if you want to do something, such as write a marker file, after all output has completed, you must do it after the pipeline ends.
You can only do this if you waitUntilFinish().
If you collect metrics from the run to store them or process them in some way, you probably want to:
var result = pipeline.run();
result.waitUntilFinish();
processMetrics(result.metrics());
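Until the behavior changes, one caller-side guard is to treat the UnsupportedOperationException from the stack trace as "no job was run". The sketch below is stdlib-only and uses hypothetical stand-in types (`Result`, `TemplateResult`, `FinishedResult`, `waitIfPossible`); none of them are Beam classes. It only assumes that template creation keeps throwing that exception type from waitUntilFinish().

```java
import java.util.Optional;

public class TemplateAwareWait {
    /** Hypothetical stand-in for a pipeline result; not Beam's PipelineResult. */
    interface Result {
        String waitUntilFinish();
    }

    /** Mimics the reported behavior: the result of template creation cannot be waited on. */
    static final class TemplateResult implements Result {
        @Override
        public String waitUntilFinish() {
            throw new UnsupportedOperationException(
                "The result of template creation should not be used.");
        }
    }

    /** Mimics a job that actually ran and reached a terminal state. */
    static final class FinishedResult implements Result {
        @Override
        public String waitUntilFinish() {
            return "DONE";
        }
    }

    /** Returns the terminal state, or empty when only a template was created. */
    static Optional<String> waitIfPossible(Result result) {
        try {
            return Optional.of(result.waitUntilFinish());
        } catch (UnsupportedOperationException e) {
            // Assumption: this exception means "template only, nothing was executed".
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Post-pipeline work (marker file, metrics) should run only when a state is present.
        System.out.println(waitIfPossible(new FinishedResult()).orElse("no job was run"));
        System.out.println(waitIfPossible(new TemplateResult()).orElse("no job was run"));
    }
}
```

In a real pipeline the same try/catch would wrap result.waitUntilFinish(), with cleanup and metrics processing skipped when the Optional is empty. Catching such a broad exception type can hide unrelated failures, so this is a stopgap rather than a fix for the issue itself.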