[Bug]: [Go SDK] periodic.impulse increases Dataflow Backlog time #27707
Comments
Just a note: Java PeriodicImpulse has the same issue. It's more likely a Dataflow metrics issue.
@Abacn Then, is there a way to implement the slowly updating side input pattern with the Go SDK? We have come up with some workaround ideas:
However, we would like to avoid complex implementations.
If I understood correctly, this is just an artifact of the Dataflow UI. Does it affect the functionality of PeriodicImpulse?
Same issue here with Python, but without having explicitly specified an end time. Once Dataflow scales up, it does not scale down. As you can see, it reports many weeks of backlog even though data freshness is already 0, because everything is being processed at a good pace after catching up with the real backlog.
I believe this has to do with the fact that PeriodicImpulse uses EDITED (Answer in next comment):
@kemkemG0 In case you did not manage to solve the issue, the whole problem is in the class It can be solved "easily" by replacing this method in a subclass with:
And then subclass
Closed by #32506
What happened?
We have a slowly updating object that we'd like to use as a SideInput.
We've decided to utilize periodic.Impulse to periodically update the SideInput. We set the startTime to be "now" and the endTime to be "10 years later", with the intention of producing output endlessly.
However, we observed an extreme increase in the Backlog time, ballooning to around 600 weeks. We suspect that this is because we've set the endTime to be 10 years later.
The image below shows the Backlog reaching around 500 weeks, which is roughly the 10 years I set as endTime.
It appears that Dataflow estimates the end of the job as "10 years later = 500 weeks", which triggers horizontal scaling and results in 100 workers being used.
Below is our simplified sample code, which closely follows this reference.
[Reference] https://github.com/apache/beam/blob/dc1cfe54bfa0d3a22034f3fea463f0284cb2ba83/sdks/go/examples/slowly_updating_side_input/slowly_updating_side_input.go
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components