[Bug]: Python KafkaIO read transform is inefficient when using the commit_offsets_in_finalize option #27061

Closed
chamikaramj opened this issue Jun 8, 2023 · 4 comments

@chamikaramj
Contributor

What happened?

It seems the Python Kafka IO read transform requires significantly more resources when using the "commit_offsets_in_finalize" option.

'commit_offsets_in_finalize' adds an extra Reshuffle and a callback to Kafka to commit messages.

For streaming jobs, this seems to cause the backlog of the Kafka read work item to grow by a large amount from time to time, and Dataflow jobs are unable to downscale because of this backlog.

The transform operates efficiently when the "commit_offsets_in_finalize" option is not used.
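
For anyone trying to reproduce this, here is a minimal sketch of how the two configurations might be set up for comparison. The broker address, topic, and group id are placeholders, and `kafka_read_kwargs` / `build_read` are hypothetical helpers, not part of Beam. As far as I know the Python `ReadFromKafka` keyword is spelled `commit_offset_in_finalize` (singular "offset"); please check this against your Beam version.

```python
def kafka_read_kwargs(bootstrap_servers, topic, group_id, commit_in_finalize):
    """Build keyword arguments for ReadFromKafka (hypothetical helper)."""
    consumer_config = {
        "bootstrap.servers": bootstrap_servers,
        # Offsets are committed on behalf of a consumer group, so set one.
        "group.id": group_id,
    }
    return {
        "consumer_config": consumer_config,
        "topics": [topic],
        # Assumed spelling of the Python SDK parameter; verify for your version.
        "commit_offset_in_finalize": commit_in_finalize,
    }


def build_read(commit_in_finalize):
    """Create the read transform; requires apache_beam to be installed."""
    # Imported lazily so the helper above can be inspected without Beam.
    from apache_beam.io.kafka import ReadFromKafka

    # Placeholder broker, topic, and group id; replace with real values.
    kwargs = kafka_read_kwargs(
        "localhost:9092", "my-topic", "my-group", commit_in_finalize)
    return ReadFromKafka(**kwargs)
```

Running the same streaming pipeline once with `build_read(True)` and once with `build_read(False)` should make the difference in backlog and resource usage visible.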

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@chamikaramj
Contributor Author

Java pipelines may exhibit similar behaviors but I haven't tested that.

@johnjcasey is this something we can track as part of the ongoing effort to improve the Kafka connector?

@johnjcasey
Contributor

For sure. The reshuffle is essentially mandatory, but it is on a side path, so it shouldn't hold up the main flow of the data.

@Abacn
Contributor

Abacn commented Jun 8, 2023

Related Java issue (commitOffsetsInFinalize): #20689

@scwhittle
Contributor

#31682 removes the reshuffle when committing offsets in finalize is enabled.

@github-actions github-actions bot added this to the 2.60.0 Release milestone Sep 26, 2024