Commit

Improve BatchElements documentation (apache#32082)
* Improve BatchElements documentation

* Add link to new documentation

* Update sdks/python/apache_beam/transforms/util.py

Co-authored-by: Jonathan Sabbagh <[email protected]>

* linting

* Apply suggestions from code review

Co-authored-by: tvalentyn <[email protected]>

* line-too-long

* Update sdks/python/apache_beam/transforms/util.py

---------

Co-authored-by: Jonathan Sabbagh <[email protected]>
Co-authored-by: tvalentyn <[email protected]>
3 people authored Aug 30, 2024
1 parent 52ab49e commit cfe8fee
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions sdks/python/apache_beam/transforms/util.py
@@ -802,6 +802,20 @@ class BatchElements(PTransform):
corresponding to its contents. Each batch is emitted with a timestamp at
the end of their window.

When the max_batch_duration_secs arg is provided, a stateful implementation
of BatchElements is used to batch elements across bundles. This is most
impactful in streaming applications where many bundles only contain one
element. Larger max_batch_duration_secs values `might` reduce the throughput
of the transform, while smaller values might improve the throughput but
make it more likely that batches are smaller than the target batch size.
As a general recommendation, start with low values (e.g. 0.005 aka 5ms) and
increase as needed to get the desired tradeoff between target batch size
and latency or throughput.

For more information on tuning parameters to this transform, see
https://beam.apache.org/documentation/patterns/batch-elements

Args:
min_batch_size: (optional) the smallest size of a batch
max_batch_size: (optional) the largest size of a batch
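
The tradeoff described in the new docstring text can be illustrated with a small toy model. This is a sketch only, not Beam's stateful implementation: the function `batch_by_size_or_duration`, its parameter `max_batch_duration_ms` (an integer-millisecond stand-in for `max_batch_duration_secs`), and the arrival pattern are all hypothetical, and the model closes a batch when the next element arrives after the deadline rather than when a timer fires.

```python
from typing import List


def batch_by_size_or_duration(
    arrival_times_ms: List[int],
    target_batch_size: int,
    max_batch_duration_ms: int,
) -> List[List[int]]:
    """Toy model (hypothetical helper, not part of Apache Beam).

    A batch is closed when it reaches target_batch_size, or when an
    element arrives max_batch_duration_ms or more after the first
    element of the batch that is currently open.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrival_times_ms:
        # Duration limit: the open batch is too old, so emit it first.
        if current and t - current[0] >= max_batch_duration_ms:
            batches.append(current)
            current = []
        current.append(t)
        # Size limit: the batch reached its target size, so emit it.
        if len(current) >= target_batch_size:
            batches.append(current)
            current = []
    # Emit whatever is left once the input ends.
    if current:
        batches.append(current)
    return batches


# Assumed workload: 20 elements, one arriving every 2 ms.
arrivals = [i * 2 for i in range(20)]

short_timeout = batch_by_size_or_duration(arrivals, 8, 5)
long_timeout = batch_by_size_or_duration(arrivals, 8, 50)

print([len(b) for b in short_timeout])  # [3, 3, 3, 3, 3, 3, 2]
print([len(b) for b in long_timeout])   # [8, 8, 4]
```

Under these assumptions, the 5 ms limit never lets a batch reach the target size of 8, while the 50 ms limit fills batches completely at the cost of holding the first element of each batch for longer. In a real pipeline the corresponding knobs would be the `max_batch_size` and `max_batch_duration_secs` arguments of `BatchElements`; the numbers above say nothing about actual Beam throughput.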