Beam 2.38.0 release
We are happy to present the new 2.38.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.38.0 check out the detailed release notes.
I/Os
- Introduce projection pushdown optimizer to the Java SDK (BEAM-12976). The optimizer currently only works on the BigQuery Storage API, but more I/Os will be added in future releases. If you encounter a bug with the optimizer, please file a JIRA and disable the optimizer using pipeline option
--experiments=disable_projection_pushdown
. - A new IO for Neo4j graph databases was added. (BEAM-1857) It has the ability to update nodes and relationships using UNWIND statements and to read data using cypher statements with parameters.
amazon-web-services2
has reached feature parity and is finally recommended over the earlieramazon-web-services
andkinesis
modules (Java). These will be deprecated in one of the next releases (BEAM-13174).- Long outstanding write support for
Kinesis
was added (BEAM-13175). - Configuration was simplified and made consistent across all IOs, including the usage of
AwsOptions
(BEAM-13563, BEAM-13663, BEAM-13587). - Additionally, there's a long list of recent improvements and fixes to
S3
Filesystem (BEAM-13245, BEAM-13246, BEAM-13441, BEAM-13445, BEAM-14011),
DynamoDB
IO (BEAM-13209, BEAM-13209),
SQS
IO (BEAM-13631, BEAM-13510) and others.
- Long outstanding write support for
New Features / Improvements
- Pipeline dependencies supplied through
--requirements_file
will now be staged to the runner using binary distributions (wheels) of the PyPI packages for linux_x86_64 platform (BEAM-4032). To restore the behavior to use source distributions, set pipeline option--requirements_cache_only_sources
. To skip staging the packages at submission time, set pipeline option--requirements_cache=skip
(Python). - The Flink runner now supports Flink 1.14.x (BEAM-13106).
- Interactive Beam now supports remotely executing Flink pipelines on Dataproc (Python) (BEAM-14071).
Breaking Changes
- (Python) Previously
DoFn.infer_output_types
was expected to returnIterable[element_type]
whereelement_type
is the PCollection elemnt type. It is now expected to returnelement_type
. Take care if you have overrideninfer_output_type
in aDoFn
(this is not common). See BEAM-13860. - (
amazon-web-services2
) The types ofawsRegion
/endpoint
inAwsOptions
changed from String toRegion
/URI
(BEAM-13563).
Deprecations
- Beam 2.38.0 will be the last minor release to support Flink 1.11.
- (
amazon-web-services2
) Client providers (withXYZClientProvider()
) as well as IO specificRetryConfiguration
s are deprecated, instead usewithClientConfiguration()
orAwsOptions
to configure AWS IOs / clients.
Custom implementations of client providers shall be replaced with a respectiveClientBuilderFactory
and configured throughAwsOptions
(BEAM-13563).
Bugfixes
- Fix S3 copy for large objects (Java) (BEAM-14011)
- Fix quadratic behavior of pipeline canonicalization (Go) (BEAM-14128)
- This caused unnecessarily long pre-processing times before job submission for large complex pipelines.
- Fix
pyarrow
version parsing (Python)(BEAM-14235)
Known Issues
- See a full list of open issues that affect this version.
List of Contributors
According to git shortlog, the following people contributed to the 2.38.0 release. Thank you to all contributors!
abhijeet-lele
Ahmet Altay
akustov
Alexander
Alexander Zhuravlev
Alexey Romanenko
AlikRodriguez
Anand Inguva
andoni-guzman
andreukus
Andy Ye
Ankur Goenka
ansh0l
Artur Khanin
Aydar Farrakhov
Aydar Zainutdinov
Benjamin Gonzalez
Brian Hulette
brucearctor
bulat safiullin
bullet03
Carl Mastrangelo
Chamikara Jayalath
Chun Yang
Daniela Martín
Daniel Oliveira
Danny McCormick
daria.malkova
David Cavazos
David Huntsperger
dmitryor
Dmytro Sadovnychyi
dpcollins-google
egalpin
Elias Segundo Antonio
emily
Etienne Chauchot
Hengfeng Li
Ismaël Mejía
Israel Herraiz
Jack McCluskey
Jakub Kukul
Janek Bevendorff
Jeff Klukas
Johan Sternby
Kamil Breguła
Kenneth Knowles
Ke Wu
Kiley
Kyle Weaver
laraschmidt
Lara Schmidt
LE QUELLEC Olivier
Luka Kalinovcic
Luke Cwik
Marcin Kuthan
masahitojp
Masato Nakamura
Matt Casters
Melissa Pashniak
Michael Li
Miguel Hernandez
Moritz Mack
mosche
nancyxu123
Nathan J Mehl
Niel Markwick
Ning Kang
Pablo Estrada
paul-tlh
Pavel Avilov
Rahul Iyer
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ryan Skraba
Ryan Thompson
Sam Whittle
Seth Vargo
sp029619
Steven Niemitz
Thiago Nunes
Udi Meiri
Valentyn Tymofieiev
Victor
vitaly.terentyev
Yichi Zhang
Yi Hu
yirutang
Zachary Houfek
Zoe