v2.40.0
We are happy to present the new 2.40.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.40.0 check out the detailed release notes.
Highlights
- Added RunInference API, a framework-agnostic transform for inference. With this release, PyTorch and scikit-learn are supported by the transform.
  See also the example at apache_beam/examples/inference/pytorch_image_classification.py.
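To make "framework-agnostic" concrete, here is a minimal plain-Python sketch of the general pattern, not Beam's actual API. A handler object knows how to load a model and run it on a batch, while a generic driver does the batching. The names `ModelHandler`, `DoublingHandler`, and `run_inference` are hypothetical and exist only for this illustration; the example file referenced above shows the real transform.

```python
from typing import Any, Iterable, Iterator, List, Protocol, Sequence


class ModelHandler(Protocol):
    """Hypothetical interface: one implementation per ML framework."""

    def load_model(self) -> Any: ...

    def predict(self, batch: Sequence[Any], model: Any) -> Iterable[Any]: ...


class DoublingHandler:
    """Toy stand-in for a real framework handler (assumption, illustration only)."""

    def load_model(self) -> Any:
        # A real handler would deserialize a PyTorch or scikit-learn model here.
        return lambda x: 2 * x

    def predict(self, batch: Sequence[Any], model: Any) -> Iterable[Any]:
        return [model(x) for x in batch]


def run_inference(
    examples: Iterable[Any], handler: ModelHandler, batch_size: int = 2
) -> Iterator[Any]:
    """Load the model once, then yield one prediction per example, batch by batch."""
    model = handler.load_model()
    batch: List[Any] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield from handler.predict(batch, model)
            batch = []
    if batch:  # flush the final partial batch
        yield from handler.predict(batch, model)


predictions = list(run_inference([1, 2, 3], DoublingHandler()))
```

The driver never touches framework-specific code, so supporting another framework in this sketch would only mean writing another handler class.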
I/Os
- Upgraded to Hive 3.1.3 for HCatalogIO. Users can still provide their own version of Hive. (Java) (Issue-19554).
New Features / Improvements
- Go SDK users can now use generic registration functions to optimize their DoFn execution. (BEAM-14347)
- Go SDK users may now write self-checkpointing Splittable DoFns to read from streaming sources. (BEAM-11104)
- Go SDK textio Reads have been moved to Splittable DoFns exclusively. (BEAM-14489)
- Pipeline drain support added for Go SDK has now been tested. (BEAM-11106)
- Go SDK users can now see heap usage, sideinput cache stats, and active process bundle stats in Worker Status. (BEAM-13829)
- The serialization (pickling) library for Python is dill==0.3.1.1 (BEAM-11167)
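As a rough illustration of what "self-checkpointing" means in the Splittable DoFn item above, the following plain-Python sketch shows the control flow only. It is not the Go SDK API: `OffsetRange`, `process_chunk`, and `drive` are hypothetical names. The idea is that the processing function handles part of its restriction, then returns the unprocessed remainder so a runner can resume it later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical restriction: offsets in [start, end)."""
    start: int
    end: int


def process_chunk(
    rng: OffsetRange, max_records: int
) -> Tuple[List[int], Optional[OffsetRange]]:
    """Process at most max_records offsets, then checkpoint.

    Returns the emitted outputs and the residual restriction,
    or None as the residual when the restriction is finished.
    """
    stop = min(rng.start + max_records, rng.end)
    outputs = list(range(rng.start, stop))
    residual = OffsetRange(stop, rng.end) if stop < rng.end else None
    return outputs, residual


def drive(rng: OffsetRange, max_records: int) -> Tuple[List[int], int]:
    """Toy runner loop: keep resuming until no residual remains."""
    emitted: List[int] = []
    calls = 0
    pending: Optional[OffsetRange] = rng
    while pending is not None:
        outputs, pending = process_chunk(pending, max_records)
        emitted.extend(outputs)
        calls += 1
    return emitted, calls


emitted, calls = drive(OffsetRange(0, 5), max_records=2)
```

For a streaming source the restriction would typically be unbounded, so in practice the loop above would not terminate on its own; the bounded range here just keeps the sketch runnable.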
Breaking Changes
- The Go SDK now requires a minimum Go version of 1.18 in order to support generics (BEAM-14347).
- synthetic.SourceConfig field types have changed to int64 from int for better compatibility with Flink's use of Logical types in Schemas (Go) (BEAM-14173)
- Default coder updated to compress sources used with `BoundedSourceAsSDFWrapperFn` and `UnboundedSourceAsSDFWrapper`.
Bugfixes
- Fixed Java expansion service to allow specific files to stage (BEAM-14160).
- Fixed Elasticsearch connection when using both ssl and username/password (Java) (BEAM-14000)
Detailed list of PRs
- [BEAM-14048] [CdapIO] Add ConfigWrapper for building CDAP PluginConfigs by @Amar3tto in #17051
- [BEAM-14196] add test verifying output watermark propagation in bundle by @je-ik in #17504
- Move master readme.md to 2.40.0 by @y1chi in #17552
- [BEAM-14173] Fix Go Loadtests on Dataflow & partial fix for Flink by @lostluck in #17554
- Upgrade python sdk container requirements. by @y1chi in #17549
- [BEAM-11205] Update Libraries BOM dependencies to version 25.2.0 by @benWize in #17497
- [BEAM-12603] Add retry on grpc data channel and remove retry from test. by @y1chi in #17537
- [BEAM-14303] Add a way to exclude output timestamp watermark holds by @reuvenlax in #17359
- [BEAM-14347] Allow users to optimize DoFn execution with a single generic registration function by @damccorm in #17429
- [BEAM-5878] Add (failing) kwonly-argument test by @TheNeuralBit in #17509
- [BEAM-14014] Add parameter for service account impersonation in GCP credentials by @kennknowles in #17394
- [BEAM-14370] [Website] Add new page about appache beam by @bullet03 in #17490
- [BEAM-1754] Adds experimental Typescript Beam SDK by @robertwb in #17341
- [BEAM-14059] Delete tags.go by @damccorm in #17541
- [BEAM-14332] Refactored cluster management for Flink on Dataproc by @kevingg in #17402
- [BEAM-14146] Python Streaming job failing to drain with BigQueryIO write errors by @ihji in #17566
- [BEAM-13988] Update mtime to use time.UnixMilli() calls by @jrmccluskey in #17578
- Fixing patching error on missing dependencies by @pabloem in #17564
- [BEAM-14383] Improve "FailedRows" errors returned by beam.io.WriteToBigQuery by @Firlej in #17517
- Quote pip extra package names in quickstart by @kynx in #17450
- [BEAM-14374] Fix module import error in FullyQualifiedNamedTransform by @ihji in #17482
- [BEAM-14436] Adds code reviewers for GCP I/O connectors and KafkaIO to Beam OWNERS files by @chamikaramj in #17581
- [BEAM-13666] Stuck inventory jobs should be cancelled and rescheduled for next run by @elink21 in #17582
- [BEAM-14439] [BEAM-12673] Add extra details to PubSub matcher errors by @yeandy in #17586
- [BEAM-14423] Add exception injection tests for BigtableIO read in BigtableIOTest by @Abacn in #17559
- [BEAM-11104] Allow self-checkpointing SDFs to return without finishing their restriction by @jrmccluskey in #17558
- [BEAM-14415] Exception handling tests for BQIO streaming inserts in Python by @pabloem in #17544
- BEAM-14413 add Kafka exception test cases by @johnjcasey in #17565
- [BEAM-14417] Adding exception handling tests for JdbcIO.Write by @pabloem in #17555
- [BEAM-14433] Improve Go split error message. by @lostluck in #17575
- [BEAM-14429] Force java load test on dataflow runner v2 forceNumIniti… by @y1chi in #17576
- [BEAM-14435] Adding exception handling tests for SpannerIO write transform by @pabloem in #17577
- [BEAM-14347] Add generic registration functions for iters and emitters by @damccorm in #17574
- [BEAM-14169] Add Credentials rotation cron job for clusters by @elink21 in #17383
- [BEAM-14347] Add generic registration for Combiners by @damccorm in #17579
- [BEAM-12918] TPC-DS: add Jenkins jobs by @aromanenko-dev in #15679
- [BEAM-14448] add datastore test by @johnjcasey in #17592
- [BEAM-14423] Add test cases for BigtableIO.BigtableWriterFn fails due to writeRecord by @Abacn in #17593
- [BEAM-14429] Fix SyntheticUnboundedSource data duplication with SDF wrapper by @y1chi in #17600
- [BEAM-14447] Revert "Merge pull request #17517 from [BEAM-14383] Improve "FailedRo… by @pabloem in #17601
- [BEAM-14347] Rename registration package to register by @damccorm in #17603
- [BEAM-11104] Add self-checkpointing integration test by @jrmccluskey in #17590
- [BEAM-5492] Python Dataflow integration tests should export the pipeline console output to Jenkins Test Result section by @andoni-guzman in #17530
- [BEAM-14396] Bump httplib2 upper bound. by @tvalentyn in #17602
- [BEAM-11104] Add Go self-checkpointing to CHANGES.md by @jrmccluskey in #17612
- [BEAM-14081] [CdapIO] Add context classes for CDAP plugins by @Krasavinigor in #17104
- [BEAM-12526] Add Dependabot by @damccorm in #17563
- [BEAM-14096] bump junit-quickcheck to 1.0 by @masahitojp in #17519
- Remove python 3.6 postcommit from mass_comment.py by @y1chi in #17630
- [BEAM-14347] Add some benchmarks for generic registration by @damccorm in #17613
- [BEAM-12526] Correctly route go dependency changes to go label by @damccorm in #17632
- [BEAM-13695] Add jamm jvm options to Java 11 by @kileys in #17178
- [BEAM-14334] Fix leakage of SparkContext in Spark runner tests to remove forkEvery 1 by @mosche in #17406
- Typo & link update in typescript SDK readme by @lostluck in #17633
- [BEAM-12526] Trigger go precommits on go mod/sum changes by @damccorm in #17636
- Revert "[BEAM-14429] Force java load test on dataflow runner v2 forceNumIniti…" by @y1chi in #17609
- [BEAM-14442] Add GitHub issue templates by @damccorm in #17588
- [BEAM-14347] Add generic registration feature to CHANGES by @damccorm in #17643
- Better test assertion. by @robertwb in #17551
- Bump github.com/google/go-cmp from 0.5.7 to 0.5.8 in /sdks by @dependabot in #17628
- Bump github.com/testcontainers/testcontainers-go from 0.12.0 to 0.13.0 in /sdks by @dependabot in #17627
- Bump github.com/lib/pq from 1.10.4 to 1.10.5 in /sdks by @dependabot in #17626
- [BEAM-14415] Exception handling tests and logging for partial failure BQIO by @pabloem in #17584
- Bump cloud.google.com/go/pubsub from 1.18.0 to 1.21.1 in /sdks by @dependabot in #17646
- [BEAM-14312] [Website] change section order, move socials to footer by @bullet03 in #17408
- Bump cloud.google.com/go/bigquery from 1.28.0 to 1.32.0 in /sdks by @dependabot in #17625
- Updates CHANGES.md to include some recently discovered known issues by @chamikaramj in #17631
- [BEAM-14347] Add function for simple function registration by @damccorm in #17650
- Minor: Drop dataclasses requirement, we only support python 3.7+ by @TheNeuralBit in #17640
- Bump github.com/spf13/cobra from 1.3.0 to 1.4.0 in /sdks by @dependabot in #17647
- [BEAM-14465] Reduce DefaultS3ClientBuilderFactory logging to debug level by @jrmccluskey in #17645
- Revert "Better test assertion." by @tvalentyn in #17653
- [BEAM-14430] Adding a logical type support for Python callables to Row schema by @ihji in #17608
- [BEAM-12482] Update Schema Destination during Bigquery load job when using temporary tables using zeroloadjob by @MarcoRob in #17365
- [BEAM-14455] Add UUID to sub-schemas for PythonExternalTransform by @ihji in #17605
- [BEAM-14014] Support impersonation credentials in dataflow runner by @ryanthompson591 in #17244
- [BEAM-14469] Allow nil primary returns from TrySplit() in a single-windowed context by @jrmccluskey in #17667
- Add some auto-starting runners to the typescript SDK. by @robertwb in #17580
- [BEAM-14371] (and BEAM-14372) - enable a couple staticchecks by @damccorm in #17670
- [BEAM-14470] Use Generic Registrations in loadtests. by @lostluck in #17673
- [BEAM-13015] Update the SDK harness grouping table to be memory bounded based upon the amount of assigned cache memory and to use an LRU eviction policy. by @lukecwik in #17327
- [BEAM-13982] Added output of logging for python E2E pytests by @ryanthompson591 in #17637
- [BEAM-14473] Throw error if using globally windowed, unbounded side input by @jrmccluskey in #17681
- [BEAM-14440] Add basic fuzz tests to the coders package by @jrmccluskey in #17587
- [BEAM-14035] Convert BigQuery SchemaIO to SchemaTransform by @damondouglas in #17607
- Add Akvelon to case-studies by @bullet03 in #17611
- BEAM-12356 Close DatasetService leaked with getTable by @baeminbo in #17520
- Adding eslint and lint configuration to TypeScript SDK by @pcoet in #17676
- [BEAM-14411] Re-enable TypecodersTest, fix most issues by @TheNeuralBit in #17547
- [BEAM-14460] [Playground] WIP. Fix error during getting the graph for java SDK. by @vchunikhin in #17678
- [BEAM-14334] Remove remaining forkEvery 1 from all Spark tests and stop mixing unit tests with runner validations. by @mosche in #17662
- [BEAM-14035] Fix checkstyle issue by @aromanenko-dev in #17692
- [BEAM-14441] Automatically assign issue labels based on responses to template by @damccorm in #17661
- README update for the Docker Error 255 during Website launch on Apple Silicon by @nausharipov in #17456
- [BEAM-12000] Update programming-guide.md by @tvalentyn in #17679
- [BEAM-14467] Fix bug where run_pytest.sh does not elevate errors raised in no_xdist tests by @TheNeuralBit in #17687
- [BEAM-14474] Suppress 'Mean of empty slice' Runtime Warning in dataframe unit test by @Abacn in #17682
- [BEAM-10529] update KafkaIO Xlang integration test to publish and receive null keys by @johnjcasey in #17319
- Fix a few small linting bugs by @damccorm in #17695
- Bump github.com/lib/pq from 1.10.5 to 1.10.6 in /sdks by @dependabot in #17691
- [BEAM-14428] I/O, community, and contribute pages improvements by @bullet03 in #17572
- fixed typos in README.md by @vikash2310 in #17714
- Update the PTransform and associated APIs to be less class-based. by @robertwb in #17699
- Vortex performance improvement: Enable multiple stream clients per worker by @prodriguezdefino in #17550
- [BEAM-14488] Alias async flags. by @lostluck in #17711
- [BEAM-14487] Make drain & update terminal states. by @lostluck in #17710
- Corrects I/O connectors availability status in Beam Website by @chamikaramj in #17707
- [BEAM-14484] Improve behavior surrounding primary roots in self-checkpointing by @jrmccluskey in #17716
- Improve validation error message by @damccorm in #17719
- Remove unused validation configurations. by @y1chi in #17705
- Minor: Bump Dataflow container versions by @TheNeuralBit in #17684
- [BEAM-14418] added arrows to slider by @bullet03 in #17722
- Bump google.golang.org/grpc from 1.45.0 to 1.46.2 in /sdks by @dependabot in #17677
- Add labels for typescript PRs by @Abacn in #17702
- [BEAM-13015] Only create a TimerBundleTracker if there are timers. by @scwhittle in #17445
- Add clarification on Filter transform's input function to pydoc. by @tomstepp in #17704
- [BEAM-14367]Flaky timeout in StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer by @AnandInguva in #17569
- [BEAM-14494] Tag RC container version to have form ${RELEASE}rc${RC_NUM} by @y1chi in #17725
- [BEAM-11578] Fix TypeError in dataflow_metrics has 0 distribution sum by @Abacn in #17706
- [BEAM-14499] Step global, unbounded side input case back to warning by @jrmccluskey in #17735
- [BEAM-14484] Step back unexpected primary handling to warnings by @jrmccluskey in #17724
- [BEAM-14486] Document pubsubio & fix its behavior. by @lostluck in #17709
- [BEAM-14489] Remove non-SDF version of TextIO. by @lostluck in #17712
- [BEAM-14298] resolve dependency org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde by @Abacn in #17734
- Fix `-= 1` vs `--` linting issue by @damccorm in #17738
- [BEAM-12308] change expected value in kakfa IT by @johnjcasey in #17740
- Fix 'NoneType' object has no attribute error in bigquery_test.py by @ihji in #17746
- [BEAM-14053] [CdapIO] Add wrapper class for CDAP plugin by @ktttnv in #17150
- [BEAM-14471] Adding testcases and examples for xlang Python DataframeTransform by @ihji in #17674
- [BEAM-14129] Clean up PubsubLiteIO by removing options that no longer apply by @dpcollins-google in #17169
- [BEAM-14496] Ensure that precombine is inheriting one of the timestamps output values by @lukecwik in #17729
- [BEAM-14139] Remove unused Flink 1.11 directory by @ibzib in #17750
- [BEAM-14044] Allow ModelLoader to forward BatchElements args by @zwestrick in #17527
- [BEAM-14481] Remove unnecessary context by @TheNeuralBit in #17737
- [BEAM-9324] Fix incompatibility of direct runner with cython by @Abacn in #17728
- [BEAM-14503] Add support for Flink 1.15 by @jto in #17739
- Update Beam website to release 2.39.0 by @y1chi in #17690
- [BEAM-14509] Add several flags to dataflow runner by @damccorm in #17752
- [BEAM-14494] Fix publish_docker_images.sh by @y1chi in #17756
- [BEAM-14426] Allow skipping of any output when writing an empty PCollection. by @robertwb in #17568
- Bump cloud.google.com/go/storage from 1.22.0 to 1.22.1 in /sdks by @dependabot in #17720
- Fix 2.38.0 download page. by @y1chi in #17759
- [BEAM-14492] add flinkConfDir to FlinkPipelineOptions by @je-ik in #17715
- [BEAM-14336] Re-enable `flight_delays_it_test` with `apache-beam-testing` dataset by @TheNeuralBit in #17758
- [BEAM-11106] small nits to truncate sdf exec unit by @riteshghorse in #17755
- Add standard logging when exception is thrown by @alejandrorm in #17717
- [BEAM-13829] Enable worker status in Go by @riteshghorse in #17768
- [BEAM-14519] Add website page for Go dependencies by @damccorm in #17766
- [BEAM-11106] Validate that DoFn returns Process continuation when Truncating by @riteshghorse in #17770
- [BEAM-14505] Add Dataflow streaming pipeline update support to the Go SDK by @jrmccluskey in #17747
- Handle invalid rows in the Storage Api sink by @reuvenlax in #17423
- Bump google.golang.org/api from 0.76.0 to 0.81.0 in /sdks by @dependabot in #17751
- [BEAM-14478] Fix missing 'projectId' attribute error by @ihji in #17688
- BEAM-14419: Remove invalid mod type by @thiagotnunes in #17556
- [BEAM-14006] Update Python katas to 2.38 and fix issue with one test by @iht in #17634
- [BEAM-13972] Update documentation for run inference by @ryanthompson591 in #17508
- [BEAM-14502] Fix: Splitting scans into smaller chunks to buffer reads by @diegomez17 in #16939
- Update beam-master version for legacy by @kileys in #17760
- [BEAM-14218] Add resource location hints to base inference runner. by @ryanthompson591 in #17448
- Fix NonType error when importing google.api_core fails by @ihji in #17774
- [BEAM-14442] Ask for repro steps/redirect to user list in bug template by @damccorm in #17642
- [BEAM-14166] Performance improvements for RowWithGetter by @mosche in #17172
- minor: don't capture stderr in kata tests by @iasoon in #17639
- cleaned up TypeScript in coders.ts by @pcoet in #17689
- [BEAM-14170] - Create a test that runs sickbayed tests by @fernando-wizeline in #17471
- [BEAM-14255] Drop clock abstraction by @ryanthompson591 in #17671
- Adds repr to NullableCoder by @zwestrick in #17757
- [BEAM-14475] add test cases to GcsUtil by @johnjcasey in #17683
- [BEAM-14410] Add test to demonstrate BEAM-14410 issue in non-cython environments by @TheNeuralBit in #17548
- [BEAM-14449] Support cluster provisioning when using Flink on Dataproc by @kevingg in #17736
- [BEAM-14483] Adds Java cross-language transforms for invoking Python Map and FlatMap by @chamikaramj in #17696
- [BEAM-14527] Implement "Beam Summit 2022" banner by @miamihotline in #17776
- [BEAM-12164] Feat: Add new restriction tracker to be able to track partition state along with timestamp for change streams connector. by @nancyxu123 in #17222
- [BEAM-14451] Support export to BigQuery in FhirIO.Export by @lnogueir in #17598
- Add typing information to RunInferrence. by @robertwb in #17762
- [BEAM-14513] Add read transform and initial healthcare client by @lnogueir in #17748
- [BEAM-14536] Handle 0.0 splits in offsetrange restriction by @damccorm in #17782
- [BEAM-14470] Use lifecycle method names directly. by @lostluck in #17790
- [BEAM-14297] add nullable annotations and an integration test by @johnjcasey in #17742
- Only generate Javadocs for latest Spark runner version (Spark 3) by @mosche in #17793
- [BEAM-13984] followup Fix precommit due to pytorch_test gcs model by @Abacn in #17795
- Fail Javadoc aggregateJavadoc task if there's an error by @y1chi in #17801
- [BEAM-10608] Fix additional nullness errors in BigQueryIO by @kennknowles in #16721
- [BEAM-14510] adding exception tests to LocalFileSystem by @ahmedabu98 in #17753
- feat: allow for unknown values in change streams by @thiagotnunes in #17655
- Support JdbcIO autosharding in Python by @pabloem in #16921
- [BEAM-14511] Growable Tracker for Go SDK by @riteshghorse in #17754
- [BEAM-14539] Ensure that the print stream can handle larger byte arrays being written and also allow for a growable amount of carry over. by @lukecwik in #17787
- [BEAM-10976] Fix bug with bundle finalization on SDFs (and a small doc bug) by @damccorm in #17811
- Bump google.golang.org/grpc from 1.46.2 to 1.47.0 in /sdks by @dependabot in #17806
- Rename pytorch files by @yeandy in #17798
- [BEAM-14446] Update some docs to point to GitHub issues by @damccorm in #17594
- [BEAM-13945] (FIX) Update Java BQ connector to support new JSON type by @ahmedabu98 in #17492
- [BEAM-11105] Add more watermark estimation docs for go by @damccorm in #17785
- [BEAM-11106] documentation for SDF truncation in Go by @riteshghorse in #17781
- [BEAM-11167] Updates dill package to version 0.3.5.1 by @ryanthompson591 in #17669
- [BEAM-6258] Use gRPC 1.33.1 as min version to ensure that we pickup keepalive fix by @lukecwik in #17777
- [BEAM-14441] Enable GitHub issues by @damccorm in #17812
- Alias worker_harness_container_image to sdk_container_image by @damccorm in #17817
- [BEAM-14546] Fix errant pass for empty collections in Count by @jrmccluskey in #17813
- Revert "Merge pull request #17492 from [BEAM-13945] (FIX) Update Java… by @pabloem in #18038
- [BEAM-14504] Add support for including addittional parameters to executebundle method in fhirio. by @fbeevikm in #17741
- [BEAM-13945] Roll forward JSON support for BQIO by @pabloem in #18374
- [BEAM-13756] [Playground] Merge Log and Output tabs into one and add there filtering by @miamihotline in #17792
- [BEAM-14529] Add integer to float64 conversion support by @yirutang in #17779
- [BEAM-14556] Honor the formatter installed on the root handler. by @lukecwik in #17820
- [Fixes #18679] Ensure that usage of metrics on a template job reports an error by @lukecwik in #18905
- Clean up uses of == instead of === in ts sdk by @damccorm in #17732
- Update Jira -> Issues in the Readme by @damccorm in #21718
- Add an option to run Python operations in-line when invoked as a remote runner. by @robertwb in #19266
- Populate missing display data for remotely expanded transforms. by @robertwb in #19267
- Mount GCP credentials in local docker environments. by @robertwb in #19265
- [BEAM-14471] Fix PytestUnknownMarkingWarning by @Abacn in #17825
- [BEAM-14068]Add Pytorch inference IT test and example by @AnandInguva in #17462
- [Playground] [Hotfix] Remove autoscrolling from embedded editor by @miamihotline in #21717
- [BEAM-12918] Add PostCommit_Java_Tpcds_Dataflow job by @aromanenko-dev in #17680
- [BEAM-12554] Create new instances of FileSink in sink_fn by @Abacn in #17708
- [BEAM-14121] Fix SpannerIO service call metrics and improve tests. by @nielm in #17335
- DataflowRunner: Experiment added to disable unbounded PCcollection checks turning batch into streaming by @nbali in #16773
- [BEAM-14337] Support batched key examples and non-batchable kwargs params for RunInference models by @yeandy in #21733
- More flexible Python Callable type. by @robertwb in #17767
- fixed typos in README by @pcoet in #17675
- Fix for increased FAILED_PRECONDITION errors in BQ Read API. by @vachan-shetty in #21739
- Bump google.golang.org/api from 0.81.0 to 0.83.0 in /sdks by @dependabot in #21743
- Add ability to self-assign issues for non-committers by @damccorm in #21719
- Dont try to generate jiras as part of dependency report by @damccorm in #21753
- Allow users to comment `.take-issue` without taking by @damccorm in #21755
- Gather metrics on GH Issues by @damccorm in #21736
- [Beam-14528]: Add ISO time format support for Timestamp, Date, DateTime, Time field. by @yirutang in #17778
- Update all links to in progress jiras to issues by @damccorm in #21749
- [BEAM-14000] Fixes Elastic search IO doesnot work when both password and keystore are used by @nishantjain91 in #17297
- Exclude gcp packages from dependabot by @damccorm in #21746
- Update dashboards to use gh issues data instead of jira data by @damccorm in #21771
- Better cross langauge support for dataframe reads. by @robertwb in #21762
- Add template_location flag to Go Dataflow runner by @jrmccluskey in #21774
- [BEAM-14406] Drain test for SDF truncation in Go SDK by @riteshghorse in #17814
- More Jira -> Issues doc updates by @damccorm in #21770
- [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing by @jrmccluskey in #17956
- [BEAM-13769]Add no_xdist marker for cloudpickle test by @AnandInguva in #17538
- [BEAM-14533] Bump cloudpickle to 2.1.0 by @deadwind4 in #17780
- Add basic byte size estimation for batches by @TheNeuralBit in #17771
- Add @yields_batches and @yields_elements by @TheNeuralBit in #19268
- [BEAM-14535] Added support for pandas in sklearn inference runner by @ryanthompson591 in #17800
- Merge ModelLoader and InferenceRunner into same class. by @robertwb in #21795
- [BEAM-14422] Exception testing for ReadFromBigQuery by @ahmedabu98 in #17589
- Add README for image classification example by @yeandy in #21758
- Replace SklearnModelLoader with SklearnModelHandler by @AnandInguva in #21805
- Fix every PR linking to PR 123 by @Abacn in #21802
- Add native PubSub IO prototype to Go by @jrmccluskey in #17955
- Allow creation of dynamically defined transforms in the Python expansion service. by @robertwb in #17822
- Make keying of examples explicit. by @robertwb in #21777
- Refactor code according to keyedModelHandler changes by @AnandInguva in #21819
- Add RunInference API to CHANGES.md by @yeandy in #21754
- Do not allow postcommit jobs phrase triggering by @kennknowles in #21821
- Refactor API code to base.py in RunInference by @AnandInguva in #21801
- Go SDK: Improve error message when a filesystem scheme is not found by @gonzojive in #21816
- Bump cloud.google.com/go/pubsub from 1.21.1 to 1.22.2 in /sdks by @dependabot in #21716
- Disable more comments triggers by @kileys in #21823
- [BEAM-14532] Add integration testing to fhirio Read transform by @lnogueir in #17803
- Stop collecting jira metrics by @damccorm in #21775
- [#21252] Enforce pubsub message publishing limits in the python SDK by @scwhittle in #17794
- Separated pandas and numpy implementations of sklearn. by @ryanthompson591 in #21803
- Composite triggers and unit tests for Go SDK by @riteshghorse in #21756
- Enable phrase trigger for a few post commits by @kileys in #21846
- [BEAM-14557] Read and Seek Runner Capabilities in Go SDK by @riteshghorse in #17821
- [BEAM-13806] Add x-lang BigQuery IO integration test to Go SDK. by @youngoli in #16818
- [BEAM-14265] Add watermark hold for all timers by @je-ik in #17809
- Bump Python beam-master container by @ryanthompson591 in #21820
- Split PytorchModelHandler into PytorchModelHandlerTensor and PytorchModelHandlerKeyedTensor by @yeandy in #21810
- Fix Hadoop Downloader Range not correct by @Abacn in #21778
- [BEAM-14036] Read Configuration for Pub/Sub SchemaTransform by @damondouglas in #17730
- [Go SDK] Add more info to Worker Status API by @riteshghorse in #21776
- Make PeriodicImpulse generates unbounded PCollection by @Abacn in #21815
- [BEAM-14267] Update watchForNewFiles to allow watching updated files by @Abacn in #17305
- [BEAM-14541]: fix Cloud Datastore Timestamp value conversion by @yixiaoshen in #17789
- Update references to Jira to GH for the Go SDK by @damccorm in #21830
- [#21853] Adjust Go cross-compile to target entire package by @youngoli in #21854
- [BEAM-14553] Add destination coder to FileResultCoder components by @y1chi in #17818
- Add transform names to help debug flaky test by @nielm in #21745
- copyedited README for RunInference examples by @pcoet in #21855
- Automatically enable Runner v2 for pipelines that use cross-language transforms. by @chamikaramj in #21788
- Document and test overriding batch type inference by @TheNeuralBit in #21844
- Update references to Jira to GH for the Python SDK by @damccorm in #21831
- Adjust Jenkins configuration to allow more memory per JVM (fixes #20819) by @kennknowles in #21858
- Updates Changes.md to reflect Go changes for the release by @riteshghorse in #21865
- [21794 ] Fix output timestamp in Dataflow. by @reuvenlax in #21793
- Adding more info to the sdk_worker_parallelism description by @rszper in #21839
- Add Bert Language Modeling example by @yeandy in #21818
- [BEAM-14524] Returning NamedTuple from RunInference transform by @ihji in #17773
- Unit tests for RunInference keyed/unkeyed Modelhandler and examples by @AnandInguva in #21856
- [BEAM-13229] side nav bug fixed by @bullet03 in #17731
- [Website] Fix links for pipelines by @bullet03 in #21744
- Remove kwargs and add explicit runinference_args by @yeandy in #21806
- Modify README for 3 pytorch examples by @yeandy in #21871
- Sickbay Pytorch example IT test by @AnandInguva in #21857
- Mark issues as triaged when they are assigned by @damccorm in #21790
- Add required=True to Pytorch image classification example by @yeandy in #21883
- convert windmill min timestamp to beam min timestamp by @Naireen in #21740
- Switch go todos from issue # syntax to links by @damccorm in #21890
- Add Pytorch image segmentation example by @yeandy in #21766
- Add README documentation for scikit-learn MNIST example by @yeandy in #21887
- Decompose labels for new issues by @damccorm in #21888
- Use Go 1.18 for go-licenses by @damccorm in #21896
- [BEAM-12903] Cron job to cleanup Dataproc leaked resources by @elink21 in #21779
- [BEAM-7209][BEAM-9351][BEAM-9428] Upgrade Hive to version 3.1.3 by @Abacn in #17749
- Sharding IO tests (Kafka, Debezium, JDBC, Kinesis, Neo4j) from the javaPostCommit task by @benWize in #21804
- [BEAM-14315] Match updated files continuously by @Abacn in #17604
- Sklearn Mnist example and IT test by @AnandInguva in #21781
- Add Spanner Integration tests to verify exception handling by @nielm in #21748
- Get the latest version of go-licenses by @damccorm in #21901
- Hide internal helpers added to DoFn for batched DoFns by @TheNeuralBit in #21860
- Updated documentation for ml.inference docs. by @ryanthompson591 in #21868
- [cherry-pick][release-2.40.0] Merge pull request #21910: Revert "convert windmill min timestamp to … by @pabloem in #21917
- [cherry-pick][release-2.40.0] Rollback Dill from 3.5.1 to 3.1.1 by @pabloem in #21911
- [cherry-pick][release-2.40.0] Update Python base image requirements by @pabloem in #21913
- [cherry-pick][release-2.40.0] Merge pull request #21895: Drops usage of setWindowingStrategyInterna… by @pabloem in #21914
- [cherry-pick][release-2.40.0][Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders by @pabloem in #21936
- [cherry-pick][release-2.40.0] BigQueryIO: Adding the BASIC view setting to getTable request (#21879) by @pabloem in #21938
- [cherry-pick][release-2.40.0][21941] Fix no output timestamp case by @pabloem in #21944
New Contributors
- @kynx made their first contribution in #17450
- @elink21 made their first contribution in #17582
- @Krasavinigor made their first contribution in #17104
- @vchunikhin made their first contribution in #17678
- @nausharipov made their first contribution in #17456
- @vikash2310 made their first contribution in #17714
- @tomstepp made their first contribution in #17704
- @ktttnv made their first contribution in #17150
- @zwestrick made their first contribution in #17527
- @alejandrorm made their first contribution in #17717
- @diegomez17 made their first contribution in #16939
- @lnogueir made their first contribution in #17598
- @nishantjain91 made their first contribution in #17297
- @gonzojive made their first contribution in #21816
- @yixiaoshen made their first contribution in #17789
- @Naireen made their first contribution in #21740
Full Changelog: v2.39.0...v2.40.0-RC1
- [BEAM-13695] Add jamm jvm options to Java 11 by @kileys in #17178
- [BEAM-14334] Fix leakage of SparkContext in Spark runner tests to remove forkEvery 1 by @mosche in #17406
- Typo & link update in typescript SDK readme by @lostluck in #17633
- [BEAM-12526] Trigger go precommits on go mod/sum changes by @damccorm in #17636
- Revert "[BEAM-14429] Force java load test on dataflow runner v2 forceNumIniti…" by @y1chi in #17609
- [BEAM-14442] Add GitHub issue templates by @damccorm in #17588
- [BEAM-14347] Add generic registration feature to CHANGES by @damccorm in #17643
- Better test assertion. by @robertwb in #17551
- Bump github.com/google/go-cmp from 0.5.7 to 0.5.8 in /sdks by @dependabot in #17628
- Bump github.com/testcontainers/testcontainers-go from 0.12.0 to 0.13.0 in /sdks by @dependabot in #17627
- Bump github.com/lib/pq from 1.10.4 to 1.10.5 in /sdks by @dependabot in #17626
- [BEAM-14415] Exception handling tests and logging for partial failure BQIO by @pabloem in #17584
- Bump cloud.google.com/go/pubsub from 1.18.0 to 1.21.1 in /sdks by @dependabot in #17646
- [BEAM-14312] [Website] change section order, move socials to footer by @bullet03 in #17408
- Bump cloud.google.com/go/bigquery from 1.28.0 to 1.32.0 in /sdks by @dependabot in #17625
- Updates CHANGES.md to include some recently discovered known issues by @chamikaramj in #17631
- [BEAM-14347] Add function for simple function registration by @damccorm in #17650
- Minor: Drop dataclasses requirement, we only support python 3.7+ by @TheNeuralBit in #17640
- Bump github.com/spf13/cobra from 1.3.0 to 1.4.0 in /sdks by @dependabot in #17647
- [BEAM-14465] Reduce DefaultS3ClientBuilderFactory logging to debug level by @jrmccluskey in #17645
- Revert "Better test assertion." by @tvalentyn in #17653
- [BEAM-14430] Adding a logical type support for Python callables to Row schema by @ihji in #17608
- [BEAM-12482] Update Schema Destination during Bigquery load job when using temporary tables using zeroloadjob by @MarcoRob in #17365
- [BEAM-14455] Add UUID to sub-schemas for PythonExternalTransform by @ihji in #17605
- [BEAM-14014] Support impersonation credentials in dataflow runner by @ryanthompson591 in #17244
- [BEAM-14469] Allow nil primary returns from TrySplit() in a single-windowed context by @jrmccluskey in #17667
- Add some auto-starting runners to the typescript SDK. by @robertwb in #17580
- [BEAM-14371] (and BEAM-14372) - enable a couple staticchecks by @damccorm in #17670
- [BEAM-14470] Use Generic Registrations in loadtests. by @lostluck in #17673
- [BEAM-13015] Update the SDK harness grouping table to be memory bounded based upon the amount of assigned cache memory and to use an LRU eviction policy. by @lukecwik in #17327
- [BEAM-13982] Added output of logging for python E2E pytests by @ryanthompson591 in #17637
- [BEAM-14473] Throw error if using globally windowed, unbounded side input by @jrmccluskey in #17681
- [BEAM-14440] Add basic fuzz tests to the coders package by @jrmccluskey in #17587
- [BEAM-14035] Convert BigQuery SchemaIO to SchemaTransform by @damondouglas in #17607
- Add Akvelon to case-studies by @bullet03 in #17611
- BEAM-12356 Close DatasetService leaked with getTable by @baeminbo in #17520
- Adding eslint and lint configuration to TypeScript SDK by @pcoet in #17676
- [BEAM-14411] Re-enable TypecodersTest, fix most issues by @TheNeuralBit in #17547
- [BEAM-14460] [Playground] WIP. Fix error during getting the graph for java SDK. by @vchunikhin in #17678
- [BEAM-14334] Remove remaining forkEvery 1 from all Spark tests and stop mixing unit tests with runner validations. by @mosche in #17662
- [BEAM-14035] Fix checkstyle issue by @aromanenko-dev in #17692
- [BEAM-14441] Automatically assign issue labels based on responses to template by @damccorm in #17661
- README update for the Docker Error 255 during Website launch on Apple Silicon by @nausharipov in #17456
- [BEAM-12000] Update programming-guide.md by @tvalentyn in #17679
- [BEAM-14467] Fix bug where run_pytest.sh does not elevate errors raised in no_xdist tests by @TheNeuralBit in #17687
- [BEAM-14474] Suppress 'Mean of empty slice' Runtime Warning in dataframe unit test by @Abacn in #17682
- [BEAM-10529] update KafkaIO Xlang integration test to publish and receive null keys by @johnjcasey in #17319
- Fix a few small linting bugs by @damccorm in #17695
- Bump github.com/lib/pq from 1.10.5 to 1.10.6 in /sdks by @dependabot in #17691
- [BEAM-14428] I/O, community, and contribute pages improvements by @bullet03 in #17572
- fixed typos in README.md by @vikash2310 in #17714
- Update the PTransform and associated APIs to be less class-based. by @robertwb in #17699
- Vortex performance improvement: Enable multiple stream clients per worker by @prodriguezdefino in #17550
- [BEAM-14488] Alias async flags. by @lostluck in #17711
- [BEAM-14487] Make drain & update terminal states. by @lostluck in #17710
- Corrects I/O connectors availability status in Beam Website by @chamikaramj in #17707
- [BEAM-14484] Improve behavior surrounding primary roots in self-checkpointing by @jrmccluskey in #17716
- Improve validation error message by @damccorm in #17719
- Remove unused validation configurations. by @y1chi in #17705
- Minor: Bump Dataflow container versions by @TheNeuralBit in #17684
- [BEAM-14418] added arrows to slider by @bullet03 in #17722
- Bump google.golang.org/grpc from 1.45.0 to 1.46.2 in /sdks by @dependabot in #17677
- Add labels for typescript PRs by @Abacn in #17702
- [BEAM-13015] Only create a TimerBundleTracker if there are timers. by @scwhittle in #17445
- Add clarification on Filter transform's input function to pydoc. by @tomstepp in #17704
- [BEAM-14367]Flaky timeout in StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer by @AnandInguva in #17569
- [BEAM-14494] Tag RC container version to have form ${RELEASE}rc${RC_NUM} by @y1chi in #17725
- [BEAM-11578] Fix TypeError in dataflow_metrics has 0 distribution sum by @Abacn in #17706
- [BEAM-14499] Step global, unbounded side input case back to warning by @jrmccluskey in #17735
- [BEAM-14484] Step back unexpected primary handling to warnings by @jrmccluskey in #17724
- [BEAM-14486] Document pubsubio & fix its behavior. by @lostluck in #17709
- [BEAM-14489] Remove non-SDF version of TextIO. by @lostluck in #17712
- [BEAM-14298] resolve dependency org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde by @Abacn in #17734
- Fix -= 1 vs -- linting issue by @damccorm in #17738
- [BEAM-12308] change expected value in kakfa IT by @johnjcasey in #17740
- Fix 'NoneType' object has no attribute error in bigquery_test.py by @ihji in #17746
- [BEAM-14053] [CdapIO] Add wrapper class for CDAP plugin by @ktttnv in #17150
- [BEAM-14471] Adding testcases and examples for xlang Python DataframeTransform by @ihji in #17674
- [BEAM-14129] Clean up PubsubLiteIO by removing options that no longer apply by @dpcollins-google in #17169
- [BEAM-14496] Ensure that precombine is inheriting one of the timestamps output values by @lukecwik in #17729
- [BEAM-14139] Remove unused Flink 1.11 directory by @ibzib in #17750
- [BEAM-14044] Allow ModelLoader to forward BatchElements args by @zwestrick in #17527
- [BEAM-14481] Remove unnecessary context by @TheNeuralBit in #17737
- [BEAM-9324] Fix incompatibility of direct runner with cython by @Abacn in #17728
- [BEAM-14503] Add support for Flink 1.15 by @jto in #17739
- Update Beam website to release 2.39.0 by @y1chi in #17690
- [BEAM-14509] Add several flags to dataflow runner by @damccorm in #17752
- [BEAM-14494] Fix publish_docker_images.sh by @y1chi in #17756
- [BEAM-14426] Allow skipping of any output when writing an empty PCollection. by @robertwb in #17568
- Bump cloud.google.com/go/storage from 1.22.0 to 1.22.1 in /sdks by @dependabot in #17720
- Fix 2.38.0 download page. by @y1chi in #17759
- [BEAM-14492] add flinkConfDir to FlinkPipelineOptions by @je-ik in #17715
- [BEAM-14336] Re-enable flight_delays_it_test with apache-beam-testing dataset by @TheNeuralBit in #17758
- [BEAM-11106] small nits to truncate sdf exec unit by @riteshghorse in #17755
- Add standard logging when exception is thrown by @alejandrorm in #17717
- [BEAM-13829] Enable worker status in Go by @riteshghorse in #17768
- [BEAM-14519] Add website page for Go dependencies by @damccorm in #17766
- [BEAM-11106] Validate that DoFn returns Process continuation when Truncating by @riteshghorse in #17770
- [BEAM-14505] Add Dataflow streaming pipeline update support to the Go SDK by @jrmccluskey in #17747
- Handle invalid rows in the Storage Api sink by @reuvenlax in #17423
- Bump google.golang.org/api from 0.76.0 to 0.81.0 in /sdks by @dependabot in #17751
- [BEAM-14478] Fix missing 'projectId' attribute error by @ihji in #17688
- BEAM-14419: Remove invalid mod type by @thiagotnunes in #17556
- [BEAM-14006] Update Python katas to 2.38 and fix issue with one test by @iht in #17634
- [BEAM-13972] Update documentation for run inference by @ryanthompson591 in #17508
- [BEAM-14502] Fix: Splitting scans into smaller chunks to buffer reads by @diegomez17 in #16939
- Update beam-master version for legacy by @kileys in #17760
- [BEAM-14218] Add resource location hints to base inference runner. by @ryanthompson591 in #17448
- Fix NonType error when importing google.api_core fails by @ihji in #17774
- [BEAM-14442] Ask for repro steps/redirect to user list in bug template by @damccorm in #17642
- [BEAM-14166] Performance improvements for RowWithGetter by @mosche in #17172
- minor: don't capture stderr in kata tests by @iasoon in #17639
- cleaned up TypeScript in coders.ts by @pcoet in #17689
- [BEAM-14170] - Create a test that runs sickbayed tests by @fernando-wizeline in #17471
- [BEAM-14255] Drop clock abstraction by @ryanthompson591 in #17671
- Adds repr to NullableCoder by @zwestrick in #17757
- [BEAM-14475] add test cases to GcsUtil by @johnjcasey in #17683
- [BEAM-14410] Add test to demonstrate BEAM-14410 issue in non-cython environments by @TheNeuralBit in #17548
- [BEAM-14449] Support cluster provisioning when using Flink on Dataproc by @kevingg in #17736
- [BEAM-14483] Adds Java cross-language transforms for invoking Python Map and FlatMap by @chamikaramj in #17696
- [BEAM-14527] Implement "Beam Summit 2022" banner by @miamihotline in #17776
- [BEAM-12164] Feat: Add new restriction tracker to be able to track partition state along with timestamp for change streams connector. by @nancyxu123 in #17222
- [BEAM-14451] Support export to BigQuery in FhirIO.Export by @lnogueir in #17598
- Add typing information to RunInference. by @robertwb in #17762
- [BEAM-14513] Add read transform and initial healthcare client by @lnogueir in #17748
- [BEAM-14536] Handle 0.0 splits in offsetrange restriction by @damccorm in #17782
- [BEAM-14470] Use lifecycle method names directly. by @lostluck in #17790
- [BEAM-14297] add nullable annotations and an integration test by @johnjcasey in #17742
- Only generate Javadocs for latest Spark runner version (Spark 3) by @mosche in #17793
- [BEAM-13984] followup Fix precommit due to pytorch_test gcs model by @Abacn in #17795
- Fail Javadoc aggregateJavadoc task if there's an error by @y1chi in #17801
- [BEAM-10608] Fix additional nullness errors in BigQueryIO by @kennknowles in #16721
- [BEAM-14510] adding exception tests to LocalFileSystem by @ahmedabu98 in #17753
- feat: allow for unknown values in change streams by @thiagotnunes in #17655
- Support JdbcIO autosharding in Python by @pabloem in #16921
- [BEAM-14511] Growable Tracker for Go SDK by @riteshghorse in #17754
- [BEAM-14539] Ensure that the print stream can handle larger byte arrays being written and also allow for a growable amount of carry over. by @lukecwik in #17787
- [BEAM-10976] Fix bug with bundle finalization on SDFs (and a small doc bug) by @damccorm in #17811
- Bump google.golang.org/grpc from 1.46.2 to 1.47.0 in /sdks by @dependabot in #17806
- Rename pytorch files by @yeandy in #17798
- [BEAM-14446] Update some docs to point to GitHub issues by @damccorm in #17594
- [BEAM-13945] (FIX) Update Java BQ connector to support new JSON type by @ahmedabu98 in #17492
- [BEAM-11105] Add more watermark estimation docs for go by @damccorm in #17785
- [BEAM-11106] documentation for SDF truncation in Go by @riteshghorse in #17781
- [BEAM-11167] Updates dill package to version 0.3.5.1 by @ryanthompson591 in #17669
- [BEAM-6258] Use gRPC 1.33.1 as min version to ensure that we pickup keepalive fix by @lukecwik in #17777
- [BEAM-14441] Enable GitHub issues by @damccorm in #17812
- Alias worker_harness_container_image to sdk_container_image by @damccorm in #17817
- [BEAM-14546] Fix errant pass for empty collections in Count by @jrmccluskey in #17813
- Revert "Merge pull request #17492 from [BEAM-13945] (FIX) Update Java… by @pabloem in #18038
- [BEAM-14504] Add support for including additional parameters to executebundle method in fhirio. by @fbeevikm in #17741
- [BEAM-13945] Roll forward JSON support for BQIO by @pabloem in #18374
- [BEAM-13756] [Playground] Merge Log and Output tabs into one and add there filtering by @miamihotline in #17792
- [BEAM-14529] Add integer to float64 conversion support by @yirutang in #17779
- [BEAM-14556] Honor the formatter installed on the root handler. by @lukecwik in #17820
- [Fixes #18679] Ensure that usage of metrics on a template job reports an error by @lukecwik in #18905
- Clean up uses of == instead of === in ts sdk by @damccorm in #17732
- Update Jira -> Issues in the Readme by @damccorm in #21718
- Add an option to run Python operations in-line when invoked as a remote runner. by @robertwb in #19266
- Populate missing display data for remotely expanded transforms. by @robertwb in #19267
- Mount GCP credentials in local docker environments. by @robertwb in #19265
- [BEAM-14471] Fix PytestUnknownMarkingWarning by @Abacn in #17825
- [BEAM-14068]Add Pytorch inference IT test and example by @AnandInguva in #17462
- [Playground] [Hotfix] Remove autoscrolling from embedded editor by @miamihotline in #21717
- [BEAM-12918] Add PostCommit_Java_Tpcds_Dataflow job by @aromanenko-dev in #17680
- [BEAM-12554] Create new instances of FileSink in sink_fn by @Abacn in #17708
- [BEAM-14121] Fix SpannerIO service call metrics and improve tests. by @nielm in #17335
- DataflowRunner: Experiment added to disable unbounded PCcollection checks turning batch into streaming by @nbali in #16773
- [BEAM-14337] Support batched key examples and non-batchable kwargs params for RunInference models by @yeandy in #21733
- More flexible Python Callable type. by @robertwb in #17767
- fixed typos in README by @pcoet in #17675
- Fix for increased FAILED_PRECONDITION errors in BQ Read API. by @vachan-shetty in #21739
- Bump google.golang.org/api from 0.81.0 to 0.83.0 in /sdks by @dependabot in #21743
- Add ability to self-assign issues for non-committers by @damccorm in #21719
- Dont try to generate jiras as part of dependency report by @damccorm in #21753
- Allow users to comment .take-issue without taking by @damccorm in #21755
- Gather metrics on GH Issues by @damccorm in #21736
- [Beam-14528]: Add ISO time format support for Timestamp, Date, DateTime, Time field. by @yirutang in #17778
- Update all links to in progress jiras to issues by @damccorm in #21749
- [BEAM-14000] Fixes Elastic search IO doesnot work when both password and keystore are used by @nishantjain91 in #17297
- Exclude gcp packages from dependabot by @damccorm in #21746
- Update dashboards to use gh issues data instead of jira data by @damccorm in #21771
- Better cross language support for dataframe reads. by @robertwb in #21762
- Add template_location flag to Go Dataflow runner by @jrmccluskey in #21774
- [BEAM-14406] Drain test for SDF truncation in Go SDK by @riteshghorse in #17814
- More Jira -> Issues doc updates by @damccorm in #21770
- [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing by @jrmccluskey in #17956
- [BEAM-13769]Add no_xdist marker for cloudpickle test by @AnandInguva in #17538
- [BEAM-14533] Bump cloudpickle to 2.1.0 by @deadwind4 in #17780
- Add basic byte size estimation for batches by @TheNeuralBit in #17771
- Add @yields_batches and @yields_elements by @TheNeuralBit in #19268
- [BEAM-14535] Added support for pandas in sklearn inference runner by @ryanthompson591 in #17800
- Merge ModelLoader and InferenceRunner into same class. by @robertwb in #21795
- [BEAM-14422] Exception testing for ReadFromBigQuery by @ahmedabu98 in #17589
- Add README for image classification example by @yeandy in #21758
- Replace SklearnModelLoader with SklearnModelHandler by @AnandInguva in #21805
- Fix every PR linking to PR 123 by @Abacn in #21802
- Add native PubSub IO prototype to Go by @jrmccluskey in #17955
- Allow creation of dynamically defined transforms in the Python expansion service. by @robertwb in #17822
- Make keying of examples explicit. by @robertwb in #21777
- Refactor code according to keyedModelHandler changes by @AnandInguva in #21819
- Add RunInference API to CHANGES.md by @yeandy in #21754
- Do not allow postcommit jobs phrase triggering by @kennknowles in #21821
- Refactor API code to base.py in RunInference by @AnandInguva in #21801
- Go SDK: Improve error message when a filesystem scheme is not found by @gonzojive in #21816
- Bump cloud.google.com/go/pubsub from 1.21.1 to 1.22.2 in /sdks by @dependabot in #21716
- Disable more comments triggers by @kileys in #21823
- [BEAM-14532] Add integration testing to fhirio Read transform by @lnogueir in #17803
- Stop collecting jira metrics by @damccorm in #21775
- [#21252] Enforce pubsub message publishing limits in the python SDK by @scwhittle in #17794
- Separated pandas and numpy implementations of sklearn. by @ryanthompson591 in #21803
- Composite triggers and unit tests for Go SDK by @riteshghorse in #21756
- Enable phrase trigger for a few post commits by @kileys in #21846
- [BEAM-14557] Read and Seek Runner Capabilities in Go SDK by @riteshghorse in #17821
- [BEAM-13806] Add x-lang BigQuery IO integration test to Go SDK. by @youngoli in #16818
- [BEAM-14265] Add watermark hold for all timers by @je-ik in #17809
- Bump Python beam-master container by @ryanthompson591 in #21820
- Split PytorchModelHandler into PytorchModelHandlerTensor and PytorchModelHandlerKeyedTensor by @yeandy in #21810
- Fix Hadoop Downloader Range not correct by @Abacn in #21778
- [BEAM-14036] Read Configuration for Pub/Sub SchemaTransform by @damondouglas in #17730
- [Go SDK] Add more info to Worker Status API by @riteshghorse in #21776
- Make PeriodicImpulse generates unbounded PCollection by @Abacn in #21815
- [BEAM-14267] Update watchForNewFiles to allow watching updated files by @Abacn in #17305
- [BEAM-14541]: fix Cloud Datastore Timestamp value conversion by @yixiaoshen in #17789
- Update references to Jira to GH for the Go SDK by @damccorm in #21830
- [#21853] Adjust Go cross-compile to target entire package by @youngoli in #21854
- [BEAM-14553] Add destination coder to FileResultCoder components by @y1chi in #17818
- Add transform names to help debug flaky test by @nielm in #21745
- copyedited README for RunInference examples by @pcoet in #21855
- Automatically enable Runner v2 for pipelines that use cross-language transforms. by @chamikaramj in #21788
- Document and test overriding batch type inference by @TheNeuralBit in #21844
- Update references to Jira to GH for the Python SDK by @damccorm in #21831
- Adjust Jenkins configuration to allow more memory per JVM (fixes #20819) by @kennknowles in #21858
- Updates Changes.md to reflect Go changes for the release by @riteshghorse in #21865
- [21794 ] Fix output timestamp in Dataflow. by @reuvenlax in #21793
- Adding more info to the sdk_worker_parallelism description by @rszper in #21839
- Add Bert Language Modeling example by @yeandy in #21818
- [BEAM-14524] Returning NamedTuple from RunInference transform by @ihji in #17773
- Unit tests for RunInference keyed/unkeyed Modelhandler and examples by @AnandInguva in #21856
- [BEAM-13229] side nav bug fixed by @bullet03 in #17731
- [Website] Fix links for pipelines by @bullet03 in #21744
- Remove kwargs and add explicit runinference_args by @yeandy in #21806
- Modify README for 3 pytorch examples by @yeandy in #21871
- Sickbay Pytorch example IT test by @AnandInguva in #21857
- Mark issues as triaged when they are assigned by @damccorm in #21790
- Add required=True to Pytorch image classification example by @yeandy in #21883
- convert windmill min timestamp to beam min timestamp by @Naireen in #21740
- Switch go todos from issue # syntax to links by @damccorm in #21890
- Add Pytorch image segmentation example by @yeandy in #21766
- Add README documentation for scikit-learn MNIST example by @yeandy in #21887
- Decompose labels for new issues by @damccorm in #21888
- Use Go 1.18 for go-licenses by @damccorm in #21896
- [BEAM-12903] Cron job to cleanup Dataproc leaked resources by @elink21 in #21779
- [BEAM-7209][BEAM-9351][BEAM-9428] Upgrade Hive to version 3.1.3 by @Abacn in #17749
- Sharding IO tests (Kafka, Debezium, JDBC, Kinesis, Neo4j) from the javaPostCommit task by @benWize in #21804
- [BEAM-14315] Match updated files continuously by @Abacn in #17604
- Sklearn Mnist example and IT test by @AnandInguva in #21781
- Add Spanner Integration tests to verify exception handling by @nielm in #21748
- Get the latest version of go-licenses by @damccorm in #21901
- Hide internal helpers added to DoFn for batched DoFns by @TheNeuralBit in #21860
- Updated documentation for ml.inference docs. by @ryanthompson591 in #21868
- [cherry-pick][release-2.40.0] Merge pull request #21910: Revert "convert windmill min timestamp to … by @pabloem in #21917
- [cherry-pick][release-2.40.0] Rollback Dill from 3.5.1 to 3.1.1 by @pabloem in #21911
- [cherry-pick][release-2.40.0] Update Python base image requirements by @pabloem in #21913
- [cherry-pick][release-2.40.0] Merge pull request #21895: Drops usage of setWindowingStrategyInterna… by @pabloem in #21914
- [cherry-pick][release-2.40.0][Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders by @pabloem in #21936
- [cherry-pick][release-2.40.0] BigQueryIO: Adding the BASIC view setting to getTable request (#21879) by @pabloem in #21938
- [cherry-pick][release-2.40.0][Fixes #21941] Fix no output timestamp case by @pabloem in #21944
- [release-2.40.0] Fix FlatMap numpy array bug by @TheNeuralBit in #22007
New Contributors
- @kynx made their first contribution in #17450
- @elink21 made their first contribution in #17582
- @Krasavinigor made their first contribution in #17104
- @vchunikhin made their first contribution in #17678
- @nausharipov made their first contribution in #17456
- @vikash2310 made their first contribution in #17714
- @tomstepp made their first contribution in #17704
- @ktttnv made their first contribution in #17150
- @zwestrick made their first contribution in #17527
- @alejandrorm made their first contribution in #17717
- @diegomez17 made their first contribution in #16939
- @nishantjain91 made their first contribution in #17297
- @gonzojive made their first contribution in #21816
- @yixiaoshen made their first contribution in #17789
- @Naireen made their first contribution in #21740
Full Changelog: v2.39.0...v2.40.0-RC2