diff --git a/CHANGES.md b/CHANGES.md
index c5c74325722f..979cbbd67329 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -132,6 +132,7 @@
 * (Java) Fixed tearDown not invoked when DoFn throws on Portable Runners ([#18592](https://github.com/apache/beam/issues/18592), [#31381](https://github.com/apache/beam/issues/31381)).
 * (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine ([#32892](https://github.com/apache/beam/issues/32892)).
 * Adding flag to support conditionally disabling auto-commit in JdbcIO ReadFn ([#31111](https://github.com/apache/beam/issues/31111))
+* (Python) Fixed BigQuery Enrichment bug that can lead to multiple conditions returning duplicate rows, batching returning incorrect results and conditions not scoped by row during batching ([#32780](https://github.com/apache/beam/pull/32780)).
 
 ## Security Fixes
 * Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).
@@ -187,6 +188,13 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 * (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly ([#32506](https://github.com/apache/beam/issues/32506)).
 * (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. ([#32388](https://github.com/apache/beam/issues/32388)).
 
+## Known Issues
+
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
+
 # [2.59.0] - 2024-09-11
 
 ## Highlights
@@ -230,6 +238,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 * If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.
 
 * Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 # [2.58.1] - 2024-08-15
 
@@ -241,6 +253,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 
 * Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
 * Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 # [2.58.0] - 2024-08-06
 
@@ -272,6 +288,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 * Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
 * Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
 * [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 # [2.57.0] - 2024-06-26
 
@@ -328,6 +348,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 
 * Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
 * Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 # [2.56.0] - 2024-05-01
 
diff --git a/website/www/site/content/en/blog/beam-2.57.0.md b/website/www/site/content/en/blog/beam-2.57.0.md
index b583b4ee3c51..7be75a7891c5 100644
--- a/website/www/site/content/en/blog/beam-2.57.0.md
+++ b/website/www/site/content/en/blog/beam-2.57.0.md
@@ -79,6 +79,10 @@ For more information on changes in 2.57.0, check out the [detailed release notes
 ## Known Issues
 
 * Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
 
diff --git a/website/www/site/content/en/blog/beam-2.58.0.md b/website/www/site/content/en/blog/beam-2.58.0.md
index cfdf23c725e0..0d944fd419d4 100644
--- a/website/www/site/content/en/blog/beam-2.58.0.md
+++ b/website/www/site/content/en/blog/beam-2.58.0.md
@@ -53,6 +53,10 @@ For more information about changes in 2.58.0, check out the [detailed release no
 
 * Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
 * [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
 
diff --git a/website/www/site/content/en/blog/beam-2.59.0.md b/website/www/site/content/en/blog/beam-2.59.0.md
index 6ce81e7c48eb..846e45916d66 100644
--- a/website/www/site/content/en/blog/beam-2.59.0.md
+++ b/website/www/site/content/en/blog/beam-2.59.0.md
@@ -66,8 +66,11 @@ For more information on changes in 2.59.0, check out the [detailed release notes
 
 * In the 2.59.0 release, Prism passes most runner validations tests with the exceptions of pipelines using the following features: OrderedListState, OnWindowExpiry (eg. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder related issues such as with Python combiner packing, and Java Schema transforms, and heterogenous flatten coders. Processing Time timers do not yet have real time support.
 * If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.
-
 * Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
 
diff --git a/website/www/site/content/en/blog/beam-2.60.0.md b/website/www/site/content/en/blog/beam-2.60.0.md
index 462bdaf16798..ae5e0284ccdd 100644
--- a/website/www/site/content/en/blog/beam-2.60.0.md
+++ b/website/www/site/content/en/blog/beam-2.60.0.md
@@ -70,7 +70,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
 
 ## Known Issues
 
-N/A
+* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
+  * Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
+  * Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
+  * Fixed in 2.61.0.
 
 For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md
 
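Note on the "conditions not scoped to individual rows" known issue in the release notes above: the diff does not show the root cause, so the following is only a minimal, hypothetical sketch of this general class of bug, not Beam's BigQuery enrichment code. It assumes a batch of lookup requests that each carry two condition fields, and compares pooling condition values across the whole batch with matching each request's conditions as one unit. All names here (`match_unscoped`, `match_scoped`, the sample table) are invented for illustration.

```python
# Hypothetical illustration only; none of these names come from Apache Beam.
from typing import Dict, List, Sequence

Row = Dict[str, object]


def match_unscoped(table: List[Row], requests: List[Row],
                   fields: Sequence[str]) -> List[Row]:
    """Pool each field's values across the whole batch, then filter per field.

    A table row passes if every field value appears somewhere in the batch,
    even when no single request asked for that combination.
    """
    allowed = {f: {r[f] for r in requests} for f in fields}
    return [t for t in table if all(t[f] in allowed[f] for f in fields)]


def match_scoped(table: List[Row], requests: List[Row],
                 fields: Sequence[str]) -> List[Row]:
    """Keep a table row only if it satisfies every condition of one request."""
    return [
        t for t in table
        if any(all(t[f] == r[f] for f in fields) for r in requests)
    ]


table = [
    {"id": 1, "region": "us", "name": "a"},
    {"id": 1, "region": "eu", "name": "b"},  # requested by nobody
    {"id": 2, "region": "eu", "name": "c"},
]
requests = [{"id": 1, "region": "us"}, {"id": 2, "region": "eu"}]
fields = ["id", "region"]

unscoped = match_unscoped(table, requests, fields)
scoped = match_scoped(table, requests, fields)
```

In this sketch the unscoped variant also returns the `(1, "eu")` row because `1` and `"eu"` each occur somewhere in the batch, while the scoped variant returns only the two rows that were actually requested.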