Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BigQuery enrichment bug to changes.md and blogs #33165

Merged
merged 2 commits into from
Nov 19, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,10 @@
* (Java) Fixed tearDown not invoked when DoFn throws on Portable Runners ([#18592](https://github.com/apache/beam/issues/18592), [#31381](https://github.com/apache/beam/issues/31381)).
* (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine ([#32892](https://github.com/apache/beam/issues/32892)).
* Adding flag to support conditionally disabling auto-commit in JdbcIO ReadFn ([#31111](https://github.com/apache/beam/issues/31111))
* (Python) Fixed BigQuery Enrichment bug that can laed to multiple conditions returning duplicate rows, batching returning incorrect results and conditions not scoped by row during batching ([#32780](https://github.com/apache/beam/pull/32780)).
claudevdm marked this conversation as resolved.
Show resolved Hide resolved
* Fixed BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
claudevdm marked this conversation as resolved.
Show resolved Hide resolved

## Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).
Expand Down Expand Up @@ -186,6 +190,13 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly ([#32506](https://github.com/apache/beam/issues/32506)).
* (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. ([#32388](https://github.com/apache/beam/issues/32388)).

## Known Issues

* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.59.0] - 2024-09-11

## Highlights
Expand Down Expand Up @@ -229,6 +240,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

* Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.58.1] - 2024-08-15

Expand All @@ -240,6 +255,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.58.0] - 2024-08-06

Expand Down Expand Up @@ -271,6 +290,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.57.0] - 2024-06-26

Expand Down Expand Up @@ -327,6 +350,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.56.0] - 2024-05-01

Expand Down
4 changes: 4 additions & 0 deletions website/www/site/content/en/blog/beam-2.57.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@ For more information on changes in 2.57.0, check out the [detailed release notes
## Known Issues

* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
4 changes: 4 additions & 0 deletions website/www/site/content/en/blog/beam-2.58.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ For more information about changes in 2.58.0, check out the [detailed release no

* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
5 changes: 4 additions & 1 deletion website/www/site/content/en/blog/beam-2.59.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,11 @@ For more information on changes in 2.59.0, check out the [detailed release notes
* In the 2.59.0 release, Prism passes most runner validations tests with the exceptions of pipelines using the following features:
OrderedListState, OnWindowExpiry (eg. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder related issues such as with Python combiner packing, and Java Schema transforms, and heterogenous flatten coders. Processing Time timers do not yet have real time support.
* If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

* Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
5 changes: 4 additions & 1 deletion website/www/site/content/en/blog/beam-2.60.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

## Known Issues

N/A
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
Loading