[Bug]: Java SDK BigQueryIO's RowMutationInformation class is not backward compatible with previous releases #31993

Open
1 of 17 tasks
slilichenko opened this issue Jul 25, 2024 · 1 comment
@slilichenko
Contributor

slilichenko commented Jul 25, 2024

What happened?

BigQueryIO's CDC ingestion requires the RowMutationInformation class. This class has two pairs of methods for setting and returning the change sequence number. The recently deprecated pair, "public static RowMutationInformation of(MutationType mutationType, long sequenceNumber)" and "public abstract Long getSequenceNumber();", no longer works correctly: the sequence number passed to the first method is no longer returned by the second due to this code. This breaks existing pipelines that haven't been converted to the newly introduced methods.
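To make the expected contract concrete, here is a minimal sketch of a factory/getter pair that stays backward compatible. `MutationInfoSketch` is a hypothetical stand-in and not the Beam class. Deriving the string form from the long with `Long.toHexString` is an assumption made for illustration only.

```java
import java.util.Objects;

/** Hypothetical stand-in, NOT the Beam class: both method pairs keep working. */
final class MutationInfoSketch {
  enum MutationType { UPSERT, DELETE }

  private final MutationType mutationType;
  // Legacy value; null when only the string form was supplied.
  private final Long sequenceNumber;
  private final String changeSequenceNumber;

  private MutationInfoSketch(MutationType type, Long sequenceNumber, String changeSequenceNumber) {
    this.mutationType = Objects.requireNonNull(type);
    this.sequenceNumber = sequenceNumber;
    this.changeSequenceNumber = Objects.requireNonNull(changeSequenceNumber);
  }

  /** Legacy factory: retains the long so the legacy getter still returns it. */
  @Deprecated
  static MutationInfoSketch of(MutationType type, long sequenceNumber) {
    if (sequenceNumber < 0) {
      throw new IllegalArgumentException("sequenceNumber must be non-negative");
    }
    // Assumption: the string form is the hexadecimal rendering of the long.
    return new MutationInfoSketch(type, sequenceNumber, Long.toHexString(sequenceNumber));
  }

  /** Newer factory: only the string form is known, so the legacy value stays null. */
  static MutationInfoSketch of(MutationType type, String changeSequenceNumber) {
    return new MutationInfoSketch(type, null, changeSequenceNumber);
  }

  MutationType getMutationType() {
    return mutationType;
  }

  @Deprecated
  Long getSequenceNumber() {
    return sequenceNumber;
  }

  String getChangeSequenceNumber() {
    return changeSequenceNumber;
  }
}
```

With this shape, a pipeline that still calls the deprecated pair reads back the value it supplied, while code using the newer pair sees the string form.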

Additionally, the new method performs compute-intensive validation of the sequence number format. Is it possible that the underlying Storage Write API does the same validation, so there is no need to do it twice?

Also, using the "checkArgument" function in the pipeline's runtime code means a single row with an incorrect RowMutationInformation can cause a streaming pipeline to fail, unless the developer explicitly catches IllegalArgumentException. The pipeline would then have to be cancelled and could not be drained.
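As a possible workaround on the pipeline side, the sketch below catches the exception per row and sets the bad input aside instead of letting it propagate. Guava's `Preconditions.checkArgument` throws `IllegalArgumentException`, so that is the type caught here. `SafeMutationBuilder`, `validate`, and the regular expression are hypothetical stand-ins for the factory's precondition check, and the accepted format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: separates valid change sequence numbers from malformed ones. */
final class SafeMutationBuilder {
  /** Holds inputs that passed validation and inputs that were rejected. */
  static final class Result {
    final List<String> accepted = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();
  }

  /**
   * Stand-in for a checkArgument-style precondition.
   * Assumption: up to four '/'-separated sections of 1 to 16 hex digits.
   */
  static String validate(String changeSequenceNumber) {
    if (changeSequenceNumber == null
        || !changeSequenceNumber.matches("[0-9A-Fa-f]{1,16}(/[0-9A-Fa-f]{1,16}){0,3}")) {
      throw new IllegalArgumentException(
          "malformed change sequence number: " + changeSequenceNumber);
    }
    return changeSequenceNumber;
  }

  /** Validates each input; a failure is recorded rather than rethrown. */
  static Result partition(List<String> inputs) {
    Result result = new Result();
    for (String input : inputs) {
      try {
        result.accepted.add(validate(input));
      } catch (IllegalArgumentException e) {
        // One bad row lands in the rejected list instead of failing the whole batch.
        result.rejected.add(input);
      }
    }
    return result;
  }
}
```

In a real pipeline the rejected list would correspond to a dead-letter output, so a malformed row is set aside rather than retried indefinitely.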

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@ahmedabu98
Contributor

CC @damondouglas

@damondouglas damondouglas self-assigned this Aug 23, 2024