(Java BigQueryIO) Copy job and temp table deletion occurs in the same DoFn, making the operation vulnerable during retries #22920

Closed
ahmedabu98 opened this issue Aug 26, 2022 · 2 comments

Comments

@ahmedabu98
Contributor

ahmedabu98 commented Aug 26, 2022

What happened?

During a large FILE_LOADS BigQuery write, the WriteRename DoFn currently handles both the copy job and the deletion of temp tables (the latter in finishBundle). This is risky because the copy step and the table deletion step share a single retry scope. If an error occurs partway through deletion and the whole bundle retries, we may attempt to copy from tables that no longer exist.

Example:
Say we have 10 temp tables and we receive a BQ exception when deleting the 5th table. WriteRename will retry the bundle and attempt to copy from all 10 tables, but by this point the first 4 have already been deleted. The copy fails on every attempt, so the bundle will keep retrying forever.
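To make the failure mode concrete, here is a minimal plain-Java simulation of the scenario above. It does not use any Beam or BigQuery APIs; the class, the in-memory table set, and the injected one-time delete error are all hypothetical stand-ins, so treat it as a sketch of the retry behavior rather than the actual WriteRename code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation only. All names here are hypothetical.
class CombinedStepRetryDemo {
    // Stand-in for the temp tables that currently exist in BigQuery.
    final Set<String> existingTables = new LinkedHashSet<>();
    // When true, the next attempt to delete "temp_5" throws once, like a transient BQ error.
    boolean failOnFifthDelete = true;

    CombinedStepRetryDemo(int tableCount) {
        for (int i = 1; i <= tableCount; i++) {
            existingTables.add("temp_" + i);
        }
    }

    // The copy needs every source table to still exist.
    void copy(List<String> sources) {
        for (String table : sources) {
            if (!existingTables.contains(table)) {
                throw new IllegalStateException("Not found: " + table);
            }
        }
    }

    void delete(String table) {
        if (failOnFifthDelete && table.equals("temp_5")) {
            failOnFifthDelete = false;
            throw new RuntimeException("Transient error deleting " + table);
        }
        existingTables.remove(table);
    }

    // One "bundle": copy, then delete every temp table, with no checkpoint in between.
    void processBundle(List<String> sources) {
        copy(sources);
        for (String table : sources) {
            delete(table);
        }
    }

    // Runs the bundle up to maxAttempts times and returns the message of each failed attempt.
    static List<String> run(int tableCount, int maxAttempts) {
        CombinedStepRetryDemo demo = new CombinedStepRetryDemo(tableCount);
        List<String> sources = new ArrayList<>(demo.existingTables);
        List<String> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                demo.processBundle(sources);
                break;
            } catch (RuntimeException e) {
                failures.add(e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        run(10, 3).forEach(System.out::println);
    }
}
```

In this sketch the first attempt fails on the delete of the 5th table, and every later attempt fails at the copy because the first 4 tables are already gone. Raising `maxAttempts` only adds more copy failures.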

Solution:
Follow the way temp file deletion is implemented: there, the two steps (load and delete) are separated into two different DoFns, so a failure while deleting does not re-run the load.
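A rough sketch of why the separation helps, again as a plain-Java simulation rather than Beam code. The `retry` helper is a hypothetical stand-in for a runner retrying each step independently, and this is not meant to represent the actual fix. The assumption is that once the copy step has succeeded, only the delete step is retried, and deleting an already-deleted table is a no-op.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation only. All names here are hypothetical.
class SeparatedStepsRetryDemo {
    final Set<String> existingTables = new LinkedHashSet<>();
    // When true, the next attempt to delete "temp_5" throws once, like a transient BQ error.
    boolean failOnFifthDelete = true;
    // Counts how many times the copy step actually ran.
    int copyAttempts = 0;

    SeparatedStepsRetryDemo(int tableCount) {
        for (int i = 1; i <= tableCount; i++) {
            existingTables.add("temp_" + i);
        }
    }

    // Step 1: the copy needs every source table to still exist.
    void copy(List<String> sources) {
        copyAttempts++;
        for (String table : sources) {
            if (!existingTables.contains(table)) {
                throw new IllegalStateException("Not found: " + table);
            }
        }
    }

    // Step 2 helper: removing a table that is already gone is a no-op, so retries are safe.
    void delete(String table) {
        if (failOnFifthDelete && table.equals("temp_5")) {
            failOnFifthDelete = false;
            throw new RuntimeException("Transient error deleting " + table);
        }
        existingTables.remove(table);
    }

    // Retries a single step on its own and returns the number of attempts it took.
    static int retry(Runnable step, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.run();
                return attempt;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("Step failed after " + maxAttempts + " attempts", last);
    }

    static SeparatedStepsRetryDemo run(int tableCount, int maxAttempts) {
        SeparatedStepsRetryDemo demo = new SeparatedStepsRetryDemo(tableCount);
        List<String> sources = new ArrayList<>(demo.existingTables);
        // The copy step completes before any deletion starts.
        retry(() -> demo.copy(sources), maxAttempts);
        // A delete failure retries only the deletes, never the copy.
        retry(() -> {
            for (String table : sources) {
                demo.delete(table);
            }
        }, maxAttempts);
        return demo;
    }

    public static void main(String[] args) {
        SeparatedStepsRetryDemo demo = run(10, 3);
        System.out.println("copy attempts: " + demo.copyAttempts);
        System.out.println("tables left: " + demo.existingTables.size());
    }
}
```

With the same injected error on the 5th delete, the copy runs exactly once and the second delete attempt cleans up the remaining tables.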

Issue Priority

Priority: 2

Issue Component

Component: io-java-gcp

ahmedabu98 changed the title from "(Java BigQueryIO) Copy jobs and temp table deletion occurs in the same DoFn, making the operation vulnerable during retries" to "(Java BigQueryIO) Copy job and temp table deletion occurs in the same DoFn, making the operation vulnerable during retries" on Aug 26, 2022
github-actions bot added the stale label on Oct 31, 2022
@BjornPrime
Contributor

.take-issue

@johnjcasey
Contributor

this has been fixed by #30023
