[Feature Request] Expose some LoadJobConfig attributes for configuration/Allow truncate write mode #3

Open
GrgDev opened this issue Jan 25, 2021 · 7 comments
Labels
enhancement New feature or request
Milestone

Comments


GrgDev commented Jan 25, 2021

Right now, there is no way to customize how writes are done, because the target uses LoadJobConfig exclusively with its default attributes.

load_config = LoadJobConfig()

Some of these attributes could be exposed for user configuration.

In my particular use case, I would like to change the write_disposition attribute to work in truncate mode instead of append mode. This way, when using Singer with this target to pipe data to BigQuery in a time partitioned table, if something goes wrong during one of the time periods, I can just rerun the job and overwrite the partitions in question instead of worrying about duplicate data.

https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.LoadJobConfig.html#google.cloud.bigquery.job.LoadJobConfig.write_disposition
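
A rough sketch of how a write mode option might be resolved, under stated assumptions: the `write_disposition` config key, the short mode names, and the helper itself are hypothetical and not part of the target today. The three `WRITE_*` strings are the values the BigQuery load job API accepts.

```python
# Hypothetical mapping from a user-facing mode name to a BigQuery
# write disposition value.
WRITE_DISPOSITIONS = {
    "append": "WRITE_APPEND",
    "truncate": "WRITE_TRUNCATE",
    "empty": "WRITE_EMPTY",
}


def resolve_write_disposition(config):
    """Return the write disposition string for a target config dict.

    Falls back to append when the (hypothetical) key is absent.
    """
    mode = str(config.get("write_disposition", "append")).lower()
    try:
        return WRITE_DISPOSITIONS[mode]
    except KeyError:
        raise ValueError(f"Unsupported write_disposition: {mode!r}") from None


# Intended use with the real client (not executed here):
#   load_config = LoadJobConfig()
#   load_config.write_disposition = resolve_write_disposition(config)
```

Defaulting to append would keep the current behavior for anyone who does not set the option.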

@daigotanaka daigotanaka added the enhancement New feature or request label Jan 27, 2021
@daigotanaka
Contributor

This could probably be implemented by adding a config section called LoadJobConfig, letting the user populate the key-value pairs as described in the API doc, and then having target-bigquery pass them as kwargs.
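
A minimal sketch of that idea, not the code in the target: the helper name is hypothetical, and it assumes the config has already been parsed into a dict and that the installed google-cloud-bigquery version accepts job config properties as keyword arguments.

```python
def build_load_job_config(config, config_cls):
    """Instantiate config_cls from the optional "LoadJobConfig" section.

    config is the parsed target config (a dict); config_cls is the class
    to instantiate, e.g. google.cloud.bigquery.LoadJobConfig.
    """
    params = config.get("LoadJobConfig") or {}
    if not isinstance(params, dict):
        raise TypeError('"LoadJobConfig" must be an object of key-value pairs')
    # Every key in the section is forwarded as a keyword argument.
    return config_cls(**params)
```

With a config containing `{"LoadJobConfig": {"write_disposition": "WRITE_TRUNCATE"}}`, calling `build_load_job_config(config, LoadJobConfig)` would forward `write_disposition="WRITE_TRUNCATE"` to the constructor. Taking the class as a parameter keeps the helper easy to exercise without a BigQuery client.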

@daigotanaka
Contributor

@GrgDev I drafted this in #5, but I have not tested it yet.

@daigotanaka
Contributor

I finally got a chance to test PR #5.

@GrgDev LoadJobConfig params can be passed, but they do not work well with automatic table creation. For example, if you specify clustering_fields with some column name(s), the upload assumes the table was created with compatible settings. To make use of LoadJobConfig, you need to make sure that the default table target-bigquery-partition creates when the table does not exist is compatible with the LoadJobConfig params you pass. Otherwise, you will need to prepare a compatible table before running target-bigquery-partition.
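
One way to catch such a mismatch before uploading is sketched below. It is only an illustration: the function name, the default key list, and the idea of representing the existing table's settings as a plain dict are all assumptions, not something target-bigquery-partition does.

```python
def incompatible_settings(requested, existing, keys=("clustering_fields",)):
    """Return {key: (requested_value, existing_value)} for mismatches.

    requested: the LoadJobConfig params the user passed (a dict).
    existing:  settings of the existing table (a dict).
    Only table-level keys listed in `keys` are compared.
    """
    return {
        key: (requested[key], existing.get(key))
        for key in keys
        if key in requested and requested[key] != existing.get(key)
    }
```

An empty result means the compared settings agree; anything else lists what would need to be reconciled, either by changing the params or by preparing the table up front.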

Which parameters are you trying to customize?
https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.LoadJobConfig.html

@daigotanaka
Contributor

Oops, you mentioned it in the original post: write_disposition, correct?

@daigotanaka
Contributor

Personal Note:

@daigotanaka daigotanaka added this to the 0.1.2 milestone Mar 4, 2021