[Regression] Profile no longer accepts `staging_bucket` param in `execution_config` #743
Hi @dataders, have you managed to test/replicate this bug?
Hey @nickozilla! In which previous version was this working for you, such that it stopped working in v1.5.1? I'm not sure exactly what's up here, so I'm just going to provide some context (which may already be obvious to you) and share some educated guesses. As far as I can tell, all we're doing is applying whatever config the user (you) passes in, and trying to validate it in `dbt-bigquery/dbt/adapters/bigquery/python_submissions.py` (lines 167 to 186 at 030324b).
This was implemented in #578 and included in v1.5.0. In older versions (v1.3 + v1.4), I don't think we allowed user configuration for these properties at all, and I don't think we've made any changes since. I agree that, based on the docs, the config you're supplying seems to match the expected schema. Could you try running this code apart from dbt, replacing the `env_var` templates with literal values?

```python
import yaml
from google.cloud import dataproc_v1
from google.protobuf.json_format import ParseDict

batch = dataproc_v1.Batch(
    {
        "runtime_config": dataproc_v1.RuntimeConfig(
            version="1.1",
            properties={
                "spark.executor.instances": "2",
            },
        )
    }
)

your_config = """
environment_config:
  execution_config:
    service_account: "{{ env_var('SERVICE_ACCOUNT') }}"
    subnetwork_uri: "{{ env_var('SUBNET') }}"
    staging_bucket: "{{ env_var('PYTHON_DBT_STAGING_BUCKET') }}"
pyspark_batch:
  jar_file_uris:
    [
      "gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.29.0.jar",
    ]
runtime_config:
  container_image: "europe-docker.pkg.dev/{{ env_var('PYTHON_DBT_CONTAINER') }}"
"""

config_dict = yaml.safe_load(your_config)
ParseDict(config_dict, batch._pb)
```

That `ParseDict` call is where I'd expect the validation error to be raised. There have been several new releases of `google-cloud-dataproc` since then as well.
Closing this issue now, as we haven't seen it for a while and cannot reproduce it in our current environment; our versions have since been upgraded.
Thanks for looking into this at the time, @jtcohen6.
Is this a regression in a recent version of dbt-bigquery?
Current Behavior
When running a Python model, I am unable to use the profile parameter `staging_bucket`, which lives under `environment_config.execution_config`; setting it causes dbt to raise a validation error.
As far as I can tell, this is still the place where this parameter should live: https://github.com/googleapis/python-dataproc/blob/main/google/cloud/dataproc_v1/types/shared.py#L222
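A quick way to confirm that against the installed client (a sketch; the bucket name is a placeholder):

```python
from google.cloud import dataproc_v1

# Constructing the message with the keyword raises an error if the
# installed schema does not define the field.
cfg = dataproc_v1.ExecutionConfig(staging_bucket="my-staging-bucket")
print(cfg.staging_bucket)
```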
This was first noticed today after I built a new container with `dbt-core==1.5.1`, so I'm not sure if the issue is due to that or to a backported fix in `dbt-bigquery==1.5.1`.
Expected/Previous Behavior
Previously, the same profile parsed the parameter and used the staging bucket correctly. The default behaviour doesn't work for my use case, as the service account that runs dbt in Dataproc is only granted access to that specific bucket.
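A quick way to verify bucket access for the active credentials — a sketch assuming `google-cloud-storage` is installed and the service account's credentials are in use; the bucket name is a placeholder:

```python
from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials
bucket = client.bucket("my-staging-bucket")  # placeholder
# exists() performs a GET on the bucket; it returns False if the bucket
# is not found and raises Forbidden if the account lacks access.
print(bucket.exists())
```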
Steps To Reproduce
Dependencies:

```
dbt-core==1.5.1
dbt-bigquery==1.5.1
```

Profile: the `environment_config` snippet quoted in the comment above (a sketch of a full profile follows below). Then run any Python model with:

```
dbt run --select python_model
```
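For completeness, here is a sketch of what such a profile might look like — assuming the `dataproc_batch` key added in #578 for passing Batch config; the profile, project, dataset, region, and bucket names are placeholders, and the nested config mirrors the snippet quoted earlier:

```yaml
my_profile:                            # placeholder
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project          # placeholder
      dataset: dbt_models              # placeholder
      gcs_bucket: dbt-python-models    # placeholder; dbt uploads model code here
      dataproc_region: europe-west1    # placeholder
      submission_method: serverless
      dataproc_batch:
        environment_config:
          execution_config:
            service_account: "{{ env_var('SERVICE_ACCOUNT') }}"
            subnetwork_uri: "{{ env_var('SUBNET') }}"
            staging_bucket: "{{ env_var('PYTHON_DBT_STAGING_BUCKET') }}"
        pyspark_batch:
          jar_file_uris:
            - "gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.29.0.jar"
        runtime_config:
          container_image: "europe-docker.pkg.dev/{{ env_var('PYTHON_DBT_CONTAINER') }}"
```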
Relevant log output
No response
Environment
Additional Context
No response