-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add transient configs #777
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Fixes | ||
body: Make python models use transient config | ||
time: 2023-09-18T10:57:21.113134+12:00 | ||
custom: | ||
Author: jeremyyeo | ||
Issue: "776" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
{% macro snowflake__create_table_as(temporary, relation, compiled_code, language='sql') -%} | ||
{%- set transient = config.get('transient', default=true) -%} | ||
{%- if language == 'sql' -%} | ||
{%- set transient = config.get('transient', default=true) -%} | ||
{%- set cluster_by_keys = config.get('cluster_by', default=none) -%} | ||
{%- set enable_automatic_clustering = config.get('automatic_clustering', default=false) -%} | ||
{%- set copy_grants = config.get('copy_grants', default=false) -%} | ||
|
@@ -46,7 +46,8 @@ | |
{%- endif -%} | ||
|
||
{%- elif language == 'python' -%} | ||
{{ py_write_table(compiled_code=compiled_code, target_relation=relation, temporary=temporary) }} | ||
{%- set table_type = 'transient' if transient else '' -%} | ||
{{ py_write_table(compiled_code=compiled_code, target_relation=relation, temporary=temporary, table_type=table_type) }} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Change {%- set table_type = 'transient' if transient else '' -%}
{{ py_write_table(compiled_code=compiled_code, target_relation=relation, temporary=temporary, table_type=table_type) }} to {{ py_write_table(compiled_code=compiled_code, target_relation=relation, temporary=temporary, transient=transient) }} and handle within There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would do a few things here in alignment with my comment above. I would update the definition of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dbeatty10 I think this still aligns with what you're saying above, correct? We're effectively forcing the use of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mikealfare there's one more thing to handle here:
Which brings us to three-valued logic ...There's three-valued logic to handle for the
How 'bout this?So I think we'd need handle the null case first to be sure the {% if transient is none %}
{% set table_type = none %}
{% elif transient %}
{% set table_type = "transient" %}
{% else %}
{% set table_type = "permanent" %}
{% endif %}
{{ py_write_table(compiled_code=compiled_code, target_relation=relation, table_type=table_type) }} This is assuming an implementation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, sharp eyes -- you read that right! After examining the logic for So I flipped-flopped from our earlier discussions and switched the implementation to standardize on the behavior of Specifically, I just re-factored the code so that this logic applies to both Here is the relevant portion of the diff: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, but then if someone calls the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mikealfare Currently failing CI, but draft PR is up here: #802 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I put a comment there. You're missing the close braces on the sets, so the code quality failed. Will we be moving forward with 802 then as the primary PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, let's consider #802 the primary PR. It preserves @jeremyyeo's commits history and authorship credit. |
||
{%- else -%} | ||
{% do exceptions.raise_compiler_error("snowflake__create_table_as macro didn't get supported language, it got %s" % language) %} | ||
{%- endif -%} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather pass
transient
as a boolean into this macro and then handle the translation here for the call todf.write.mode()
; the translation is specific to this call and we assume this is a boolean in the user config. Right now we have the translation in thecreate
macro, which maps to"transient"
or""
. Also, is empty string correct here? What happens if we submit a call todf.write.mode()
withtable_type=""
(versus omitting the kwargtable_type
entirely)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikealfare that's a wise question 🧠
I didn't test this out personally, but the comments in the snowpark source code describe that case here.
So the way it is implemented in this PR currently, the string value of
table_type
just becomes a pass-through in line 55 below:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Snowflake docs (and what Doug linked to):
So effectively:
Could swap it out to (according to Mike's suggestion):
Not too sure if it's worth writing additional logic to omit (or not) the
table_type
arg when it is an empty string (or not) - i.e.Happy to follow a suggested pattern @mikealfare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ftr, the docs linked above point out that the kwarg
create_temp_table
is deprecated:Which of course we still use in our
save_as_table()
method calls - so potentially would be a good time to replace that withtable_type
kwarg entirely.Go from (status quo):
Or (this PR's original intended change):
To just:
And then - we'll have all types via:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, just getting around to plowing through my GH notifications. All of this context helps a lot.
Given that SF is deprecating
create_temp_table
in favor oftable_type
, I agree with usingtable_type
in the signature, as a string. And I would default it to empty string in alignment with the docs. This also keeps config parsing out of a macro that otherwise does not care about the jinja globalconfig
that's floating around.I wouldn't worry about removing the
table_type
argument iftable_type
is not in the call. However, I think we need to be cognizant of backwards compatibility for thetemporary
argument in thepy_write_table
macro. So that means we need to take both arguments in thepy_write_table
macro and combine them. @dbeatty10, correct me if I'm wrong here, but I think that amounts to something like this:Then we update the call to
save_as_table
to excludecreate_temp_table
in alignment with your third option above:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds about right @mikealfare.
Logic
To support both backwards-compatibility as well as forward-facing use cases, I'd suggest that the new
table_type
parameter takes precedence overtemporary
(but only when it is specified).So like this (in alignment with the Snowflake docs here):
table_type
when specified"temporary"
as the table type only whentemporary
isTrue
andtable_type
is not specified"transient"
if all else failse.g., something similar to what you wrote, but with the precedence flipped:
An untested implementation
Unnecessarily complicated?
This code might appear unnecessarily complicated on first blush. But it's a four-fold consequence of:
table_type
when a consumer uses the 3-parameter version of thepy_write_table
macro.py_write_table
macro."transient"
as the default value (unless overridden)table_type
are:temp
,temporary
, andtransient
. An empty string means to create a permanent table."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, but I have one suggestion.
Since we're mimicking the behavior of
py_write_table
and we want folks to usetable_type
moving forward, we want to supporttable_type="transient"
,table_type="temp"
, andtable_type="temporary"
. This logic does actually do that, but I read it multiple times before I realized that happens because we're updating the parameter that comes in, hence it passes straight through. For the sake of readability, especially because it's jinja, I'd like to add an else clause to the if block that just contains the comment "otherwise leave table_type as it is" (or something along those lines). It would save folks some time in the future.I completely misread the docs. I conflated this with the default value of "transient". It's kind of wild that empty string is a valid value that means something and is also not the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wild indeed! 🤠
Your suggestion works for me 👍
Here's what it would look like after adding that piece: