[❄️ Snowflake Official] snowflake_table_column resource? #1490

sfc-gh-swinkler · 2023-01-23T18:30:24Z

sfc-gh-swinkler
Jan 23, 2023
Maintainer

The snowflake_table resource is one of the most important resources supported by the Terraform provider today, but it is also one of the most difficult to maintain and is prone to subtle errors. One error that I have heard come up again and again is the dropping and re-creating of columns after alterations. For example, if I have a snowflake_table resource with three columns named id, name and email, and I want to change the name of the name column to something more descriptive such as "customer_name", then attempting to do so results in dropping the name column and creating a new "customer_name" column, although that is not made clear in the plan. For reference, here is the example resource (shown after the rename):

resource "snowflake_table" "test_table" {
  name      = "BAR"
  database  = snowflake_database.test_database.name
  schema    = snowflake_schema.test_schema.name
  comment   = "This is a test table"
  column {
    name = "ID"
    type = "NUMBER(38,0)"
  }
  column {
    name = "CUSTOMER_NAME"
    type = "VARCHAR"
  }
  column {
    name = "EMAIL"
    type = "VARCHAR"
  }
}

And the resulting Terraform plan after changing the column name from "NAME" to "CUSTOMER_NAME":

  # snowflake_table.test_table will be updated in-place
  ~ resource "snowflake_table" "test_table" {
        id                  = "BAR|FOO|BAR"
        name                = "BAR"
        # (7 unchanged attributes hidden)

      ~ column {
          ~ name     = "NAME" -> "CUSTOMER_NAME"
            # (2 unchanged attributes hidden)
        }
        # (2 unchanged blocks hidden)
    }

So far nothing looks bad: the plan just says that Terraform will make an in-place change. In reality, the column is dropped and recreated, which could of course lead to loss of real data. This becomes more obvious with debug logs turned on:

[DEBUG] sql-conn-exec [query ALTER TABLE "BAR"."FOO"."BAR" DROP COLUMN "NAME" err <nil> duration 107.440501ms args {}]: timestamp=2023-01-23T09:56:41.298-0800
...
[DEBUG] exec stmt ALTER TABLE "BAR"."FOO"."BAR" ADD COLUMN "CUSTOMER_NAME" VARCHAR(16777216) COMMENT '': timestamp=2023-01-23T09:56:41.298-0800

Clearly this is not acceptable. I believe errors like this occur because the snowflake_table resource is overloaded to handle everything to do with tables, which makes the logic for handling things like updates much more complicated and harder to maintain than it needs to be. For reference, here is the function that calculates diffs for columns: https://github.com/Snowflake-Labs/terraform-provider-snowflake/blob/main/pkg/resources/table.go#L343-L379. For context, as someone who has touched much of the codebase, I might add that this code is highly unusual and not something we would do for new resources.

A few months ago I was charged with fixing a similar problem involving table constraints. What I ended up doing was breaking out the table constraint into its own Terraform resource: https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest/docs/resources/table_constraint. Not only does this support more options than the old way did, it is also easier to maintain and less error-prone. This approach was first popularized by the AWS provider with regard to the S3 bucket resource, which was similarly overloaded.
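To illustrate the split-out pattern, here is a rough sketch of a constraint declared separately from its table. The argument names (name, type, table_id, columns) are written from memory of the registry docs linked above, and the attribute used for table_id has differed between provider versions, so treat this as an assumption and check the docs for your version.

```hcl
# Sketch only: argument names are assumed from the registry docs and may
# differ between provider versions (in particular the table_id reference).
resource "snowflake_table_constraint" "pk" {
  name     = "BAR_PK"
  type     = "PRIMARY KEY"
  table_id = snowflake_table.test_table.id # assumed; some versions expose a qualified name instead
  columns  = ["ID"]
}
```

The constraint has its own create, update and delete lifecycle, so changing it does not go through the table's diff logic.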

I suggest we do something similar for table columns: make a new resource called snowflake_table_column that handles the logic of creating, altering, dropping and renaming table columns. An example of what it could look like is shown below:

resource "snowflake_table_column" "tc" {
  table_identifier {
    database = "<>"
    schema   = "<>"
    name     = "<>"
  }
  name = "<>"
  default = {
    expression = ""
    sequence_identifier = {
      database = ""
      schema   = ""
      name     = ""
    }
  }
  nullable       = false
  masking_policy = "<>"
}

The only wrinkle is that Snowflake tables cannot be created without at least one column specified. Therefore the recommended approach would be to create a table with just one column (the id column) and create all other columns using the dedicated snowflake_table_column resource. We would not deprecate support for multiple columns in tables at this time. Perhaps when we eventually release a 1.0 provider we can enforce creating only one column through the snowflake_table resource, but that is a long way off.
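As a rough sketch of that recommended layout: the snowflake_table_column resource is hypothetical (it does not exist in the provider), its arguments follow the proposal in this post, and the type argument is my own assumption since a column needs a data type.

```hcl
# The table declares only the mandatory first column.
resource "snowflake_table" "test_table" {
  name     = "BAR"
  database = snowflake_database.test_database.name
  schema   = snowflake_schema.test_schema.name

  # Snowflake requires at least one column at table creation.
  column {
    name = "ID"
    type = "NUMBER(38,0)"
  }
}

# Hypothetical resource: every other column is managed on its own.
resource "snowflake_table_column" "customer_name" {
  table_identifier {
    database = snowflake_table.test_table.database
    schema   = snowflake_table.test_table.schema
    name     = snowflake_table.test_table.name
  }

  name     = "CUSTOMER_NAME" # a change here could be applied as a rename instead of drop/add
  type     = "VARCHAR"       # assumed argument, not part of the proposal above
  nullable = true
}
```

How snowflake_table would treat columns it does not declare itself is left open in this sketch.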

Would love to hear the thoughts of the community on this one.

danu165 · 2023-01-26T05:54:41Z

danu165
Jan 26, 2023

I think this is a great idea

0 replies

gbatiz · 2023-03-10T15:13:29Z

gbatiz
Mar 10, 2023

Since tables and columns are tightly coupled, I don't think they should be separated.

My suggestion: instead of defining columns using blocks, why not have a columns parameter that takes a map(object({ ...column props... })) as its argument, where we specify the above properties for each column, as shown below?

variable "columns" {
  type = map(object({
    name       = string
    type       = string
    expression = string
    default = object({
      expression          = string
      sequence_identifier = string
    })
    nullable       = bool
    masking_policy = any
    # ...
  }))
}

This takes care of the chicken-and-egg problem too. The only important thing is to make it clear in the docs that the keys of the map are "surrogate keys" that identify the column for Terraform, and that the actual column name is defined inside the value object. This way the column can be tracked across renames.
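A sketch of how this could look for the table from the original post. The columns map argument is only my suggestion and does not exist on snowflake_table today; I left out the other column properties for brevity, which assumes they would be optional.

```hcl
# Hypothetical: `columns` is a suggested argument, not an existing one.
resource "snowflake_table" "test_table" {
  name     = "BAR"
  database = snowflake_database.test_database.name
  schema   = snowflake_schema.test_schema.name

  columns = {
    # Map keys are surrogate keys: they stay stable across renames.
    id = {
      name = "ID"
      type = "NUMBER(38,0)"
    }
    name = {
      name = "CUSTOMER_NAME" # was "NAME"; the key is unchanged, so this reads as a rename
      type = "VARCHAR"
    }
    email = {
      name = "EMAIL"
      type = "VARCHAR"
    }
  }
}
```

Because the key name stays the same while only the inner name changes, the provider could tell a rename apart from dropping one column and adding another.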

0 replies

gbatiz · 2023-06-12T12:50:38Z

gbatiz
Jun 12, 2023

@sfc-gh-swinkler Any update or feedback on this?

0 replies

DmitryMaletin · 2023-06-21T07:43:40Z

DmitryMaletin
Jun 21, 2023

Column as a separate resource sounds promising, as it can help with multiple issues.
Having to create a dummy column in order to create a table is inconvenient but acceptable.
It's sad that Terraform doesn't have a concept of nested resources.

0 replies

sfc-gh-swinkler · 2023-06-22T23:19:21Z

sfc-gh-swinkler
Jun 22, 2023
Maintainer Author

Column as a separate resource sounds promising, as it can help with multiple issues.
Having to create a dummy column in order to create a table is inconvenient but acceptable.
It's sad that Terraform doesn't have a concept of nested resources.

In the Terraform Plugin Framework there is better support for custom types, which may fix this issue without having to create a whole separate resource. We will need to adopt this framework anyway as part of a general refactoring effort, so it is a bit unclear right now whether it makes sense to have the table column as a separate resource. I really wish we could create a table with zero columns, as it is awkward to have two ways of doing the same thing.

1 reply

DmitryMaletin Jan 10, 2024

Is there any progress or an ETA?
