Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Data can be copied into wrong columns #116

Closed
visch opened this issue Sep 12, 2023 · 1 comment · Fixed by #117
Closed

bug: Data can be copied into wrong columns #116

visch opened this issue Sep 12, 2023 · 1 comment · Fixed by #117
Assignees

Comments

@visch
Copy link
Member

visch commented Sep 12, 2023

Because the copy into doesn't specify the column order if the columns list in the schema doesn't match the order in the table in snowflake exactly you can have some issues. I hit this as we were migrating an existing table to use target-snowflake. Related code here https://github.com/MeltanoLabs/target-snowflake/blob/main/target_snowflake/connector.py#L355-L358 , I recommend we swap to explicitly defining the column names in the copy into (https://docs.snowflake.com/en/sql-reference/sql/copy-into-table) statement, maybe there's a better way?

copy into sql example that gets generated by the target

[SQL: copy into RMS_DB.RAW.EXAMPLE_BROKEN from (select $1:"ACCOUNTNAME"::VARCHAR(399) as "ACCOUNTNAME", $1:"ACCOUNTID"::DECIMAL as "ACCOUNTID", $1:"VOID"::BOOLEAN as "VOID", $1:"ETLLASTUPDATEDON"::TIMESTAMP_NTZ as "ETLLASTUPDATEDON", $1:_sdc_extracted_at::TIMESTAMP_NTZ as _sdc_extracted_at, $1:_sdc_received_at::TIMESTAMP_NTZ as _sdc_received_at, $1:_sdc_batched_at::TIMESTAMP_NTZ as _sdc_batched_at, $1:_sdc_deleted_at::TIMESTAMP_NTZ as _sdc_deleted_at, $1:_sdc_sequence::DECIMAL as _sdc_sequence, $1:_sdc_table_version::DECIMAL as _sdc_table_version from '@~/target-snowflake/EXAMPLE_BROKEN-83eb0de6-3121-4230-bcbf-79846ee67ef7')file_format = (format_name='RMS_DB.RAW."EXAMPLE_BROKEN-83eb0de6-3121-4230-bcbf-79846ee67ef7"')]

To reproduce

  1. Run the table setup process here

example_1

{"type": "SCHEMA", "stream": "EXAMPLE_BROKEN", "schema": {"properties": {"ACCOUNTID": {"type": ["number", "null"]}, "ACCOUNTNAME": {"maxLength": 400, "type": ["string", "null"]}, "VOID": {"type": ["boolean", "null"]}, "ETLLASTUPDATEDON": {"format": "date-time", "type": ["string", "null"]}}}, "key_properties": [], "bookmark_properties": []}
{"type": "RECORD", "stream": "EXAMPLE_BROKEN", "record": {"ACCOUNTID": 123456.1, "ACCOUNTNAME": "Name1", "VOID": true, "ETLLASTUPDATEDON": "2023-04-06 07:13:04.380000"}, "time_extracted": "2023-09-11T20:41:59.240392+00:00"}

Run cat example_1 | meltano invoke target-snowflake

  1. (Note we swapped the first column order)
    example_2
{"type": "SCHEMA", "stream": "EXAMPLE_BROKEN", "schema": {"properties": {"ACCOUNTNAME": {"maxLength": 399, "type": ["string", "null"]}, "ACCOUNTID": {"type": ["number", "null"]}, "VOID": {"type": ["boolean", "null"]}, "ETLLASTUPDATEDON": {"format": "date-time", "type": ["string", "null"]}}}, "key_properties": [], "bookmark_properties": []}
{"type": "RECORD", "stream": "EXAMPLE_BROKEN", "record": {"ACCOUNTNAME": "Name1", "ACCOUNTID": 123456.1, "VOID": true, "ETLLASTUPDATEDON": "2023-04-06 07:13:04.380000"}, "time_extracted": "2023-09-11T20:41:59.240392+00:00"}

Run cat example_2 | meltano invoke target-snowflake

You'll get

sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100038 (22018): 01aeee51-0001-56f3-0004-428e00039da6: Numeric value 'Name1' is not recognized
[SQL: copy into RMS_DB.RAW.EXAMPLE_BROKEN from (select $1:"ACCOUNTNAME"::VARCHAR(399) as "ACCOUNTNAME", $1:"ACCOUNTID"::DECIMAL as "ACCOUNTID", $1:"VOID"::BOOLEAN as "VOID", $1:"ETLLASTUPDATEDON"::TIMESTAMP_NTZ as "ETLLASTUPDATEDON", $1:_sdc_extracted_at::TIMESTAMP_NTZ as _sdc_extracted_at, $1:_sdc_received_at::TIMESTAMP_NTZ as _sdc_received_at, $1:_sdc_batched_at::TIMESTAMP_NTZ as _sdc_batched_at, $1:_sdc_deleted_at::TIMESTAMP_NTZ as _sdc_deleted_at, $1:_sdc_sequence::DECIMAL as _sdc_sequence, $1:_sdc_table_version::DECIMAL as _sdc_table_version from '@~/target-snowflake/EXAMPLE_BROKEN-83eb0de6-3121-4230-bcbf-79846ee67ef7')file_format = (format_name='RMS_DB.RAW."EXAMPLE_BROKEN-83eb0de6-3121-4230-bcbf-79846ee67ef7"')]
@pnadolny13
Copy link
Contributor

@visch this is a good catch - my initial understanding was that since we're using JSON files the order wouldnt matter because we're selecting them explicitly by name but that doesnt seem to be true. I personally havent run into this so maybe ordering is mostly consistent for taps 🤷 .

I recommend we swap to explicitly defining the column names in the copy into

Yeah I like that idea. So adding (i.e. (col3, col2, col1)) to the statement like this:

copy into {full_table_name} (col3, col2, col1) from (select col3, col2, col1 from @~/target-snowflake/xyz) file_format = ...

Then if the table is in order col1, col2, col3 I think this would still work even though columns are out of order in the select since we're explicit.

Another example - https://docs.snowflake.com/en/user-guide/data-load-transform#load-a-subset-of-table-data

@pnadolny13 pnadolny13 self-assigned this Sep 12, 2023
pnadolny13 added a commit that referenced this issue Sep 12, 2023
Closes #116

I created a new test that creates a table then sends singer records that
are in a different order. The current implementation causes the test to
fail because
https://github.com/MeltanoLabs/target-snowflake/actions/runs/6160685807/job/16718219357
its trying to put a string in a boolean column. After implementing
Derek's suggested fix to explicitly define column names in the copy into
statement, the tests pass.

cc @visch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants