The write operation currently generates a `COPY` command like the following:
COPY "PUBLIC"."some_table"FROM's3://some-bucket/tmp/manifest.json' CREDENTIALS 'aws_access_key_id=__;aws_secret_access_key=__' FORMAT AS CSV NULLAS'@NULL@' manifest
This relies on the DataFrame having its columns in the same order as the table, if the table already exists. However, the `COPY` command supports specifying a column list or a JSONPath expression to map columns (documentation). It would be nice to at least support the column list, potentially as an option on the write operation, as sketched below. Looks like this should be fairly straightforward to add here.
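A hypothetical sketch of what that could look like from the caller's side; the `columnList` option name is invented for illustration and is not part of the connector's actual API:

```python
# Hypothetical sketch only: "columnList" is NOT an existing spark-redshift
# option; it illustrates the proposed feature. `df` is any DataFrame whose
# column order may differ from the target table's.
(df.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass")
    .option("dbtable", "PUBLIC.some_table")
    .option("tempdir", "s3://some-bucket/tmp")
    # Proposed: emit a column list so COPY maps columns by name, e.g.
    #   COPY "PUBLIC"."some_table" (id, name, created_at) FROM ...
    .option("columnList", "id, name, created_at")
    .mode("append")
    .save())
```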
So this hasn't been merged yet. Is there any other solution, short of making sure the column order in the DataFrame matches the table? I'd like to use CSV GZIP instead of AVRO, but I can't append to existing tables because the CSV's column order differs from the table's.
Nope, I've just been making sure to run a final `.select()` to set the column order at the end of my DataFrame operations before writing to Redshift (see the sketch below). Unfortunately, the connector is no longer being maintained, so it doesn't look like this will be merged... 😞
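A minimal sketch of that workaround, assuming a target table `PUBLIC.some_table` with columns `id`, `name`, `created_at` (the table, paths, and credentials are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-append").getOrCreate()
df = spark.read.parquet("s3://some-bucket/input/")  # columns in arbitrary order

# COPY maps CSV columns by position, so reorder the DataFrame to match the
# existing table's column order before handing it to the connector.
table_column_order = ["id", "name", "created_at"]  # must match the Redshift table
(df.select(*table_column_order)
    .write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass")
    .option("dbtable", "PUBLIC.some_table")
    .option("tempdir", "s3://some-bucket/tmp")
    .option("tempformat", "CSV GZIP")  # default temp format is AVRO
    .option("forward_spark_s3_credentials", "true")
    .mode("append")
    .save())
```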