
yummly/s3-to-redshift


s3-to-redshift

A utility to load files from S3 to Redshift.

Usage without Docker

$ lein run -m s3-to-redshift.core config.edn

This will:

  • find data files in S3 as specified by the config file
  • check which ones have not yet been loaded
  • group them into batches and create a corresponding manifest file for each batch
  • execute Redshift COPY commands
  • mark the files as loaded.
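The filtering, batching and manifest steps above can be sketched roughly as follows. This is a minimal Python illustration, not the tool's Clojure implementation: the function names, the batch size and the S3 URLs are hypothetical. The manifest layout follows Redshift's documented COPY manifest format (an `entries` list of `url`/`mandatory` objects).

```python
import json
from typing import Iterable, List, Set


def pending_files(found: List[str], loaded: Set[str]) -> List[str]:
    """Keep only the S3 URLs not already recorded as loaded (input order preserved)."""
    return [url for url in found if url not in loaded]


def batches(urls: List[str], size: int) -> List[List[str]]:
    """Split the URLs into consecutive batches of at most `size` files."""
    if size < 1:
        raise ValueError("batch size must be positive")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def manifest_json(urls: Iterable[str]) -> str:
    """Build a Redshift COPY manifest.

    mandatory=True makes COPY fail if a listed file is missing,
    instead of silently loading fewer rows.
    """
    return json.dumps({"entries": [{"url": url, "mandatory": True} for url in urls]})


def copy_sql(table: str, manifest_url: str) -> str:
    """Build a COPY statement that reads the files listed in a manifest.

    Hypothetical helper: credentials (e.g. IAM_ROLE) and format options
    (DELIMITER, GZIP, ...) are deliberately omitted; a real command needs them.
    """
    return f"COPY {table} FROM '{manifest_url}' MANIFEST;"
```

Each manifest would be uploaded to S3, and its URL passed to `copy_sql`; only after that COPY succeeds should the batch's files be marked as loaded, so a failed batch is retried on the next run.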

First, create a table to track loaded files:

create table s3_loaded_file(table_name varchar, url varchar(2048), create_time timestamp default sysdate, primary key (table_name, url));

The table name can differ; just specify it in the config file.
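The tracking table is what lets repeated runs skip files that were already copied. The sketch below illustrates that bookkeeping using Python's built-in `sqlite3` as a stand-in for Redshift (SQLite has no `sysdate`, so `current_timestamp` replaces it). The function names are hypothetical and the tool's actual queries may differ. Because Redshift treats primary keys as informational and does not enforce them, the sketch filters out already-recorded URLs explicitly rather than relying on the key.

```python
import sqlite3
from typing import List

# Same shape as the Redshift DDL above; current_timestamp stands in for sysdate.
DDL = """
create table s3_loaded_file(
    table_name varchar,
    url varchar(2048),
    create_time timestamp default current_timestamp,
    primary key (table_name, url)
)
"""


def unloaded(conn: sqlite3.Connection, table_name: str, urls: List[str]) -> List[str]:
    """Return the URLs not yet recorded as loaded into `table_name`, in input order."""
    rows = conn.execute(
        "select url from s3_loaded_file where table_name = ?", (table_name,)
    )
    done = {row[0] for row in rows}
    return [url for url in urls if url not in done]


def mark_loaded(conn: sqlite3.Connection, table_name: str, urls: List[str]) -> None:
    """Record URLs as loaded; intended to run only after the COPY succeeded."""
    conn.executemany(
        "insert into s3_loaded_file (table_name, url) values (?, ?)",
        [(table_name, url) for url in urls],
    )
    conn.commit()
```

Tracking is per `(table_name, url)` pair, so the same S3 file can be loaded into more than one target table without being skipped.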

See the example config file in the repository; it should be self-explanatory.

Usage with Docker

On a system with Docker installed, you can run the utility without installing anything else:

docker run -it -v /your/path/to/config.edn:/tmp/config.edn  -e AWS_ACCESS_KEY_ID=XXX -e AWS_SECRET_KEY=YYY yummly/s3-to-redshift /tmp/config.edn

When running on an EC2 instance, omit the AWS environment variables to use instance credentials.

License

Copyright © 2015 Yummly

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
