Optimize load to snowflake process #1

Merged

@pslavov pslavov commented Aug 9, 2023

What Where/Who
Reviewers @pslavov @nezd @ivanovyordan

Background and why

While reading replication data from Postgres, only write the S3 files and
do not load them into Snowflake yet. Once replication has finished, load all
files for each table at once. This reduces the number of import calls to
Snowflake and produces larger batches, which Snowflake generally loads more
efficiently.
Files are also no longer deleted after they are loaded. They are kept in S3
for a week and cleaned up later by a lifecycle rule.
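
A minimal sketch of the deferred-load idea described above, not the code from this PR. The class name `DeferredLoader`, the stage name, and the `<run prefix>/<table>/` layout are all hypothetical, and the stage is assumed to already carry a file format. It only builds the SQL text: files are recorded while the stream is read, and one `COPY INTO` per table is produced at the end, with `PURGE = FALSE` so the staged files stay in S3.

```python
from collections import defaultdict


class DeferredLoader:
    """Remembers staged files per table and emits one COPY per table at the end."""

    def __init__(self, stage, run_prefix):
        self.stage = stage            # hypothetical external stage pointing at the S3 bucket
        self.run_prefix = run_prefix  # hypothetical per-run prefix under that stage
        self.files = defaultdict(list)

    def record_file(self, table, filename):
        # Called while reading replication data: note the file, do not load it.
        self.files[table].append(filename)

    def copy_statements(self):
        # Called once replication has finished: one statement per table,
        # each loading every file under that table's prefix in a single call.
        statements = []
        for table in sorted(self.files):
            statements.append(
                f"COPY INTO {table} "
                f"FROM @{self.stage}/{self.run_prefix}/{table}/ "
                "PURGE = FALSE"  # keep files in S3; a lifecycle rule removes them later
            )
        return statements
```

With three files recorded across two tables, `copy_statements()` returns two statements rather than three, which is the reduction in import calls the description is after.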

Also update the S3 prefix to include the hour and minute, so that data is not
imported multiple times when the same PID occurs more than once on the same day.
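
To illustrate why the extra time components matter, here is a small sketch under assumed names. The function `run_prefix` and the `YYYY/MM/DD/HHMM/<pid>` layout are hypothetical; the actual prefix format used in the repository is not shown in this PR description.

```python
from datetime import datetime, timezone


def run_prefix(pid, started_at):
    # Date alone would collide for two runs that get the same PID on one day;
    # adding hour and minute gives each run its own prefix.
    return f"{started_at:%Y/%m/%d/%H%M}/{pid}"


morning = run_prefix(1234, datetime(2023, 8, 9, 6, 7, tzinfo=timezone.utc))
evening = run_prefix(1234, datetime(2023, 8, 9, 18, 30, tzinfo=timezone.utc))
```

Here `morning` and `evening` differ even though the PID and the date are the same, so a load over one run's prefix does not pick up the other run's files. Two runs with the same PID starting within the same minute would still collide under this layout.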

@pslavov pslavov force-pushed the snowflake-imports-in-only-after-all-changes-are-read branch from 01b0a0c to 32f32a4 Compare August 10, 2023 06:07
@pslavov pslavov merged commit ad3d068 into master Aug 10, 2023
1 of 5 checks passed
@ivanovyordan ivanovyordan deleted the snowflake-imports-in-only-after-all-changes-are-read branch August 23, 2023 10:10
3 participants