Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make compressed log rotation atomic #155

Open
wants to merge 1 commit into
base: v2.0
Choose a base branch
from

Conversation

chancez
Copy link

@chancez chancez commented Mar 14, 2022

If another process is watching for *.gz files then it's possible to
begin reading the archive before it has been completely created,
resulting in corruption if the other process is copying the archive to
another location (for example: archival to s3).

To resolve this, we can use a different suffix when writing the file so
that other programs do not read it while it's being created. Once the
archive has been completely created, we atomically rename it to the
desired file name with the *.gz extension, ensuring external programs
only ever see the finished archive.

Signed-off-by: Chance Zibolski [email protected]

@pchaigno
Copy link

pchaigno commented Apr 4, 2022

@natefinch 👋 Did you get a chance to look into this?

@natefinch
Copy link
Owner

I mean, I guess it's ok? but then what if code is looking for .tmp files?

What problem is this solving? What program do you have that is looking for *.gz files and what does it do?

@chancez
Copy link
Author

chancez commented Apr 8, 2022

@natefinch I considered making the tmp suffix configurable. Basically, in cilium we are using lumberjack for log rotation, and have a daemon watching for *.gz files and is uploading them to s3 for archival. What's happening without this is that the daemon is seeing the files get created, and it's uploading the files before they've been completely written to, so we've got corrupted .gz files in s3.

Also, I need to push up another fix to this PR, I forgot to Flush the writer before calling Sync on the underlying file. (Done)

…e to prevent reading incomplete files

If another process is watching for `*.gz` files then it's possible to
begin reading the archive before it has been completely created,
resulting in corruption if the other process is copying the archive to
another location (for example: archival to s3).

To resolve this, we can use a different suffix when writing the file so
that other programs do not read it while it's being created. Once the
archive has been completely created, we atomically rename it to the
desired file name with the `*.gz` extension, ensuring external programs
only ever see the finished archive.

Signed-off-by: Chance Zibolski <[email protected]>
@chancez chancez force-pushed the atomic_compression_upstream branch from 4145608 to 201297d Compare April 8, 2022 15:35
@pchaigno
Copy link

pchaigno commented May 4, 2022

Hi @natefinch 👋 Does the above look good?

@pchaigno
Copy link

@natefinch Did you get a chance to look into this? Is there anything else we'd need to change?

@kaworu
Copy link

kaworu commented Aug 25, 2023

@natefinch could you take a quick look at this PR please? 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants