Sparse file #30
I'm studying the viability of making the log sparse, and here's what I learned: From the
Bold is my own emphasis. I think this means that not every deleted record would become a "sparse file hole": the record has to be at least 4096 bytes in size, or two or more consecutive records would have to be deleted. But we have a problem: because each record is cc @arj03 Other links: |
Oh right. There is also the block boundary in AAOC you can't get around. It would be interesting to write a small script to analyze your delete tests and calculate how many of the holes when combined are larger than 4k. |
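A rough sketch of the kind of analysis script suggested above, in Python. It merges deleted records that touch each other and counts how many merged holes reach 4k. The `(offset, length)` record format, the function names, and the 4096-byte threshold are assumptions for illustration, not the log's actual on-disk layout, and the block boundary is ignored here.

```python
BLOCK = 4096  # assumed minimum hole size, taken from the 4096-byte figure above


def merge_deleted(records):
    """Merge deleted records that touch or overlap into single holes.

    `records` is a list of hypothetical (offset, length) pairs.
    Returns a sorted list of (start, end) byte ranges, end exclusive.
    """
    holes = []
    for offset, length in sorted(records):
        end = offset + length
        if holes and offset <= holes[-1][1]:
            # Adjacent to (or overlapping) the previous hole: extend it.
            holes[-1] = (holes[-1][0], max(holes[-1][1], end))
        else:
            holes.append((offset, end))
    return holes


def summarize(records, min_size=BLOCK):
    """Count merged holes and how many bytes sit in holes >= min_size."""
    holes = merge_deleted(records)
    big = [(s, e) for s, e in holes if e - s >= min_size]
    return {
        "holes": len(holes),
        "big_holes": len(big),
        "deleted_bytes": sum(e - s for s, e in holes),
        "candidate_bytes": sum(e - s for s, e in big),
    }
```

`candidate_bytes` is only an upper bound on what the filesystem could free, since it says nothing about where the holes start and end.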
Yeah, I'm currently doing something like that. I took my log in production and I'm deleting 80% of the records. Then I'm going to make it a sparse file and see whether If it's not great, then I'll try to merge holes and see what happens. |
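For anyone repeating this measurement, one way to compare the file's apparent size with what is actually allocated on disk is `os.stat`. This is a minimal sketch assuming a POSIX system where `st_blocks` is counted in 512-byte units; the function names are made up for illustration and this is not necessarily the tool used in the experiment above.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # assumption: 512-byte units (POSIX)
    return apparent, allocated


def freed_fraction(path):
    """Fraction of the apparent size that is not backed by disk blocks."""
    apparent, allocated = disk_usage(path)
    if apparent == 0:
        return 0.0
    # Allocation can exceed the apparent size (block rounding), so clamp at 0.
    return max(0.0, 1 - allocated / apparent)
```

A `freed_fraction` near 0.8 after deleting 80% of the records would be the ideal outcome; a much lower value means most deleted ranges did not turn into holes.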
Yeah, Will work on an experiment to merge holes. |
Alright, I finished a proof of concept for "hole merging", and here are the results. Deleted 80% of the records, merged all the consecutive holes together, and the I wish I could know/debug the sparse FS and discover the "min sparse hole size"... |
hmm yeah, I guess if they are votes, you need quite a few of them in a row to make a sparse hole. |
But I also opened up the log in a hex editor and I could see a lot of huge holes. 80% deleted records at random spots means that the remaining records are actually very well spaced apart. |
I found a C program and I'm debugging how many sparse FS holes there are and where they start/end. https://codeberg.org/da/sparseseek |
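The same kind of inspection can be sketched in Python with `os.lseek` and the `SEEK_HOLE` / `SEEK_DATA` flags. This is only an approximation of what such a tool does, not a port of the linked program; it assumes Linux (or another OS where Python exposes these flags), and `list_holes` is a hypothetical name.

```python
import errno
import os


def list_holes(path):
    """Return (start, end) byte ranges the filesystem reports as holes."""
    holes = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            # Next hole at or after pos; equals size if only the
            # implicit hole at end of file is left.
            hole = os.lseek(fd, pos, os.SEEK_HOLE)
            if hole >= size:
                break
            try:
                # Next data after the start of this hole.
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError as err:
                if err.errno != errno.ENXIO:
                    raise
                data = size  # no more data: the hole runs to end of file
            holes.append((hole, data))
            pos = data
    finally:
        os.close(fd)
    return holes
```

On a filesystem without sparse file support the whole file is reported as data, so the result is simply an empty list.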
I think I know what's going on. Here is what the holes look like (hex address
Notice that a hole starts and ends at multiples of |
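To make the alignment effect concrete, here is a small sketch that computes which part of a deleted byte range can actually become a hole when holes must start and end on block boundaries. The multiple itself did not survive in the comment above; the default of 4096 bytes is an assumption based on the figure mentioned earlier in this thread, and `aligned_hole` is a made-up name.

```python
def aligned_hole(start, end, block=4096):
    """Return the block-aligned (start, end) inside [start, end), or None.

    `block` is an assumed filesystem block size, not a measured value.
    """
    first = -(-start // block) * block  # round start up to a boundary
    last = (end // block) * block       # round end down to a boundary
    return (first, last) if last > first else None
```

For example, `aligned_hole(4097, 9000)` returns `None` even though 4903 bytes were deleted, because no whole block fits inside that range, while `aligned_hole(5000, 13000)` yields `(8192, 12288)`, a single block out of 8000 deleted bytes.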
I then did another experiment where the first 40% of the records in the log are deleted, i.e. all of them are consecutive, which should make one huge hole. The The holes look like this:
Which means that there must be some data between |
I considered that maybe we could change the "End of Block" marker to be I guess I'm ready to conclude that sparse files would help us only in specific cases (where deleted records are heavily concentrated in a specific region of the log), but in the average case the sparse file approach frees just 5%~25% of the space. I was hoping for an order of magnitude improvement, so something like 90% of the space freed. |
Yeah, sadly sometimes things don't work out exactly as planned. At least what you did was a good exploration of this idea, so that in the future we can safely say that this was tried and know how it works. |
Consider how we can use sparse file support in the OS to better support delete.