Enhancement for optical discs: Archive size limit, and erasure coding for error correction #123
Hi there, thanks for your comments! I'd appreciate a bit more input on your use cases, as I don't quite understand how you'd like this to work, especially the bit about limiting/splitting output. The typical DwarFS use case (i.e. mounting an image to access its contents) would require access to all of the image data (this isn't strictly true; it'd require full access to the metadata, and access to individual blocks only as you access file contents). So how would you expect this to work if the data is spread across multiple discs that are not simultaneously accessible?

As for the erasure coding, my recommendation is usually to use parchive to create redundancy for error recovery. The advantages are that 1) you don't bloat the DwarFS image, 2) you can create as much (or as little) redundancy as you want and even change your mind later, and 3) you can store the redundancy elsewhere and only use it if necessary.
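For illustration, a minimal sketch of that workflow, assuming the par2 command-line tool is installed and the image is named fs.dwarfs (the filename and redundancy level are only examples):

```sh
# Create 10% recovery data for the image, stored in separate .par2 files
par2 create -r10 fs.dwarfs.par2 fs.dwarfs

# Later: check the image for corruption and, if necessary, repair it
par2 verify fs.dwarfs.par2
par2 repair fs.dwarfs.par2
```

Because the recovery files live outside the image, the amount of redundancy can be changed later simply by regenerating them.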
As I read the original issue description, I think the requirements are as follows. About the size limit: given that the target BD has a known maximum capacity, at the very least there should be a flag that instructs mkdwarfs to keep the resulting image within that capacity. With regard to bit-rot protection, I think there are major advantages to having that built into DwarFS.
That being said, for the second topic of bit-rot protection, perhaps a layered approach would work better.
Such a solution, although somewhat more complex, could be used not only with DwarFS, but with any other (read-only) file-system out there.
I can't say I like either approach, to be honest. In the second approach it's impossible to know where to cut off the files as no compression has been performed yet. Doing some sort of "binary search" would be incredibly inefficient. If all individual parts were simultaneously accessible, this whole feature would be relatively easy to implement, but I take it that's not the case.
Yeah, something similar to this second approach crossed my mind (and I actually did a quick search for "parchive fuse" earlier just to find a comparison between DwarFS and parchive...). It doesn't even have to be a single file; as I said earlier, I think it's actually an advantage to keep the DwarFS image and redundancy data separate. In any case, any pointers to useful existing libraries/projects that could be used for implementing something like this would be very much appreciated.
My use case is unusual. I have a 2000-disc media changer/jukebox, with 2 BD drives, that I am trying to turn into a single unified warm/cold storage space. My hacked-together/bad solution is to create one big archive with mkdwarfs, split it into chunks, burn those onto discs, and mount them with fuseconcat (https://github.com/patatetom/fuseconcat). I'd modify fuseconcat to autoload the chunk discs containing the data that dwarfs is requesting.

The ideal solution is for dwarfs to split its own archive output into user-specified chunks, basically mimicking RAR's split-file archive output. I could add my autochanging code into dwarfs directly. From dwarfs's perspective, all of the data/parts are accessible; there's just a delay to autoload the next chunk disc. Using par2 externally is fine. There's a performance advantage to having parity on a separate disc, as long as the collection is kept together.

On a side note, an old program called eXdupe had the ability to create linked backups, where a 'diff' backup could reference hashed/deduplicated blocks in a 'full' backup. It's a tighter solution than repositories because you only need 2 files: the master/full backup and any single linked/diff backup. It was great for backing up lots of virtual machines that share the same OS and app data. I could delete any linked VM backup without harming the others. No need for repository compacting/rebuilding. It would be nice to have that ability in dwarfs. Archived link for exdupe:
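As a rough sketch of the split-and-burn approach described above (the filenames and chunk size are illustrative; a single-layer 25 GB BD-RE holds roughly 23 GiB):

```sh
# Split the image into chunks that each fit on one disc
split --bytes=23G --numeric-suffixes=1 --suffix-length=3 fs.dwarfs fs.dwarfs.part

# Burn fs.dwarfs.part001, fs.dwarfs.part002, ... to individual discs.
# To use the archive, present the parts as one contiguous file again,
# e.g. via fuseconcat, or simply reassemble it:
cat fs.dwarfs.part* > fs.dwarfs
```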
Quick followup... I found UnChunkfs (https://github.com/KOLANICH-mirrors/unchunkfs), which presents a chunked representation of a large file without needing to split the file and use more disk space. I can burn discs from those virtual chunks. It also includes the inverse, which presents a virtual concatenated file. Dwarfs is able to read the virtual concatenated file, so I'm adding my changer code to unchunkfs. I still think split-file output should be added to Dwarfs for other use cases, like e-mailing an archive, posting to file-sharing sites, and other size-limited transfer methods. And an eXdupe-style 'diff' capability would be awesome.
As for the incremental backup functionality, that's already on my list of things to add.

With regard to your use case... I think it's rather "special" and I wouldn't want to spend much time adding support for this particular use case. However, I think one thing that might be worth supporting is "decomposed images" (for lack of a better term). That is, instead of a single monolithic image, all blocks that the image is composed of would be stored as separate files (there's already a Python script to split an image into its blocks, mainly intended for quickly evaluating different compression algorithms). You'd probably want to keep the metadata block on a HDD/SSD and the data blocks on the BDs. Each block could be separately protected with error correction redundancy. An index of all blocks could be passed to the FUSE driver when mounting the decomposed image.
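To sketch what per-block protection could look like (the directory layout and block filenames below are hypothetical, not the actual output of the splitting script):

```sh
# Assume the image has been decomposed into separate block files under blocks/
for b in blocks/block_*.bin; do
    # 15% recovery data per block; the .par2 files can sit on the same
    # disc as the block or be kept on separate storage
    par2 create -r15 "$b.par2" "$b"
done
```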
I was recently thinking about the topic of adding error correction capabilities again, and I'm still not sure building this into DwarFS is the best approach. I (fortunately) haven't been dealing with too much bit rot in the past, but I recall that whenever I had a faulty medium, the OS and/or hardware would take ages trying to read from it. I'm not entirely sure this is (still) the case, but even if not: I think by the time your medium is dying, you'd probably want to try and replace it with a new one built from the recovered data. I find the topic of error correction quite fascinating and wouldn't object to adding support to DwarFS, but I don't currently see a good or even useful strategy. But then again, I'm by no means an expert on the topic.
I'm using dwarfs on 25GB BD-RE discs, which have size and reliability issues, hence these two requests: a way to limit the archive size, and erasure coding for error correction.
Awesome program by the way! It greatly increases the read speed of optical media.