Special case: Widely-spread known objects #17

Open
M-Gonzalo opened this issue Mar 21, 2018 · 1 comment
Labels: enhancement, good first issue, question

Comments


M-Gonzalo commented Mar 21, 2018

There are some pieces of data that exist in identical form on countless devices around the world. Some examples are license texts like the GPL and redistributable libraries such as zlib1, qt* or 7z.dll.

For some of these, bundling an offline dictionary with the Fairytale distribution in compressed form could help a lot. A very simple parser could recognize them and encode each one as a minuscule reference, yielding denser archives that are also faster to produce.
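
A minimal sketch of how such a parser might work, assuming a hypothetical `KnownObject` table and `matchKnownObject` function (neither exists in Fairytale today) and using 64-bit FNV-1a as a stand-in hash:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// 64-bit FNV-1a as a stand-in hash; a real dictionary would use a
// cryptographic hash (e.g. SHA-256) so that accidental collisions
// with unrelated data are negligible.
static uint64_t fnv1a64(const uint8_t* data, uint64_t size) {
    uint64_t h = 1469598103934665603ULL;
    for (uint64_t i = 0; i < size; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

struct KnownObject {
    uint64_t hash;  // hash of the canonical object (e.g. the GPLv3 text)
    uint64_t size;  // exact size, to cheaply rule out truncated copies
    uint32_t id;    // index into the bundled offline dictionary
};

// Table shipped with the archiver; real entries would be generated offline.
static const std::vector<KnownObject> kDictionary = {
    // { hashOfGpl3Text, sizeOfGpl3Text, 0 }, ...
};

// Returns the dictionary id if the block matches a known object, letting the
// encoder emit a tiny reference instead of the object's full bytes.
std::optional<uint32_t> matchKnownObject(const uint8_t* data, uint64_t size) {
    const uint64_t h = fnv1a64(data, size);
    for (const KnownObject& obj : kDictionary)
        if (obj.size == size && obj.hash == h)
            return obj.id;  // in practice, verify byte-for-byte before trusting
    return std::nullopt;
}
```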

This method has an obvious drawback: the size of said dictionary. Even so, modern archivers already tend to occupy dozens of megabytes on disk, sometimes more than a hundred, because of their choice of graphical libraries.

This is very much open to discussion, but it might be worth a try.

If you happen to know a file that fits these characteristics, please mention it in the comments.

M-Gonzalo added the enhancement, good first issue, and question labels on Mar 21, 2018
@DedupOperator

Most cloud-based and enterprise backup solutions already use this kind of multi-user deduplication database: every file on the host is hashed, and the server looks for a matching hash.
It is a nice idea, but I don't think building such a pool would be good practice. A deduplication database that fits all kinds of users would be huge.

Security-wise and practically, the better approach is for each host to build its own dedup database over time, covering whatever duplicated data it actually encounters.

The unique data would be compressed and possibly encrypted. With today's fast CPUs and SSDs, the database itself could also be compressed.
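
For illustration, a minimal sketch of such a per-host store, with hypothetical names (`DedupStore`, `put`) and the same FNV-1a stand-in hash as above; a real implementation would use a cryptographic hash and persist both the index and the blocks to disk:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a stand-in hash, as in the earlier sketch.
static uint64_t fnv1a64(const uint8_t* data, uint64_t size) {
    uint64_t h = 1469598103934665603ULL;
    for (uint64_t i = 0; i < size; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

class DedupStore {
public:
    // Returns the id of the stored block; identical content always maps to
    // the same id, so every later occurrence costs only a small reference.
    uint32_t put(const std::vector<uint8_t>& block) {
        const uint64_t h = fnv1a64(block.data(), block.size());
        auto range = index_.equal_range(h);
        for (auto it = range.first; it != range.second; ++it)
            if (blocks_[it->second] == block)
                return it->second;      // duplicate: reuse the existing block
        const uint32_t id = static_cast<uint32_t>(blocks_.size());
        blocks_.push_back(block);       // unique: store (then compress/encrypt)
        index_.emplace(h, id);
        return id;
    }

private:
    std::unordered_multimap<uint64_t, uint32_t> index_;  // hash -> block id
    std::vector<std::vector<uint8_t>> blocks_;           // unique block contents
};
```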

Deduplication and compression are already implemented, in very well-tested form, by community members.
