S,C-dense coding in Python 3
Put the text file (by default named text.txt
) in the working directory, run the script. It will create a vocabulary in vocab.txt
, as well as encode
and decode
directories.
Vocabulary consists of source file's MD5 hashsum and text elements (words and punctuation) sorted by descending rate of occurrence. Default supported punctuation includes the following symbols: { .,!?:;
}. It can be modified via PATTERN
string defined at the top of the script.
Encode folder will contain 255 variants of dense codes (for each S value in [1, 255]
).
Decode folder will contain 255 copies of original text if worked correctly, each decoded from corresponding encode file.
Script is weakly tested, consider it a sample implementation.
- (S,C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases (by Brisaboa, Fariña et al.; link).
- On the Usefulness of Fibonacci Compression Codes (by Klein, Ben-Nissan; link).
- http://vios.dc.fi.udc.es/codes/semistatic.html (archive.org)