Skip to content

codyleyhan/compressor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Compressor

This package allows for the data compression of ASCII Art.

Implementation

The data compression is utilizing the lossless Huffman coding algorithm to generate prefixes/codes that are shorter than the standard 8 bits that would be required for each character in standard ASCII. These codes are based on the the frequency that the character appears in the art, with the most frequent character having the shortest code. Using this algorithm we can achieve a very high lossless compression. For the test data, the algorithm was able to compress from 5495 bytes to 1327 bytes which is a compression ratio of ~4.

In order to serialize the data, I first add some metadata as a header to the beginning of the data. This metadata includes the number of character to code mappings as 1 byte, the number of total characters that are encoded as 2 bytes, and finally the serialized implementation of the prefix tree that is used in huffman encoding as a variable number of bytes. I do not encode the frequencies as there is no added information that will aid in decompression by including them. Then the actual data is encoded after the meta data.

Data Serialization Format

Limitations

Due to the way that I implemented my serialization of metadata, the theoretical limit on the number of characters that can be encoded and decoded is 2^16, but this is far more characters than is asked of as 100 x 100 is only 10k characters.

The other limitation is that the algorithm does not preform well when the encoded data is relativly small such as one word or a very short sentence. This is due to the overhead that is needed in serializing the metadata.

Future

If I had more time I would have done much better unit testing as well as cleaning up some of the code organization. I would also have liked to make the code much more robust. Additionally I would like to add the ability to tell when the algorithm will produce poor results and not compress the data if this occurs.

How to Run

Ensure that the following dependencies are installed:

  • Go 1.9
  • dep

Ensure that this repo is in the correct directory of your gopath.

I intentionally left the package very open ended in order for it to be used in multiple use cases.

To run the simple use case ensure you are in the cmd folder:

# /github.com/codyleyhan/compressor/
dep ensure

cd cmd

go run compressor.go

Alternativly the package can be imported and used in another program:

import "github.com/codyleyhan/compressor"

// the following can now be used

func main() {
    compressor.Encode(data, file)

    compressor.Decode(compressedFile, buffer)
}

About

Implementation of Huffman encoding in go

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages