Before the Christmas break I started writing a encoding/compression library based on a technique called Huffman coding. I’ve since completed and tested the encoder and decoder, today we’ll discuss the results of several compression tests in comparison to the RLE encoding technique. Here are links back to the posts for Huffman Coding and Run-Length Encoding.
So I’ve collected a few different types of data files with different characteristics. First up is a MS-DOS executable, these binary data files typically don’t contain runs of data and will tend to contain most if not all symbols. Next is a ASCII text file, which will typically only use alphabetic, numeric and punctuation characters. Lastly a small number of graphics files, in raw bitmap form, these particular images are fairly basic (only a handful of colours).
Here’s a table showing the results.
You’ll note that the Huffman encoder achieves better results in every example! The worst result for it being the MS-DOS executable, which is of course expected. I was pleasantly surprised that the small graphic files compressed quite well, this is likely because of how few symbols are in these files. If there had been a large variety of symbols the dictionary and the encoded data would be much larger.
Run-length encoding was obviously much worse, and in the case of ASCII text and executable data it actually increased the size of the data! Knowing how it works this is hardly surprising. I did expect RLE to perform better than Huffman coding for the graphic data, but I suspect the low colour count in these files is an influencing factor.
So I thought I’d create a graphic data file that had a higher colour count that would favour RLE to see if it can do better under specific conditions. The graphic data I created is a sprite 32×32 pixels with horizontal coloured stripes, each a unique colour. The size of the raw file came out at 1026 bytes.
The Huffman encoder produced a file of 766 bytes and the RLE produced one of 68 bytes! Whilst the Huffman encoder still managed to compress the data, it couldn’t match the best case scenario for RLE.
There are good reasons to use both compression techniques. Run-length encoding doesn’t require much CPU power to encode or decode, so it can be done on very weak machines (even old 8-bit machines), but it only compresses data with many runs in it well. This turns out to be useful for graphics and level data. Huffman coding will pretty much always reduce the size of the data encoded, sometimes significantly, but it is much more complex. It requires more RAM and CPU power to achieve this result, so it can’t be used on every machine, although most modern processors would have no trouble at all.
I’ve made the code for the Huffman encoder/decoder available in pascal for download. be aware the encoder still has some debug code in it that will write to the screen.
Recent Comments