I hadn’t quite gotten over the horrible cold this weekend, and have had to spend most of my time looking after children. So today’s post will be a short one. I’m looking at a very basic data compression technique frequently used in many older micro-computers of the 80′s to store level information and graphics for games.
The technique is called Run-length encoding (often called RLE), it takes advantage of repetition in data to reduce the storage requirement. It used to be a common compression technique for encoding graphics, but can be used effectively to encode level data or anything that has repeating data in it. The GIF image format is a well known example that uses the technique.
The encoding algorithm is very simple, you simply encode runs of data (repeated identical data) as the number of repetitions and a copy of the data. So if you had data that looked like this aaaaabbbbcaaa it could be encoded as a5b4c1a3. The trouble with this scheme is the worst case scenario (different data every byte) can actually increase the file size significantly, in some cases resulting in a larger file than uncompressed. You can avoid this scenario by encoding the start of a run as repeated characters with single characters representing themselves. This results in the previous data being encoded as aa5bb4caa3 which goes to show that even that technique has its downside.
I wrote up a quick implementation in pascal as an example and used it to compress some data files from my DOS platform game, Bob’s Fury. The first two data files I tried were the graphics data files which you can see compressed quite well, this is because there are many long runs of the same colour pixels. The third data file takes it to the extreme, it is a level data file that is pretty much one big long run of empty spaces, it compresses to an impressively small size. The fourth data file is normal level data (albeit a much larger level), it is much like the graphics data in that there are many runs, but it has some data at the end which does not compress well resulting in a worse ratio. The last file in the table is actually an ASCII map file detailing the contents of the executable for the game, it is useful in determining where errors are in the code, but this clearly shows that ASCII data is unsuitable for this compression technique, it resulted in a much larger file!