Posts Tagged ‘Pascal’

23 Feb 2021

Bob’s Fury Update: Bug Squished!

For quite some time I’ve had a very nasty bug in my game: it would lock up whenever a sound was played through any sound device on my Microbyte 386sx machine. What made this bug tricky to fix is that the game worked perfectly on the other hardware I had available and in the emulator I was using for testing (Dosbox).

I had already discounted any results from Dosbox, as it is widely known that its emulation is not hardware accurate. So I tested on the hardware I had available. Unfortunately I don’t have a lot of old 286 and 386 machines lying around, so I tested on the Microbyte 386sx and my Pentium MMX system. The big clue I got from that is that the sound code worked fine on the Pentium but not on the 386.

This is where I was stalled for quite a while: I tested much of the code and couldn’t figure out the fault. What I needed was another 286 or 386 machine to be sure it wasn’t just a quirk of the Microbyte, but this wasn’t something I could do easily. Recently I found an alternative that helped me solve the problem.

I have been watching some of Jim Leonard’s videos, as he makes some really good content about PCs and is generally extremely knowledgeable. He had some of the same sorts of issues developing DOS software using Dosbox for emulation, which is why he tends to use real hardware where possible. However he did recommend a PC emulator I hadn’t heard of called 86box.

So I downloaded 86box and fired it up. I was able to set up a virtual machine fairly quickly and found I could change some key configuration options I would later need. Most importantly I could replicate the crash in the emulator and experiment to find the cause.

Knowing that some aspect of the hardware configuration was causing the crash, I changed the configuration of 86box until the game didn’t crash; the key element turned out to be the NPU. With it enabled the sound code worked a treat even on an emulated 386 system.

The reason for the crash, whilst difficult to work out, is actually quite simple. I have an interrupt handler hooked on $1C for playing sounds of any type; during the interrupt it works out the length of the next sound in timer ticks, using some simple floating point arithmetic.
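
For illustration, here’s a minimal sketch of what a $1C hook looks like in Turbo Pascal using the Dos unit. It is not the game’s actual sound code (the handler body here is just a placeholder counter), but it shows where the troublesome floating point was sitting: inside code that runs on every timer tick.

uses Dos;

var
  oldVec: pointer;       { previous $1C handler so it can be restored }
  ticksLeft: word;       { ticks remaining for the current sound }

procedure soundTick; interrupt;
begin
  { called roughly 18.2 times a second; the real handler works out how long
    the next sound should last here, which is where the floating point (and
    hence the NPU emulation) was being used }
  if ticksLeft > 0 then Dec(ticksLeft);
  { a real handler would normally also chain to the old vector }
end;

begin
  GetIntVec($1C, oldVec);
  SetIntVec($1C, @soundTick);
  { ... run the game ... }
  SetIntVec($1C, oldVec);  { always restore the original vector on exit }
end.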

Because I had configured the Pascal compiler to use 80287 instructions, with emulation included for when an NPU isn’t present, the code would crash during the $1C interrupt. This happens because of how the NPU emulation works: if a 286+ CPU encounters a floating point instruction and there is no NPU present, it fires an interrupt, which redirects to the software emulation.

The problem plays out like this: the main code performs some floating point calculations, which go through the emulation, and that emulation can then be interrupted by more code also needing the NPU. This messes up the internal state of the emulation, causing a crash. It can’t happen with a real NPU because its instructions can’t be interrupted part way through.

The solution was fairly simple: I turned off 80287 instructions and emulation and made minor adjustments to the code so as to not require them. This got it working on real machines without a math co-processor and in 86box, so I was quite pleased. I’m not entirely sure why I had used the 80287 code generation in the first place, but I can see why it persisted in my code for so long. Many of the machines I had tested on over the years were 486 or later computers with the co-processor built in, so the problem wouldn’t appear. Dosbox didn’t expose the bug because there is no configuration option for turning off NPU support.
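
Purely as a hypothetical illustration (the actual adjustments in the game may well be different), a duration can be converted to timer ticks with longint arithmetic instead of floating point by scaling the 18.2 Hz tick rate up to a whole number:

{ hypothetical example: milliseconds to 18.2 Hz timer ticks, no NPU required }
function msToTicks(ms: longint): longint;
begin
  { 18.2 ticks per second is about 182 ticks per 10 seconds (10000 ms) }
  msToTicks := (ms * 182 + 5000) div 10000;   { +5000 rounds to the nearest tick }
end;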

There is more work yet to do, such as implementing a lookup table to save on some of the calculations, and fixing other bugs. I’m now automatically packaging a download based on my most recent Subversion revision, which you can find here. I’m thinking about eventually hosting the binaries and code somewhere like GitHub.

27 Feb 2020

Bob’s Fury Update: Bug Hunting

I’ve been doing some coding on Bob’s Fury lately, basically making adjustments and speeding the code up in the hope that I could make it work well on a 286 class machine. I have been using Dosbox to develop and test, but this doesn’t fully test the compatibility or stability of software, as Dosbox has its own quirks and does not behave exactly as real hardware does. This is where my old Microbyte 386sx computer steps in; here’s a photo from when I first bought it.

It has been very useful for performance testing as it has a turbo feature that is software controlled. You can not only toggle the CPU speed with a key stroke, but also set the slower speed in the BIOS configuration. This has allowed me to test with the machine running at its stock 20MHz as well as a slower speed; I’ve used 10MHz for my testing as a rough equivalent of many 286 machines. The testing I was able to do has shown that my game should be playable on a 286 depending on the video mode. CGA and VGA both have acceptable speed for game play, whilst EGA is marginal.

The testing did however uncover a rather annoying and difficult to squish bug. If I have the sound turned on (PC speaker is all this machine supports) the system freezes whenever a game event plays the explosion sound. Hunting bugs such as this is difficult as there is little feedback about the potential causes. It sounded like the code supporting the PC speaker should be at fault, so that’s where I began the search for the bug.

For the PC speaker I found a support library many years ago that lets you use the music macro language used by the PLAY statement in GW-BASIC and QBasic. It hooks interrupt 1Ch (the system timer interrupt) to update the state of the PC speaker. Whilst I hadn’t had any previous issues, I wondered if there was a fault that could cause the crash. At first I suspected the use of floating point instructions. The library has a small section using floating point to determine the number of ticks a note should last. On a system such as the 386sx without an FPU, such instructions cause an interrupt so that emulation in software can take over. This interrupt within an interrupt was what I thought might be the cause.

So I constructed a test program hooking the same interrupt and performing a series of floating point calculations; this didn’t yield the result I’d hoped for, as the test program worked fine. I then wondered if the 386 was hitting another type of problem that would cause an unwanted interrupt, so I copied sections of code from the library into the test program to run in a normal procedure that would show run-time error messages and debug information I could use. No run-time errors appeared and the results of the required calculations appeared to be correct.

I have a few ideas left to test, but I’m left with quite a puzzle regarding the cause. This does illustrate the need for testing on actual hardware; it’s usually better to test on many machines. Unfortunately I don’t have a large supply of 286 and 386 class machines, although I have a few I may be able to repair. I need to test on another machine because there could be something specific in the design of the Microbyte machine that isn’t compatible and is causing the issue.

29 Jan 2016

Huffman Coding – Compression results

Before the Christmas break I started writing an encoding/compression library based on a technique called Huffman coding. I’ve since completed and tested the encoder and decoder, so today we’ll discuss the results of several compression tests in comparison with the RLE technique. Here are links back to the posts for Huffman Coding and Run-Length Encoding.

So I’ve collected a few different types of data files with different characteristics. First up is an MS-DOS executable; these binary files typically don’t contain runs of data and will tend to contain most if not all symbols. Next is an ASCII text file, which will typically only use alphabetic, numeric and punctuation characters. Lastly there are a small number of graphics files in raw bitmap form; these particular images are fairly basic (only a handful of colours).

Here’s a table showing the results.

Compression ratio table

You’ll note that the Huffman encoder achieves better results in every example! The worst result for it was the MS-DOS executable, which is of course expected. I was pleasantly surprised that the small graphic files compressed quite well; this is likely because of how few symbols are in these files. If there had been a large variety of symbols, the dictionary and the encoded data would be much larger.

Run-length encoding was obviously much worse, and in the case of ASCII text and executable data it actually increased the size of the data! Knowing how it works this is hardly surprising. I did expect RLE to perform better than Huffman coding for the graphic data, but I suspect the low colour count in these files is an influencing factor.

So I thought I’d create a graphic data file with a higher colour count that would favour RLE, to see if it can do better under specific conditions. The graphic data I created is a 32×32 pixel sprite with horizontal coloured stripes, each one a unique colour. The size of the raw file came out at 1026 bytes.

The Huffman encoder produced a file of 766 bytes and the RLE produced one of 68 bytes! Whilst the Huffman encoder still managed to compress the data, it couldn’t match the best case scenario for RLE.

There are good reasons to use both compression techniques. Run-length encoding doesn’t require much CPU power to encode or decode, so it can be done on very weak machines (even old 8-bit machines), but it only compresses data with many runs in it well. This turns out to be useful for graphics and level data. Huffman coding will pretty much always reduce the size of the data encoded, sometimes significantly, but it is much more complex. It requires more RAM and CPU power to achieve this result, so it can’t be used on every machine, although most modern processors would have no trouble at all.

I’ve made the code for the Huffman encoder/decoder available in Pascal for download. Be aware the encoder still has some debug code in it that will write to the screen.

30 Nov 2015

Huffman Coding

Quite some time ago I did a short post about a compression technique called run-length encoding (or RLE), which was a commonly used compression method for graphic and level data. At the time I wrote and implemented it for the graphics and level data of my home-brew game Bob’s Fury with quite a lot of success, significantly reducing the disk space required for the base game.

I do however have some files left, largely just ASCII text, that don’t compress well or at all using that technique. I’d like to make them smaller, and of course harder to read, as they contain story text; not that my game has a brilliant story, but you get the idea. Enter an encoding technique called Huffman coding.

Essentially the algorithm encodes symbols as variable length streams of bits. The most frequently used symbols are represented by shorter bit streams, whilst the least used have longer ones. Normally in a computer each symbol is encoded as a fixed number of bits, say 8 or 16, so this results (hopefully) in shorter encodings for most symbols, and longer ones only for the rarely used ones.

The tricky part is creating the Huffman tree, which is basically a code book (also commonly known as a dictionary) describing how each symbol is encoded or decoded. Here is a quick tutorial on how they are created, which will also give you a feel for how the encoding works.
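
To give a feel for it, here’s a minimal sketch in Pascal of the tree-building step, assuming byte symbols and a frequency table already counted from the data; it is not the code used for the game’s files. The two lowest-frequency nodes are merged repeatedly until a single root remains.

type
  TFreqTable = array[0..255] of longint;
  PNode = ^TNode;
  TNode = record
    freq: longint;
    symbol: byte;
    left, right: PNode;   { both nil for a leaf }
  end;

function buildTree(var freq: TFreqTable): PNode;
var
  nodes: array[0..255] of PNode;
  count, i, a, b: integer;
  parent: PNode;
begin
  { make a leaf node for every symbol that actually occurs in the data }
  count := 0;
  for i := 0 to 255 do
    if freq[i] > 0 then
    begin
      New(nodes[count]);
      nodes[count]^.freq := freq[i];
      nodes[count]^.symbol := i;
      nodes[count]^.left := nil;
      nodes[count]^.right := nil;
      Inc(count);
    end;
  { repeatedly merge the two least frequent nodes under a new parent }
  while count > 1 do
  begin
    a := 0;
    for i := 1 to count - 1 do
      if nodes[i]^.freq < nodes[a]^.freq then a := i;
    if a = 0 then b := 1 else b := 0;
    for i := 0 to count - 1 do
      if (i <> a) and (nodes[i]^.freq < nodes[b]^.freq) then b := i;
    New(parent);
    parent^.freq := nodes[a]^.freq + nodes[b]^.freq;
    parent^.left := nodes[a];
    parent^.right := nodes[b];
    nodes[a] := parent;             { the parent takes one of the slots... }
    nodes[b] := nodes[count - 1];   { ...and the last node fills the other }
    Dec(count);
  end;
  if count = 1 then
    buildTree := nodes[0]           { the root of the Huffman tree }
  else
    buildTree := nil;               { empty input }
end;

Walking from the root to each leaf, writing a 0 for every left branch and a 1 for every right branch, then gives the bit pattern for each symbol.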

A fixed tree can be used for everything, but would not do the best job for every set of data being compressed. So typically a tree is created and stored along with the encoded data to achieve the best compression possible. This of course does add some overhead, which could be a problem if the resulting encoding isn’t much shorter than the original.

Huffman coding typically works poorly when the symbols all appear in the data at roughly the same frequency, the worst case being everything appearing with exactly the same frequency. Notably it won’t produce an encoding that is longer than the original data, although with the overhead of storing the tree you could end up with a larger data file. In practice this rarely happens.

Other data such as English text stored in ASCII actually compresses quite well. As usually not all 256 possible characters are used, most encodings will be shorter than 8 bits per symbol. Also, because natural language uses some letters more than others, the average encoding length across all the symbols will be shorter.

Huffman coding was actually invented quite some time ago (1951) by David Huffman, well before it came into common use. Check out the Wikipedia page for more information. It’s a part of many commonly used compression programs and formats such as Zip and Bzip. Older 8-bit machines typically weren’t powerful enough, so it wasn’t commonly used until more powerful machines with more memory became available.

It took me much longer than usual to write this post, primarily because I began writing an encoder and decoder, and the complexity meant it took up much more time than I expected. Currently I have just finished the encoder, but have yet to test it. I had hoped to get the code running first, but that will have to wait.

6 Apr 2015

Creating a benchmark: Part 4

Last time I used the in-line assembler to improve the speed of the sprite blitting functions with lots of success. Other functions such as line drawing however still suffered speed issues on systems without a built-in FPU, such as the 386sx.

The reason for the slow line drawing was simple: I had used a slope based algorithm that relied on floating point. On a system with an FPU this was quite fast, about as fast as anything else, but obviously it wasn’t going to cut the mustard on the old processors I’m targeting. So I decided to give fixed point arithmetic a go.

Fixed point numbers are a variation on basic integers, using some of the bits to represent the fractional part. You can then use integer instructions to do some basic math that involves fractions. This is much faster for processors lacking hardware floating point support. Check out more details of fixed point here or at Wikipedia. It’s a very common technique for embedded systems, but was also used in games such as Doom for speed.
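
As a rough sketch of the idea (using an 8.8 split and a hypothetical plotPixel routine, and assuming x1 < x2 with a slope between 0 and 1), a line can be stepped along using only integer adds and shifts:

const
  FRAC = 8;                          { 8 bits of fraction: an "8.8" fixed point value }

procedure shallowLine(x1, y1, x2, y2: integer);
var
  x: integer;
  slope, yfix: longint;              { longints keep this example overflow free }
begin
  { dy/dx scaled up by 256, calculated with a single division }
  slope := (longint(y2 - y1) shl FRAC) div (x2 - x1);
  yfix := longint(y1) shl FRAC;
  for x := x1 to x2 do
  begin
    plotPixel(x, yfix shr FRAC);     { the pixel row is just the integer part }
    yfix := yfix + slope;            { one integer add per pixel, no FPU needed }
  end;
end;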

I however ran into a problem. Because the integer type in Turbo Pascal is 16 bits, I didn’t have enough bits to address all the pixels and still have enough precision for the fractional part of the slope. For the integer part I needed a range of 0-319, requiring 9 bits unsigned. That left 7 bits for the fractional part, with the smallest representable fraction being 1/128th. The smallest conceivable slope in 320×200 is 1/320, which is obviously much smaller. I could have switched to using longints or storing the fractional part separately, but this would have added extra overhead that would make line drawing slower.

I set the line drawing problem aside for the moment to look for a quick circle drawing algorithm, something I hadn’t implemented yet. The SWAG archive came up with suitable options, but also had some algorithms for line drawing, in particular the Bresenham line drawing algorithm.

Not having coded my own low level line drawing routines before, I hadn’t heard of it, but it uses all integer math in a clever way to produce the correct slope. I modified the algorithm I found to reduce the amount of calculation for the screen address when plotting points in the line. The resulting code was slightly faster than my previous line drawing code, but not overwhelmingly so.
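
For reference, this is roughly what the textbook version of the algorithm looks like in plain Pascal, assuming a putPixel(x, y, colour) routine exists; it isn’t the optimised routine described above, which also avoids recalculating the full screen address for every pixel.

procedure bresLine(x1, y1, x2, y2: integer; colour: byte);
var
  dx, dy, sx, sy, err, e2: integer;
  done: boolean;
begin
  dx := abs(x2 - x1);
  dy := abs(y2 - y1);
  if x1 < x2 then sx := 1 else sx := -1;
  if y1 < y2 then sy := 1 else sy := -1;
  err := dx - dy;                    { the running error term, all integer }
  done := false;
  repeat
    putPixel(x1, y1, colour);
    if (x1 = x2) and (y1 = y2) then
      done := true
    else
    begin
      e2 := 2 * err;
      if e2 > -dy then
      begin
        err := err - dy;             { step in x }
        x1 := x1 + sx;
      end;
      if e2 < dx then
      begin
        err := err + dx;             { step in y }
        y1 := y1 + sy;
      end;
    end;
  until done;
end;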

Returning to circle drawing, the algorithms I found unfortunately used floating point math. They all use the basic formula for a circle, x^2 + y^2 = r^2, thus requiring a square root to calculate the co-ordinates, which is one of the more expensive floating point operations.

One algorithm I looked at used the square root function, but rounded the result to an integer immediately. This got me thinking: if I could implement an approximation of the square root using integer arithmetic, I could draw circles quickly. So after a bit of research I wrote exactly that. Code follows.

function intSqrt(num: word): word;
var
  xo, xn: word;
begin
  { Newton's method for approximating the square root with integers only }
  if num = 0 then
  begin
    intSqrt := 0;
    exit;
  end;
  xo := 0;
  xn := num;        { first guess: the number itself }
  while abs(xo - xn) > 1 do
  begin
    xo := xn;
    xn := (xo + (num div xo)) shr 1;
  end;
  intSqrt := xn;
end;

As stated in the comment, I’ve used Newton’s method for calculating the root, mostly because it’s a simple method I know. I did a little testing, and it turns out this returns pretty much the same result as rounding the built-in sqrt function that uses floating point, only quicker. Once I finished the circle drawing code it drew around 36% more circles in the same time as the BGI code. That’s not bad, but I think there may be more room for optimisation here.

Next time I hope to have optimised the circle routine, and I’ll test the code on some hardware to see how all the different libraries compare to each other on different platforms.

27 Feb 2015

Creating a Benchmark: part 3

Last time I started the process of comparing the BGI to a hand coded VGA library. I coded up a fairly lazy, pure Pascal library. Today I’ve re-coded some parts of that code using x86 assembly with Pascal’s in-line assembler.

At work in the IDE

The first thing I wanted to tackle was the speed of the sprite blitting and filled boxes as they didn’t seem to live up to their potential. I decided to replace the Pascal move (copy memory) and fillchar (fill memory) functions as they are heavily used in the Pascal only version.

Luckily there are some neat instructions which make copying and filling memory faster even on old processors like the 8086 and 80286. These are MOVS and STOS, string instructions that have been part of the x86 instruction set since the 8086 (the Z80 had similar block transfer instructions). Using them with the REP prefix makes them even better as it eliminates some looping code.

MOVS copies a block of memory: you load ES:DI with the destination pointer, DS:SI with the source pointer, and CX with the loop count. Then execute…

shr cx,1       { convert the byte count to words; carry holds the odd byte }
@again:
rep movsw      { copy CX words from DS:SI to ES:DI }
jcxz @next     { CX is zero once the copy has completed }
loop @again    { safety net in case an old CPU bug ended the REP early }
@next:
jnc @done      { no carry means the original count was even }
movsb          { otherwise copy the final odd byte }
@done:
...

This code will copy any count of bytes one word (16 bits) at a time, copying a single byte at the end if you specified an odd count. I’ve used JCXZ and LOOP to continue the data copying as some older processors have a bug where the REP MOVSW can end early if an interrupt occurs at the wrong time. I know this isn’t strictly necessary, but it’s a safety measure.

STOS works in much the same way, except it doesn’t read the data from a memory pointer; it stores the value held in the accumulator register instead.
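
As an example of the pattern (a sketch only, with made-up parameter names, not the routine from my library), a memory fill in Turbo Pascal’s built-in assembler might look something like this:

procedure fillMem(destSeg, destOfs: word; value: byte; count: word); assembler;
asm
  mov  es, destSeg
  mov  di, destOfs
  mov  al, value
  mov  ah, al          { duplicate the byte so each STOSW writes two of them }
  mov  cx, count
  shr  cx, 1           { word count; the carry flag remembers an odd byte }
  cld
  rep  stosw           { fill CX words at ES:DI with AX }
  jnc  @done
  stosb                { write the final byte if the count was odd }
@done:
end;

Clearing the mode 13h screen, for example, becomes a single call like fillMem($A000, 0, colour, 64000).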

With these new memory copy and fill routines done, I tested the program to see if I had any improvement in performance. To my surprise there was none; the built-in functions for copying and filling memory must be about as good as what I wrote. So why were the blitting and box filling still slower than they should be?

It turned out that the loops in the filledBox and putImage functions were the culprit. The Pascal code for putImage’s main loop looked like this…

for i := 0 to sizey do
  copymem(bseg, 4+(i*sizex), cardseg, ((y+i)*320)+x, sizex);

It didn’t look problematic until I considered the instructions required for calculating the offsets into the image data and screen buffer. Multiplication is unfortunately a slow operation on these older processors, and with some nifty assembly code I rewrote both the putImage and filledBox procedures mostly in assembly, avoiding multiplications in the main part of the loop altogether.
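
In Pascal terms the change amounts to something like the following sketch, with the per-row multiplications replaced by running offsets that are simply added to each time around (the real version does the same thing in assembly):

srcOfs := 4;                      { the image data starts at offset 4, as above }
dstOfs := (y * 320) + x;          { the only multiplication, done once }
for i := 0 to sizey do
begin
  copymem(bseg, srcOfs, cardseg, dstOfs, sizex);
  srcOfs := srcOfs + sizex;       { next row of the sprite }
  dstOfs := dstOfs + 320;         { next row of the 320 byte wide screen }
end;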

It took me about 2-3 days to get through all the work of re-writing two of the drawing functions in assembly, when it took about one day to write the basic VGA graphics to begin with, but boy did it pay off. Performance improved by over 3 times for putImage and almost 2 times for filledBox, and both are now significantly faster than the BGI implementation, being about twice as fast.

So the BGI is slow compared to raw x86 assembly after all, but it took significant effort to get that performance gain. I can still understand why the myriad one-man shareware programmers just went with the BGI: it was easy to use and good enough for what they were doing.

Making a VGA library in straight Pascal was fairly easy to do, but had some disadvantages compared to the BGI and wasn’t really quicker. I had to go to assembly before there was any significant performance gain. Coding assembly is daunting to many programmers, and for me is much more time consuming than writing in a higher level language. It will be quite some time before I really finish re-coding the library in assembly.

Next time I’ll have to tackle the line drawing functions, which use some floating point numbers to accurately draw the lines. I’m planning on converting them to fixed point numbers to improve speed on machines without an FPU, like my old 386sx. I’m also hoping assembly will help speed things up there too.

29 Jan 2015

Creating my own Benchmark program

Looking at benchmarking programs last week got me thinking about creating my own testing software. I’m a Pascal programmer as far as DOS goes, and I have read in many programming forums about how slow the Borland Graphics Interface is. I decided to test this theory and find out just how slow or fast it really is, and what effect the BGI driver and graphics mode have on performance.

The Borland Graphics Interface is a library that Borland supplied with the Turbo C and Turbo Pascal products. It was used by many because it simplified drawing graphics and meant you didn’t need to write the code for drawing to the screen. This could save a lot of time for the individual programmer, and many shareware programmers used it in their simple games.

As I’ve said before, many have claimed that it is quite slow. So I have written a simple program called BGIbench to test the speed of any BGI driver. It tests some of the more important graphics functions that work on all the drivers; some, like page flipping, only work with specific drivers.
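
The basic approach is straightforward. Something along these lines (a simplified sketch, not BGIbench itself) counts how many of a given primitive can be drawn in roughly a second, using the Dos unit’s clock for timing:

uses Graph, Dos;

function hundredths: longint;
var
  h, m, s, s100: word;
begin
  GetTime(h, m, s, s100);
  hundredths := ((longint(h) * 60 + m) * 60 + s) * 100 + s100;
end;

var
  gd, gm: integer;
  start, count: longint;
begin
  gd := Detect;
  InitGraph(gd, gm, '');               { BGI driver assumed to be in the current directory }
  if GraphResult <> grOk then Halt(1);
  Randomize;
  count := 0;
  start := hundredths;
  while hundredths - start < 100 do    { run for roughly one second }
  begin
    Line(Random(GetMaxX), Random(GetMaxY), Random(GetMaxX), Random(GetMaxY));
    Inc(count);
  end;
  CloseGraph;
  WriteLn('Lines drawn in one second: ', count);
end.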

The drivers I’ve tested here are the standard Borland CGA and EGAVGA ones that came with Turbo Pascal 6 as well as VGA256 (a mode 13h driver), VESA (uses VESA compatible modes) and SVGA256M (another VESA driver) that I found when I started writing my platform game all that time ago.

BGIbench results for Dosbox @3000 cycles

Here are the results for testing performed under Dosbox. The result that matters the most for games is the sprites, and notably the VGA256 driver is the best in this category. All types of lines and circles are about the same between the drivers, although I have noted that drawing circles is pretty slow, slower than even QBasic if I remember correctly.

Of interest is the filled boxes, which in theory should be about as fast as the sprites, but the EGA, VESA and SVGA256M drivers seem more capable at drawing filled boxes than blitting bitmaps. This doesn’t make sense; there must be something contributing to slow-down with bitmaps on these drivers. They are even slower than the pattern filled boxes!

BGIbench on a real machine: Pentium MMX 200MHz

On actual hardware things are much different. Notably the VESA driver is nearly as good as the VGA256 one at drawing sprites in the same resolution, and is in fact significantly faster at drawing filled boxes, which makes me think there is still some lost potential in that driver.

Unfortunately the SVGA256M driver performs poorly where it counts: blitting sprites. In fact it’s almost as slow as the EGA driver, which is the worst performer. Again there must be significant lost potential, as it draws filled boxes significantly faster, but it suffers worse performance in most line drawing.

Circles and rendering text again perform poorly across the board, so these types of primitives should be avoided where possible when writing BGI based software.

In summary, the CGA and EGA drivers offer modes on older hardware that aren’t offered by the others, so they remain useful even though there are faster drivers. The VGA256 driver is the best for 320x200x256 sprite based graphics, but if you wanted to do vector/line drawing the VESA driver performs better in those areas. The VESA driver also offers higher resolution modes, although at a performance hit as it needs to switch memory banks while drawing. The SVGA256M driver is probably the least useful as the others out-perform it in all areas and resolutions.

The question remains how fast is the BGI compared to writing a graphics driver yourself? This will be answered another day.

UPDATE: A download of the results and the program can be found here.

16 Dec 2014

Improving Joystick support for Bob’s Fury

Some of the hardware support within Bob’s Fury has been far from ideal, the joystick/game pad being one such device. I had only added support for a simple two button joystick, as that’s all I had when I first wrote the code a long time ago. It proved to be inadequate as there simply weren’t enough buttons to support all the functions in game, and you couldn’t choose what the buttons did.

Old Joystick Configuration

So with the aid of a real machine and a Gravis game pad I worked out how the basic four button devices work. It turns out they aren’t much different. I had originally coded my interface to expect two joysticks, each with two axes and two buttons. The buttons on the second joystick correspond to the 3rd and 4th buttons on the Gravis game pad, which means it was fairly simple to support such a device with minimal effort. I re-wrote the joystick hardware code to present a single joystick with four buttons and axes, which should support most devices.
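
For reference, the buttons live in the top four bits of the game port’s status register at I/O address $201 and are active low, so reading all four is just a port read and a little bit twiddling. This is a sketch of the idea rather than the game’s actual joystick unit (the axes still need the usual timing loop):

{ returns a mask with bits 0..3 set for buttons 1..4 that are currently pressed }
function readButtons: byte;
var
  raw: byte;
begin
  raw := Port[$201];                   { game port status register }
  readButtons := (raw xor $FF) shr 4;  { flip the active-low bits, keep the top four }
end;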

New Configuration Screen

Once I got the joystick code re-worked I didn’t want to give the buttons fixed functions, so I had to recode the configuration interface to allow changing what the buttons do, which funnily enough took longer to get right than the hardware side of things. After some testing in Dosbox I’m pretty happy with the result. I still need to test on some real hardware to make sure everything works, and I need to test a two button joystick to ensure that still works as well. I’ll update the download once I’ve tested it on real hardware.

27 Oct 2014

A Guide to Memory/Object Pooling

I was recently listening to a vlog by a games developer who wanted a new programming language specifically for games, and one of his big gripes with C++ was the memory allocation process. He had a number of arguments, none of which resonated with me, but it did spark my interest in the memory allocation/de-allocation problem. I’ll briefly discuss the problem from the perspective of a language like C (in this case Pascal) and one that uses garbage collection (in this case Java).

Most languages dynamically allocate and free memory in very much the same way as even very old languages such as Pascal did. You have to explicitly ask the API to allocate memory for you, and then explicitly free it when you are finished with it. In an OO language like C++ this results in the need for destructors to free the memory the object uses when it is destroyed.

Either way memory is allocated and freed, and if this is done on a regular basis it can lead to something called heap fragmentation. Heap fragmentation is not quite the same as disk fragmentation, in that an allocated block will not be split up. Pascal for instance allocates memory one block after another, simply creating the new one at the end of the old one. However when a block of memory is freed it leaves a hole, and these holes are effectively heap fragmentation.

Pascal deals with these holes by maintaining a free list, which is a linked list containing all the free blocks available. When an application allocates more memory the list is traversed to find the first available memory big enough for the new allocation, hopefully filling the hole but possibly leaving a small amount still available. Clearly the holes will likely get smaller over time until they are unusable, and the more holes there are the longer any memory allocation will take.
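
To illustrate, a first-fit search over such a list is only a few lines; this is a toy model of course, the real heap manager also splits blocks and tracks more detail:

type
  PFreeBlock = ^TFreeBlock;
  TFreeBlock = record
    size: word;            { size of the hole in bytes }
    next: PFreeBlock;      { the following hole in the free list }
  end;

function firstFit(list: PFreeBlock; wanted: word): PFreeBlock;
begin
  { walk the list until a hole big enough for the request turns up }
  while (list <> nil) and (list^.size < wanted) do
    list := list^.next;
  firstFit := list;        { nil means no hole was large enough }
end;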

Other allocation schemes are available in other languages, but they have similar issues; take a look here to see what I mean. Whatever scheme you use, there is significant cost the more you allocate and free memory.

Garbage collected languages such as Java have problems of their own. Whilst it is claimed that allocating memory in Java is practically free in terms of complexity, freeing memory is known to have performance issues as the garbage collector (GC for short) kicks in to clean up. The mechanics of the memory allocation mechanism are largely hidden in a language like Java, so that shouldn’t concern you much, but the performance impact a garbage collector can have should concern you.

So why is the GC a potential performance hazard? In order for the GC to know which memory to free it must work out which memory is still being used. One way of doing that is maintaining a reference count for each object in use. Obviously anything with a count of zero can be freed, but this leads to another problem: determining which collections of objects can be freed.

Imagine a data structure like a tree in which every node holds a reference not only to its children but to its parent as well. If you were to use the tree, holding only a reference to the root node, then later release that reference, reference counting would not allow the memory the tree uses to be freed, even though no external objects can reach it. A GC therefore also needs an algorithm that checks the reachability of memory.

That’s a rather simplistic view of garbage collection, but it’s enough to see that running the GC can be expensive.

So how can you deal with these issues? It seems it doesn’t matter which kind of system/language you use: memory management has real implications for performance. One answer is to use a technique commonly called object or memory pooling.

A simple way to think of a memory pool is that it’s a lot like a cupboard. When you need something it’s quite quick to go to the cupboard and get something that’s available there, and when you’re done with it you return it. If the cupboard turned out to be empty, you’d have to go down to the shops to buy more of whatever you need. One advantage is that you can stock your cupboard with resources you plan on using later.

Memory and object pools are effectively the same thing. You can preallocate memory, use it, return it to the pool, then use it again. This reduces the amount of stress on the memory allocation system, leading to fewer GC runs and less heap fragmentation. You’d be amazed how much difference something like this can make on a system, especially something like Android.
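
A minimal pool can be as simple as a preallocated array plus a stack of free slot indices. The sketch below uses a made-up particle record as the pooled object and is written in Pascal to match the rest of the blog, though the same shape works in Java or C++:

const
  POOL_SIZE = 256;

type
  TParticle = record
    x, y, dx, dy: integer;
    alive: boolean;
  end;

var
  pool: array[0..POOL_SIZE - 1] of TParticle;    { allocated once, up front }
  freeList: array[0..POOL_SIZE - 1] of integer;  { indices of unused slots }
  freeCount: integer;

procedure initPool;
var
  i: integer;
begin
  for i := 0 to POOL_SIZE - 1 do
    freeList[i] := i;
  freeCount := POOL_SIZE;
end;

{ take a particle from the cupboard; -1 means the pool is empty }
function acquire: integer;
begin
  if freeCount = 0 then
    acquire := -1
  else
  begin
    Dec(freeCount);
    acquire := freeList[freeCount];
  end;
end;

{ put it back when finished so it can be reused without any new allocation }
procedure release(index: integer);
begin
  freeList[freeCount] := index;
  Inc(freeCount);
end;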

As an example, I implemented a path finding algorithm in Java for a maze game I am writing. I could have used a dynamically sized data stack that grows and shrinks as the algorithm runs. That would however trigger the GC much more frequently as it uses and releases memory on the data stack. It is much better to preallocate the maximum size of the stack once, and reuse the same memory every time the algorithm is used. This only allows one instance of the algorithm to run at once, but it can be adjusted to allow multi-threading by creating a shared pool that each instance can use.

So what kinds of objects or memory should you use pooling for? Ideally anything that is short-lived should have some kind of pooling scheme in place as they put the most stress on the memory allocation system. This would be objects like particles and bullets in a game engine. Long-lived and infrequently used objects that pose less of a problem to the system don’t really need to be pooled as they cause much less stress.

Pooling can also be used for other resources that are expensive to create and destroy, such as database or network connections. This can help alleviate stress on other systems such as the network stack, database servers, and file systems. So pools are really an important tool in any programmer’s toolbox for improving performance.

22 Jul 2014

Run-Length Encoding

I hadn’t quite gotten over the horrible cold this weekend, and have had to spend most of my time looking after children, so today’s post will be a short one. I’m looking at a very basic data compression technique frequently used in many older micro-computers of the 80s to store level information and graphics for games.

The technique is called run-length encoding (often called RLE); it takes advantage of repetition in data to reduce the storage requirement. It used to be a common compression technique for encoding graphics, but can be used effectively to encode level data or anything that has repeating data in it. The PCX image format is a well known example that uses the technique.

The encoding algorithm is very simple: you encode runs of data (repeated identical data) as the number of repetitions and a copy of the data. So if you had data that looked like aaaaabbbbcaaa it could be encoded as a5b4c1a3. The trouble with this scheme is that in the worst case (different data in every byte) it can actually increase the size of the data significantly, resulting in a larger file than the uncompressed original. You can avoid this scenario by encoding the start of a run as a repeated character, with single characters representing themselves. This results in the previous data being encoded as aa5bb4caa3, which goes to show that even that technique has its downside.
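
As a sketch of the first scheme (with the count stored as a byte rather than a printable digit), the encoder loop is tiny. This is purely illustrative and works on short strings; the downloadable version mentioned below works on files with buffered I/O instead:

function rleEncode(s: string): string;
var
  i: integer;
  runChar: char;
  runLen: byte;
  res: string;
begin
  res := '';
  i := 1;
  while i <= length(s) do
  begin
    runChar := s[i];
    runLen := 0;
    { count the run, capped at 255 so it fits in a single byte }
    while (i <= length(s)) and (s[i] = runChar) and (runLen < 255) do
    begin
      Inc(runLen);
      Inc(i);
    end;
    res := res + runChar + chr(runLen);
  end;
  rleEncode := res;
end;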

I wrote up a quick implementation in Pascal as an example and used it to compress some data files from my DOS platform game, Bob’s Fury. The first two data files I tried were the graphics data files, which you can see compressed quite well; this is because there are many long runs of the same colour pixels. The third data file takes it to the extreme: it is a level data file that is pretty much one big long run of empty spaces, and it compresses to an impressively small size. The fourth data file is normal level data (albeit a much larger level); it is much like the graphics data in that there are many runs, but it has some data at the end which does not compress well, resulting in a worse ratio. The last file in the table is actually an ASCII map file detailing the contents of the executable for the game. It is useful for determining where errors are in the code, but it clearly shows that ASCII data is unsuitable for this compression technique, resulting in a much larger file!

What follows is the Pascal code I wrote for this quick test. It needs the buffer unit I wrote for buffered reading and writing of files. You can also download the source file from here.

Continue reading ‘Run-Length Encoding’



