Archive for the 'Programming' Category


Making a new GWBasic game.

When I was young I learned to code using GWBasic, a BASIC interpreter that came with MS-DOS 4.01. I’ve posted some code and talked about it in the past, but not really developed anything new, that is until today. I’ve re-written a game from scratch I had lost on a corrupted floppy disk called dodge, yes it’s not much of a name but it basically describes the game play in that you dodge bad guys and collect score items for points.

The original had some basic EGA graphics, but this time I’ve used text-mode characters for a few reasons. Firstly it is easier to implement both in terms of artistry and coding. The original never played well due to slowness in the interpreter and the implementation that I had used. Finally using text mode increases the effective game play area by more than 2 times, which will allow for more interesting levels. I didn’t implement any sound either in the original or the new, but music often doesn’t work well on the PC speaker and I’m not much of a composer.

Being a much more mature programmer I’ve been able to make the game mechanics better and add features that the original never had. There are 5 built-in levels (encoded with RLE) and the capability to load a custom level from a text file. The original only had randomly generated levels that were far simpler. I’ve made the movement of enemies better than the original, and gravity affects most items so that they are reachable.

I could have coded the game using a modern text editor, but for that extra nostalgic feeling I coded the game in the interpreter itself. It’s not as intuitive of course as you can’t simply scroll back and forth through the program, instead needing to use the LIST command to see parts of the program. Luckily editing a line isn’t too bad, as after display it on screen you can edit it easily. The interpreter also has some nifty features that can speed up program entry. Many keywords such as print and input have keyboard shortcuts, usually alt and the first letter of the keyword. This not only sped up program entry, but enabled me to discover keywords when I was a kid before I got the user manual.

Using the interpreter to write the code does however demand much more of your memory, I don’t remember this bothering me much as a kid, but now as an adult I needed to keep notes on the structure of the program and variables used. I would have written these out by hand on a notebook back in the day, but doing the coding with the interpreter running in dosbox meant I could run a text editor for my notes. It’s probably one of my more complex gwbasic programs at 592 lines of code.

Overall it has been quite fun writing with gwbasic again, like revisiting somewhere you went as a kid. I’m making the code available at the usual places, both my download website, and my google drive (in case the website is down). The code should work on QBasic as well as the original interpreter.


Creating a benchmark: part 6 problem solved!

Quite some time ago, in fact more than a year ago now, I was working on a basic series of benchmarks to test the comparative performance of the Borland Graphics Interface (BGI) versus a hand coded graphics library. I ran into a problem with my hand coded library not working on some of my real hardware. I ran the tests on a Pentium MMX @ 200Mhz and a 386sx @ 20mhz, it ran fine on the newer machine whilst failing on the older one. I figured perhaps I was overloading the older graphics chip.

So coming back to the problem today with a renewed sense of determination, I did some reading. There happened to be a book in the university library about programming VGA graphics, and I noted that all of their code only copies single bytes to graphic memory at a time, I was copying 2 bytes (a 16 bit word) at a time so wondered if that might be the issue. I changed it and the program still crashed almost immediately.

After some tinkering and some basic math I worked out that sprites being drawn near to the bottom of the screen were the cause of the problem. The test algorithm actually allows sprites to be drawn partially obscured by the right or bottom edge of the screen. On most VGA cards this isn’t a problem, but the trident chip in the 386sx didn’t like any pixels being drawn outside of the visible frame buffer. After writing some basic clipping into the sprite routines the program worked all the way through.

I’ve tested 3 different programs on two different real machines (as opposed to emulation). BGIbench is the original program that uses the provided graphics libraries that came with Turbo Pascal, I’ve used it in conjunction with the VGA256 BGI driver. VGABench is the initial lazy implementation of a hand made graphics library, it’s implemented entirely in pascal and isn’t optimised at all. Lastly is VGABench2 which is an optimised version of VGABench using assembly where necessary, and in the case of line drawing a better algorithm. All three programs were coded and compiled with Turbo Pascal 6.0 and use the same basic test code.

Each program performs 7 tests based around primitive functions found in the BGI. Each test counts the number of primitives that can be drawn over a period of 30 seconds. VGABench doesn’t have a function for drawing circles so it has no results for that test. The tests in order: put-pixels simply draws individual pixels a random colour. Filled Boxes draws solid coloured rectangles in a pre-defined pattern. Circles draws a number of concentric circles in a per-determined way. Random lines does what you’d expect, drawing lines randomly. Horizontal and vertical lines are similarly obvious. Finally the sprite test draws a 10×10 bitmap at random locations on the screen.

At first glance it’s pretty obvious that optimised, hand written code is significantly faster on the Pentium, particularly for filled boxes and sprites where VGABench2 achieves roughly twice the output to the screen. I managed a five-fold increase in the rate of drawing circles, which is impressive as circles are the hardest to draw. The first VGA Bench however is not really much better than the BGI and in some tests actually performs worse. You’ll note that the put-pixel tests all come out fairly close in terms of results, this is because there is little to optimise there.

I suspected that the reason VGABench2 performed so well is because the code copies 16 bits at a time instead of 8bits. I tested this out by changing it to copy 8bits at a time and found it was still faster than the BGI, but by a much smaller margin. VGABench2 copying 16bits yields about 156k sprites, but modifying it to copy 8bits yielded about 80k compared to BGI which achieves around 73k sprites. The first VGABench demonstrates how important optimisation is. It copies data 16bits at a time, but doesn’t even achieve the performance that BGI does, managing around 68k sprites.


The picture looks quite different on the 386sx machine, with the performance looking much more even with a few exceptions. The BGI seems to perform comparatively well across most categories only lagging behind in drawing circles, filled boxes and sprites. My first lazy implementation, VGABench seems to lag behind in pretty much everything except drawing filled boxes, which barely outperforms the BGI.

The results for VGABench2 are good, but not as good as on the Pentium machine. Line drawing is basically the same speed as the BGI. Filled boxes achieves about twice the speed, and circles about 5 times the speed, but the sprites are comparatively slower at about 1.5 times the speed. The explanation for the performance of sprites and filled boxes is interesting and is related to how the test is implemented. The filled boxes are drawn in a deterministic way, a grid of 10×10 sized boxes, the sprites are distributed randomly. Filled boxes end up being drawn pretty much always on even addresses, and sprites will be drawn on even and odd addresses around the same amount. This affects speed because of something called word alignment.

The 386sx, 286 and 8086 processors have a 16 bit data bus, which means the processor can access 16bits at a time. The memory is organised as a bunch of 16bit words, so when accessing 16 bits on an even address only a single memory word is accessed, and when a 16 bit access needs an odd address, two memory words are required. This means doing 16bit reads/writes on odd addresses are half as fast as on even ones, in fact they are about the same speed as an 8 bit transfer.

In beginning to sum up the series, it’s important to remember why I started it at all. Basically I remembered hearing many people discouraging use of the BGI mostly because it is slow compared to hand crafted code. The tests I’ve done have confirmed this, but I feel that it’s also shown how a lazy (or poor) implementation can be even slower. My optimised code is faster, but took a lot of time and effort to create and is missing many features that the BGI provides, such as support for other graphics cards, clipping, and other graphics primitives that are more complicated. I can see why many people without the time and know-how would have found it easier to simply use the graphics library provided.

That being said, hand written optimised code certainly has an important place as well. It was pretty fun and challenging to try and make something faster, even though I almost certainly didn’t get my code anywhere near as fast as more proficient assembly programmers. Also it’s hand optimised code that made many PC action games possible at all. Gaming on the PC would be very different without the hardware guru’s that could squeeze amazing things out of basic hardware. Writing this has made me reconsider my stance on sticking with the BGI for my platform game, I probably won’t gain a lot of performance, but I may be able to get it to work on older hardware as a consequence.

Code and binaries are available from the pascal downloads.


Huffman Coding – Compression results

Before the Christmas break I started writing a encoding/compression library based on a technique called Huffman coding. I’ve since completed and tested the encoder and decoder, today we’ll discuss the results of several compression tests in comparison to the RLE encoding technique. Here are links back to the posts for Huffman Coding and Run-Length Encoding.

So I’ve collected a few different types of data files with different characteristics. First up is a MS-DOS executable, these binary data files typically don’t contain runs of data and will tend to contain most if not all symbols. Next is a ASCII text file, which will typically only use alphabetic, numeric and punctuation characters. Lastly a small number of graphics files, in raw bitmap form, these particular images are fairly basic (only a handful of colours).

Here’s a table showing the results.

Compression ratio table

You’ll note that the Huffman encoder achieves better results in every example! The worst result for it being the MS-DOS executable, which is of course expected. I was pleasantly surprised that the small graphic files compressed quite well, this is likely because of how few symbols are in these files. If there had been a large variety of symbols the dictionary and the encoded data would be much larger.

Run-length encoding was obviously much worse, and in the case of ASCII text and executable data it actually increased the size of the data! Knowing how it works this is hardly surprising. I did expect RLE to perform better than Huffman coding for the graphic data, but I suspect the low colour count in these files is an influencing factor.

So I thought I’d create a graphic data file that had a higher colour count that would favour RLE to see if it can do better under specific conditions. The graphic data I created is a sprite 32×32 pixels with horizontal coloured stripes, each a unique colour. The size of the raw file came out at 1026 bytes.

The Huffman encoder produced a file of 766 bytes and the RLE produced one of 68 bytes! Whilst the Huffman encoder still managed to compress the data, it couldn’t match the best case scenario for RLE.

There are good reasons to use both compression techniques. Run-length encoding doesn’t require much CPU power to encode or decode, so it can be done on very weak machines (even old 8-bit machines), but it only compresses data with many runs in it well. This turns out to be useful for graphics and level data. Huffman coding will pretty much always reduce the size of the data encoded, sometimes significantly, but it is much more complex. It requires more RAM and CPU power to achieve this result, so it can’t be used on every machine, although most modern processors would have no trouble at all.

I’ve made the code for the Huffman encoder/decoder available in pascal for download. be aware the encoder still has some debug code in it that will write to the screen.


Huffman Coding

Quite some time ago I did a short post about a compression technique called run-length encoding (or RLE), that was a commonly used compression method for graphic and level data. At the time I wrote and implemented it for the graphics and level data for my home-brew game Bobs Fury with quite the success, significantly reducing the disk space required for the base game.

I do however have some files left, which are largely just ASCII text, that don’t compress well or at all using that technique. I’d like to make them smaller, and of course harder to read as they contain story text, not that my game has a brilliant story, but you get the idea. Step in an encoding technique called Huffman coding.

Essentially the encoding algorithm encodes symbols as a variable length stream of bits. The most frequently used symbols are represented by shorter bit streams, whilst the least used have longer ones. Normally in a computer each symbol would be encoded as a fixed number of bits, such as say 8-bit or 16, so this results (hopefully) in shorter encodings for most symbols, and longer ones only for the rarely used ones.

The tricky part is creating the Huffman tree, which is basically a code-book representing how each symbol is encoded or decoded. Here is a quick tutorial on how they are created, which will also give you a feel for how the encoding works. It’s also commonly known as a dictionary, or code book.

A fixed tree can be used for everything, but would not do the best job for every set of data being compressed. So typically a tree is created and stored along with the encoded data to achieve the best compression possible. This of course does add some overhead, which could be a problem if the resulting encoding isn’t much shorter than the original.

Huffman coding typically works poorly when the symbols all appear in the text at roughly the same frequency. The worst case being everything with exactly the same frequency. Notably it won’t produce an encoding that is longer than the original data, although with the overhead of storing the tree, you could end up with a larger data file. In practise this rarely happens.

Other data such as English text stored in ASCII actually compresses quite well. As usually not all the 255 characters are used, most encodings will be shorter than 8 bits per symbol. Also because natural language uses some letters more than others the average encoding length for all the symbols will be shorter.

Huffman coding was actually invented quite some time ago (1951) by David Huffman, well before it came into common use. Check out the Wikipedia page for more information. It’s a part of many commonly used compression programs and formats such a Zip and Bzip. Older 8-bit machines typically weren’t powerful enough, so it wasn’t commonly used until more powerful machines with more memory became available.

It took me much longer than usual to write this post primarily because I began the process of writing an encoder and decoder, but because of the complexity, it’s taken up much more time than I expected. Currently I have just finished the encoder, but have yet to test it. I had hoped to get the code running first, but that will have to wait.


QBasic Gorillas

Qbasic GorillasToday I’m looking at the classic old artillery game Gorillas, it was an example program for the Qbasic interpreter that was packaged with MS-DOS 5.0 and later. Because it was so widespread, being on practically every machine of its time, it was widely played and loved by many. I first encountered it on our high school computers in computer studies classes, we often got to play some games after we finished our work. We played many games, but Gorillas  (and Nibbles) were favourites.

Dancing Gorillas!

Dancing Gorillas!

Being written in Qbasic graphics and sound support is fairly basic. Unless you’re using an old machine with CGA only, the graphics are in high resolution EGA (640x350x16) and whilst not spectacular have some charm. Sound is PC speaker, again largely due to the limits of Qbasic. Most sounds are fairly basic, although the intro tune is kinda cool.

City Skyline

City Skyline

The game field consists of a city skyline with two Gorillas atop a building at opposite ends of the screen. Each Gorilla takes turns hurling an explosive banana at the other, with the player aiming the shots by entering the angle and velocity. The round only ends when one of the Gorillas is hit by a banana, with the survivor being the winner. There was no computer AI, so you had to play it in a hot-seat style or on your own. It’s simple and fun to play, although there isn’t much variety.

Target hit!

Target hit!

Normally this is where a post like this would end, with some kind of summary of what I thought. Today however I decided to have a quick go at making a simple modification to the game, adding an AI to the game so you can play solo. The tricky part with making an AI player in this case isn’t making something that will play well, but making something a human has a chance of beating. I could quite easily make it simply calculate the ideal velocity and angle, but that wouldn’t be much fun.

A winner is you.

A winner is you.

So what have I done instead? It’s a fairly simple algorithm, I set the initial aim to some sensible defaults and after each shot adjust the velocity depending on whether the shot landed short or long. This actually proved to be quite good at making hits, but not before making a few shots giving a human player a chance. Occasionally they will make a hit on the first shot, but that only happens when the buildings are set up just right. One circumstance that the computer does poorly is when a tall building is blocking the path of the bananas. I deal with this to a degree by making the angle higher when the banana doesn’t go very far. It will still take many shots for the AI to succeed.

I’ve made the modified version available here. It requires the original Qbasic to run and DOS in one form or another (Dosbox recommended). The game is pretty much unchanged apart from adding the AI, which you activate by naming a player Computer. You can have the computer play itself by naming both players Computer. Another improved version of the game exists, and has improvements such as a league table and improved graphics and sound. It’s called Gorillas Deluxe and can be found here.


Creating a Benchmark: part 5 – a road-block in testing

Some months ago I set about working on creating a bench mark program to compare the Borland Graphics Interface (BGI for short) with a hand coded in assembly VGA library. I had completed work on the library and even tested it on my Pentium 200Mhz MMX. So I decided to do the final tests to get the results when I ran into a road-block that still has me stumped.

I had three different setups to test the code on. Dosbox (at a fixed cycle count), my Pentium 200Mhz MMX machine running MS-DOS 6.22 and this machine (the last in the post), a 386sx running at 20Mhz. The BGI code runs fine on all three machines, but the hand coded library fails on the 386sx locking up during the first test – blitting sprites.

This was quite confusing as the test appears to at least partially work as a number of sprites are drawn to the screen quite quickly, but sometime during the test it always freezes. I thought about what could cause this and decided to try disabling interrupts during the drawing process. I did this as the main instruction used for copying bytes to video RAM does have an interrupt bug on older processors (286). This unfortunately had no effect. See this previous post to see code I used for copying memory.

Whilst disabling interrupts didn’t fix the problem, it did confirm which instruction was causing the problem. The rep movsw instruction was the only one I put between disabling and enabling interrupts (using cli and sti). When the machine froze with this change it no longer responded to keyboard interrupts that it did before. This indicates a crash during either an interrupt itself (unlikely) or this repeating instruction.

Given this strange problem I decided to test the machine by running memory tests and another graphical benchmark to see if there could be a hardware issue. It seems that the memory is ok, and it ran a 3d VGA benchmark with no trouble over night. I’ve also played some games such as Hocus Pocus that have parallax scrolling and would put a bit of load on the relevant parts with no problems.

So the question arises, am I putting too much data/strain into the Trident graphics chip for it to cope? I’m unsure if i can answer this. The process of copying the memory has several overheads. The processor has to read the word, write the word, increment the memory indexes and then decide whether to stop copying for each iteration of the instruction. This would at best be one word copied per 2 bus clock cycles, but is likely to be quite a bit slower, from memory I think like 6 clocks per memory copy for a 386.

I think it unlikely that I’m over taxing the graphic chip, but this problem does highlight one of the major problems people had when creating their own graphics libraries. Small shareware developers couldn’t have tested enough hardware to ensure that what they wrote would work on a large array of systems. The BGI however, having more resources behind it, has a quite good compatibility record.


Creating a benchmark: Part 4

Last time I used the  in-line assembler to improve the speed of the sprite blitting functions with lots of success. Other functions such as line drawing however still suffered speed issues on systems with out a built-in FPU such as the 386sx.

The reason for the slow line drawing was simple, I had used a simple slope based algorithm that used floating point. On a system with a FPU this was quite fast, about as fast as anything else, but obviously it wasn’t going to cut the mustard when it came to the old processors I’m targeting. So I decided to give fixed point arithmetic a go.

Fixed point numbers are a variation on basic integers, using some of the bits to represent the fractional part. You can then use integer instructions to do some basic math that involves fractions. This is much faster for processors lacking hardware floating point support. Check out more details of fixed point here or at Wikipedia. It’s a very common technique for embedded systems, but was also used in games such as Doom for speed.

I however ran into a problem. Because the integer in turbo pascal is 16bits I didn’t have enough bits for addressing all the pixels and have enough precision to for the fractional part of the slope. For the integer part I needed to have a range of 0-319, requiring 9 bits unsigned. The fractional part was thus left with 7 bits, with the smallest representable fraction being 1/128th. The smallest conceivable slope in 320×200 is 1/320 which obviously is much smaller. I could have switched to using longints or storing the fractional part separately but this would have added extra overhead that would make line drawing slower.

I set the line drawing problem aside for the moment to look for a quick circle drawing algorithm. Something I hadn’t implemented yet. The SWAG archive came up with suitable options, but also had some algorithms for line drawing. In particular the Bresenham line drawing algorithm.

Not having coded my own low level line drawing routines before I hadn’t heard of it, but it uses all integer math in a clever way to produce the correct slope. I modified the algorithm I found to reduce the amount of calculations for the screen address when plotting points in the line. The resulting code was slightly faster than previous line drawing code, but not overwhelmingly so.

Returning to implementing circle drawing, the algorithms I found unfortunately used floating point math. They all use the basic formula for a circle, x^2 + y^2 = r^2, thus requiring the use of a square root function to calculate the co-ordinates. Unfortunately it is one of the more expensive floating point operations.

One algorithm I looked at used the square root function, but rounded it to an integer immediately. This got me thinking, if I could implement an approximation of square root using integer arithmetic I could draw circles quickly. So after a bit of research I wrote exactly that. Code follows.

function intSqrt(num: word):word;
     {we're using Newtons method for approximating the sqrt}
     if (num=0) then
     xo := 0;
     xn := num;
     if xn=0 then xn:=1;
     while (abs(xo-xn) > 1) do
          xo := xn;
          xn := (xo + (num div xo)) shr 1;
     intSqrt := xn;

As stated in the comment, I’ve used Newton’s method for calculating the root, mostly because it’s a simple method I know. I did a little testing, and it turns out this returns pretty much the same result as rounding the built-in sqrt function that uses floating point, only quicker. Once I finished the circle drawing code it drew around 36% more circles in the same time as the BGI code. That’s not bad, but I think there may be more room for optimisation here.

Next time I hope to have optimised the circle routine, and I’ll test the code on some hardware to see how all the different libraries compare to each other on different platforms.

Blogs I Follow

Enter your email address to follow this blog and receive notifications of new posts by email.

Mister G Kids

A daily comic about real stuff little kids say in school. By Matt Gajdoš

Random Battles: my life long level grind

completing every RPG, ever.

Gough's Tech Zone

Reversing the mindless enslavement of humans by technology.

Retrocosm's Vintage Computing, Tech & Scale RC Blog

Random mutterings on retro computing, old technology, some new, plus radio controlled scale modelling.


retro computing and gaming plus a little more

Retrocomputing with 90's SPARC

21st-Century computing, the hard way


MS-DOS game reviews, retro ramblings and more...