Archive for January, 2017


Creating a benchmark: part 6 problem solved!

Quite some time ago, in fact more than a year ago now, I was working on a basic series of benchmarks to test the comparative performance of the Borland Graphics Interface (BGI) versus a hand coded graphics library. I ran into a problem with my hand coded library not working on some of my real hardware. I ran the tests on a Pentium MMX @ 200Mhz and a 386sx @ 20mhz, it ran fine on the newer machine whilst failing on the older one. I figured perhaps I was overloading the older graphics chip.

So coming back to the problem today with a renewed sense of determination, I did some reading. There happened to be a book in the university library about programming VGA graphics, and I noted that all of their code only copies single bytes to graphic memory at a time, I was copying 2 bytes (a 16 bit word) at a time so wondered if that might be the issue. I changed it and the program still crashed almost immediately.

After some tinkering and some basic math I worked out that sprites being drawn near to the bottom of the screen were the cause of the problem. The test algorithm actually allows sprites to be drawn partially obscured by the right or bottom edge of the screen. On most VGA cards this isn’t a problem, but the trident chip in the 386sx didn’t like any pixels being drawn outside of the visible frame buffer. After writing some basic clipping into the sprite routines the program worked all the way through.

I’ve tested 3 different programs on two different real machines (as opposed to emulation). BGIbench is the original program that uses the provided graphics libraries that came with Turbo Pascal, I’ve used it in conjunction with the VGA256 BGI driver. VGABench is the initial lazy implementation of a hand made graphics library, it’s implemented entirely in pascal and isn’t optimised at all. Lastly is VGABench2 which is an optimised version of VGABench using assembly where necessary, and in the case of line drawing a better algorithm. All three programs were coded and compiled with Turbo Pascal 6.0 and use the same basic test code.

Each program performs 7 tests based around primitive functions found in the BGI. Each test counts the number of primitives that can be drawn over a period of 30 seconds. VGABench doesn’t have a function for drawing circles so it has no results for that test. The tests in order: put-pixels simply draws individual pixels a random colour. Filled Boxes draws solid coloured rectangles in a pre-defined pattern. Circles draws a number of concentric circles in a per-determined way. Random lines does what you’d expect, drawing lines randomly. Horizontal and vertical lines are similarly obvious. Finally the sprite test draws a 10×10 bitmap at random locations on the screen.

At first glance it’s pretty obvious that optimised, hand written code is significantly faster on the Pentium, particularly for filled boxes and sprites where VGABench2 achieves roughly twice the output to the screen. I managed a five-fold increase in the rate of drawing circles, which is impressive as circles are the hardest to draw. The first VGA Bench however is not really much better than the BGI and in some tests actually performs worse. You’ll note that the put-pixel tests all come out fairly close in terms of results, this is because there is little to optimise there.

I suspected that the reason VGABench2 performed so well is because the code copies 16 bits at a time instead of 8bits. I tested this out by changing it to copy 8bits at a time and found it was still faster than the BGI, but by a much smaller margin. VGABench2 copying 16bits yields about 156k sprites, but modifying it to copy 8bits yielded about 80k compared to BGI which achieves around 73k sprites. The first VGABench demonstrates how important optimisation is. It copies data 16bits at a time, but doesn’t even achieve the performance that BGI does, managing around 68k sprites.


The picture looks quite different on the 386sx machine, with the performance looking much more even with a few exceptions. The BGI seems to perform comparatively well across most categories only lagging behind in drawing circles, filled boxes and sprites. My first lazy implementation, VGABench seems to lag behind in pretty much everything except drawing filled boxes, which barely outperforms the BGI.

The results for VGABench2 are good, but not as good as on the Pentium machine. Line drawing is basically the same speed as the BGI. Filled boxes achieves about twice the speed, and circles about 5 times the speed, but the sprites are comparatively slower at about 1.5 times the speed. The explanation for the performance of sprites and filled boxes is interesting and is related to how the test is implemented. The filled boxes are drawn in a deterministic way, a grid of 10×10 sized boxes, the sprites are distributed randomly. Filled boxes end up being drawn pretty much always on even addresses, and sprites will be drawn on even and odd addresses around the same amount. This affects speed because of something called word alignment.

The 386sx, 286 and 8086 processors have a 16 bit data bus, which means the processor can access 16bits at a time. The memory is organised as a bunch of 16bit words, so when accessing 16 bits on an even address only a single memory word is accessed, and when a 16 bit access needs an odd address, two memory words are required. This means doing 16bit reads/writes on odd addresses are half as fast as on even ones, in fact they are about the same speed as an 8 bit transfer.

In beginning to sum up the series, it’s important to remember why I started it at all. Basically I remembered hearing many people discouraging use of the BGI mostly because it is slow compared to hand crafted code. The tests I’ve done have confirmed this, but I feel that it’s also shown how a lazy (or poor) implementation can be even slower. My optimised code is faster, but took a lot of time and effort to create and is missing many features that the BGI provides, such as support for other graphics cards, clipping, and other graphics primitives that are more complicated. I can see why many people without the time and know-how would have found it easier to simply use the graphics library provided.

That being said, hand written optimised code certainly has an important place as well. It was pretty fun and challenging to try and make something faster, even though I almost certainly didn’t get my code anywhere near as fast as more proficient assembly programmers. Also it’s hand optimised code that made many PC action games possible at all. Gaming on the PC would be very different without the hardware guru’s that could squeeze amazing things out of basic hardware. Writing this has made me reconsider my stance on sticking with the BGI for my platform game, I probably won’t gain a lot of performance, but I may be able to get it to work on older hardware as a consequence.

Code and binaries are available from the pascal downloads.


Mad Painter for DOS

I’ve been away on my usual holiday break and recently got home. Whilst I was away at my parents place it was really quite intensely hot, like 40+ degrees Celsius in the shade. It’s not unusual to get a few days like that where they live, but the heat seemed to linger for longer this year. In heat like that (and without air conditioning) you can’t really do all that much, most people usually just try to keep cool and lie down during the worst heat of the day.

It’s usually a good time to sit and play a game, I usually favour something that doesn’t tax my brain too much which leads us to today’s game. Mad Painter is a fairly simple arcade style maze game. Essentially you are driving a paint truck for your city, painting the road markings as you drive around. You have to paint all the roads to progress to the next level, avoiding a few cars driving around. You’re scored based on how much of the level you paint. I’m playing the shareware episode which only has one city.

The game has EGA graphics and PC speaker sound support. The level is larger than the screen and scrolls around impressively smoothly even on lower speed hardware. In fact you’ll probably want to go for an older 286 machine, an upgraded XT or AT or equivalent cycles in Dosbox. The game runs quite fast even on slower hardware, playing ok at ~500 Dosbox cycles, and probably optimally at around 1000 (about a 286 @ 12Mhz using TopBench to benchmark). Despite this in game speed, the menus and transition screens take much more time to draw and animate. Looking at the graphics in game I wonder if it is using a hacked text mode to achieve the speed, and the other screens use a genuine graphics mode. Sound is pretty basic, but not bad, and you can turn it off if you don’t like it.

The game controls in much the same manner as Pac-man, allowing you to anticipate a turn before you arrive. The maze however is no-where near as dense, so there are fewer places to turn and more potential to be trapped by enemies. Luckily there are only two cars which are however driven by maniacs who speed around like they are driving a super car. It is their speed which makes them hazardous, as they drive randomly and will not seek you out.

Mad Painter serves its purpose well, it’s a fun little distraction that you can quickly play and not have to commit too much effort or time.

This slideshow requires JavaScript.

Blogs I Follow

Enter your email address to follow this blog and receive notifications of new posts by email.

Mister G Kids

A daily comic about real stuff little kids say in school. By Matt GajdoĊĦ

Random Battles: my life long level grind

completing every RPG, ever.

Gough's Tech Zone

Reversing the mindless enslavement of humans by technology.

Retrocosm's Vintage Computing, Tech & Scale RC Blog

Random mutterings on retro computing, old technology, some new, plus radio controlled scale modelling.


retro computing and gaming plus a little more

Retrocomputing with 90's SPARC

21st-Century computing, the hard way


MS-DOS game reviews, retro ramblings and more...