Creating a Benchmark: part 3

Last time I started the process of comparing the BGI to a hand-coded VGA library. I coded up a fairly lazy library written entirely in Pascal. Today I’ve re-coded some parts of that code in x86 assembly using Pascal’s in-line assembler.

At work in the IDE

The first thing I wanted to tackle was the speed of the sprite blitting and filled boxes as they didn’t seem to live up to their potential. I decided to replace the Pascal move (copy memory) and fillchar (fill memory) functions as they are heavily used in the Pascal only version.

Luckily there are some neat instructions which make copying and filling memory faster even on old processors like the 8086 and 80286. These are MOVS and STOS, both string instructions that have been part of the x86 instruction set since the original 8086 (the Z80 had comparable block instructions such as LDIR, although the 8080 did not). Using them with the REP prefix makes them even better as it eliminates some looping code.

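MOVS copies a block of memory. You load ES:DI with the destination pointer, DS:SI with the source pointer and CX with the byte count, make sure the direction flag is clear (CLD), then execute…

shr cx,1
@again:
rep movsw
jcxz @next
jmp @again
@next:
jnc @done
movsb
@done:

This code will copy any count of bytes one word (16 bits) at a time, copying a single byte at the end if you specified an odd count. The SHR leaves the odd bit in the carry flag, and none of the following instructions change it before the JNC tests it. I’ve used JCXZ and a jump back to the REP MOVSW to continue the copy if CX isn’t zero yet, as the 8086 and 8088 can end a repeated string instruction early when an interrupt arrives, at least when a segment override prefix is also present. I jump back with JMP rather than LOOP because LOOP decrements CX itself, which would drop a word from the copy. A plain REP MOVSW shouldn’t be affected by the bug, so I know this isn’t strictly necessary, but it’s a cheap safety measure.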

STOS works in much the same way, except that it doesn’t read the data from a source pointer. It stores the accumulator register (AL or AX) at ES:DI instead.

With these new memory copy and fill routines done I tested the program to see if I had any improvement in performance. To my surprise there was none. The built-in functions for copying and filling memory must be about as good as what I wrote. So why were the blitting and box filling still slower than they should be?

It turned out that the loops in the filledBox and putImage functions were the culprit. The Pascal code for putImage’s main loop looked like this…

for i:= 0 to sizey do
         copymem(bseg,4+(i*sizex),cardseg,((y+i)*320)+x, sizex);

It didn’t look problematic until I considered the instructions required for calculating the offset into the image data and screen buffer. Multiplication is an unfortunately slow operation, and with some nifty assembly code I rewrote both the putImage and filledBox procedures mostly in assembly, avoiding multiplications in the main part of the loop altogether.
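The same idea can be shown in plain Pascal before going anywhere near assembly. This is only a sketch of the technique, not the actual library code: PutImageInc, the 8x4 sprite and the in-memory screen array are made-up names for illustration. The start offset is multiplied out once, and after that each row just adds the row width to a running source and destination offset.

```pascal
program RowOffsets;
{ Sketch only: PutImageInc, sprite and screen are illustrative names. }
const
  ScreenW = 320;
  ScreenH = 200;
  SprW = 8;
  SprH = 4;
type
  TScreen = array[0..ScreenW * ScreenH - 1] of byte;
  TSprite = array[0..SprW * SprH - 1] of byte;
var
  screen: TScreen;   { stands in for the video buffer at A000h:0000h }
  sprite: TSprite;
  i: integer;

procedure PutImageInc(var img: TSprite; x, y: word);
var
  row: integer;
  src, dst: word;
begin
  src := 0;
  dst := y * ScreenW + x;      { the only multiplication, done once }
  for row := 0 to SprH - 1 do
  begin
    Move(img[src], screen[dst], SprW);
    src := src + SprW;         { next row of the sprite }
    dst := dst + ScreenW;      { next row of the screen }
  end;
end;

begin
  FillChar(screen, SizeOf(screen), 0);
  for i := 0 to SprW * SprH - 1 do
    sprite[i] := i + 1;
  PutImageInc(sprite, 10, 5);
  { sprite pixel (col, row) should land at (5 + row) * 320 + 10 + col }
  writeln(screen[5 * ScreenW + 10]);            { prints 1 }
  writeln(screen[(5 + 3) * ScreenW + 10 + 7]);  { prints 32 }
end.
```

The assembly version does the same thing with SI and DI holding the running offsets, so the inner loop is little more than a REP MOVSW and two additions.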

It took me about 2-3 days to get through re-writing two of the drawing functions in assembly, when it took about one day to write the basic VGA graphics to begin with, but boy did it pay off. After re-writing most of putImage and filledBox in assembly, putImage became over three times faster and filledBox almost twice as fast. Both are also now significantly faster than the BGI implementation, at about twice its speed.

So the BGI is slow compared to raw x86 assembly after all, but it took significant effort to get that performance gain. For the myriad of one-man shareware programmers I still understand why they just went with the BGI, it was easy to use and good enough for what they were doing.

Making a VGA library with straight Pascal was fairly easy to do, but had some disadvantages over BGI and wasn’t really quicker. I had to go to assembly before there was any significant performance gain. Coding assembly is daunting to many programmers, and for me is much more time consuming than writing in a higher level language. It will be quite some time before I really finish re-coding the library in assembly.

Next time I’ll have to tackle the line drawing functions, which are using some floating point numbers to accurately draw the lines. I’m planning on converting them to use fixed point numbers to improve speed on machines without an FPU, like my old 386sx. I’m also hoping assembly will help speed things up there too.
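As a rough idea of what a fixed point line routine could look like, here is a minimal sketch using 16.16 fixed point in a longint. It is an assumption about the approach rather than the routine I’ll end up with: Line, PutPixel and the in-memory screen array are illustrative names, and it assumes on-screen (non-negative) coordinates. The slope is worked out once as a scaled integer, then each step is just two additions.

```pascal
program FixedLine;
{ Sketch only: a 16.16 fixed point DDA, not the library's line routine. }
const
  ScreenW = 320;
  ScreenH = 200;
  FixOne  = 65536;   { 1.0 in 16.16 fixed point }
  FixHalf = 32768;   { 0.5, added so truncation rounds to nearest }
type
  TScreen = array[0..ScreenW * ScreenH - 1] of byte;
var
  screen: TScreen;   { stands in for the video buffer }
  i, count: word;

procedure PutPixel(x, y: longint; c: byte);
begin
  if (x >= 0) and (x < ScreenW) and (y >= 0) and (y < ScreenH) then
    screen[y * ScreenW + x] := c;
end;

procedure Line(x0, y0, x1, y1: integer; c: byte);
var
  steps, n: integer;
  fx, fy, stepx, stepy: longint;
begin
  { step along whichever axis is longer }
  steps := abs(x1 - x0);
  if abs(y1 - y0) > steps then steps := abs(y1 - y0);
  if steps = 0 then
  begin
    PutPixel(x0, y0, c);
    exit;
  end;
  { per-step increments, scaled by 65536; the only divisions needed }
  stepx := (longint(x1 - x0) * FixOne) div steps;
  stepy := (longint(y1 - y0) * FixOne) div steps;
  fx := longint(x0) * FixOne + FixHalf;
  fy := longint(y0) * FixOne + FixHalf;
  for n := 0 to steps do
  begin
    PutPixel(fx div FixOne, fy div FixOne, c);
    fx := fx + stepx;
    fy := fy + stepy;
  end;
end;

begin
  FillChar(screen, SizeOf(screen), 0);
  Line(0, 0, 10, 5, 15);
  count := 0;
  for i := 0 to ScreenW * ScreenH - 1 do
    if screen[i] = 15 then count := count + 1;
  writeln(count);                     { prints 11 }
  writeln(screen[5 * ScreenW + 10]);  { prints 15, the end point }
end.
```

Dividing by 65536 is a 16-bit shift, which in assembly is simply a matter of reading the high word of the 32-bit value.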


Hardware pickups

Recently I’ve been able to pick up some interesting hardware and I thought I’d share some photos of it with you. It was also an opportunity to try out some better lighting in the hope of getting better pictures.

387sx 33MHz

This unfortunately wasn’t the greatest photo due to the reflectivity of the packaging.

First up is a pair of 80387sx co-processors, a part Intel introduced in the late 80’s. Very few people actually bought and installed these chips as floating point arithmetic was mostly only used in scientific applications or Computer Aided Design. Consequently chips like these can be quite rare, and this pair clock in at 33MHz, making them some of the faster 387’s.

Intel weren’t the only company making co-processors for the 386. Others included Cyrix, Chips and Technologies, IIT, ULSI and Weitek. Most of these were faster than the Intel part, but had some compatibility issues, and some were of completely different designs.

Interestingly the NPU’s as they were known could be clocked asynchronously from the CPU. They also could operate whilst the CPU was busy doing something else, which gave machines with these some very crude parallel capabilities.

Here we have a Sun Microsystems mainboard from a SPARCstation IPX. The machine came in a neat lunchbox form-factor that was actually impressively small. This particular board has a Weitek SPARC processor that ran at about 40MHz. These chips had an FPU on-die, so they would have been similar to the 486 in performance. The LSI chip and some of its supporting chips are likely the 1MB of system cache, which was quite large for the time. The Sun GX chip is a graphics controller which contained some basic drawing acceleration. These features made the IPX quite an impressive little workstation. Most of the chips and the board itself appear to have been manufactured in 1993.

It’s a shame I don’t have the rest of the machine, I’d like to be able to run this little beast. I’m not even sure I can get RAM for it, or if what I have is compatible. I’ll have to keep an eye out for the chassis and other parts.

Mechanical Keyboard

This might look like an ordinary keyboard, but it is a proper mechanical switch keyboard that came with the next piece of hardware (a PC clone). Despite its very plain looks it feels fantastic to type on and has that distinctive mechanical sound. It has the larger DIN plug, which actually suits many machines up to and including many Pentium based machines. It is a bit grubby but otherwise in good condition.


Lastly we have an interesting PC clone. This one was made by Microbyte, which turns out to have been an Australian company based in Adelaide that made PC clones such as this one. It is clear that they designed and built their own boards and wrote their own PC compatible BIOS. Quite an achievement for what must have been a small engineering company. Unfortunately I found very little information about them online.

My machine is a PC230sx, which has a 386sx@20MHz with a Trident VGA card. It has SIPP memory fitted for both the main memory and video memory. They installed an unusually large amount for the VGA, having a full 1MB of video memory. The system RAM is 2MB in total.

When I bought this machine I didn’t think it had a hard disk, but it turns out that it has a Seagate ST3144A, which is 130MB. Probably an impressive and expensive drive in its day. This drive still works. I just had to configure the CMOS with the drive’s details, which are handily written all over the machine.

You may notice the socket for a WD33C93 chip, which was a SCSI controller chip. This would have to be one of the few older machines that have the capability of on-board SCSI. I’m not sure why the chip is missing here, but these machines were apparently commonly fitted with SCSI drives instead of IDE. Looking in the BIOS seems to indicate that they were supported for booting. I may have to find one of these chips and see if I can get SCSI to work.

Between the VGA chip and VLSI chips lies an extremely long header where the expansion riser card would normally be inserted. This machine doesn’t have the riser card, so I can’t plug in a sound card or anything else, which is a bit of a shame. I’m surprised the machine works without it as I’ve seen many other machines which don’t work correctly or at all when it is missing.

This board has some stickers that look like they were written by a service technician. They are attached to a part of the board under the floppy drive where there is a blank area containing no visible traces or chips. The first sticker reports an invalid opcode at a particular memory address, which could indicate a problem with RAM or software.

Fortunately after testing the machine I’ve found the only problem so far is a malfunctioning COM1 port. The rest of the machine appears to be functional, and the IDE hard drive boots DOS ok. I have noticed that the floppy drive light stays on, something which sometimes indicates incorrect installation of the cable. In this case the cable is correct, and the drive even reads disks, so there is likely a jumper setting on the drive that needs correcting.

I benchmarked this machine with Topbench to see how it compares to others. It was only marginally faster than a 286@16MHz with a 287 co-processor. I think there may be a few factors that contribute to this. Firstly I think the RAM must be a similar speed to that in the 286, thus slowing down the memory and opcode tests. It does perform better in the 3d games test, which I found interesting as that has some floating point arithmetic. Luckily this is perfect for testing my homebrew platform game.

Finally I’m pleased with how the extra lighting has improved the pictures, but my technique still needs work. Perhaps another source of lighting is called for, or perhaps finally a step up to a better camera.


Creating a Benchmark: part 2

A couple of weeks ago I created a basic benchmarking program for measuring the speed of the Borland Graphics Interface and its drivers. I’m primarily interested in how they compare not only to each other, but also to a hand-coded implementation. So this week I created a VGA graphics unit by hand and made a benchmark program around it.

I chose coding for VGA 320x200x256 as it is the easiest mode to code for and matches more of the BGI drivers. You simply set graphics mode 13h (h for hex) with the video BIOS, which sets up a linear buffer for drawing at the memory location A000h:0000h. Each pixel is a single byte, so drawing a pixel doesn’t require bit masking unless you want it to. Drawing a pixel simply involves changing the byte at the offset given by this simple formula: (y*320) + x.
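To make the formula concrete, here is a small sketch that only computes offsets, so it runs anywhere. PixelOffset and PixelOffsetShift are names I’ve made up for the example, and the shift version is a common trick rather than something my unit necessarily uses: since 320 = 256 + 64, the multiplication can be replaced by two shifts and an add.

```pascal
program PixelOffsetDemo;
{ Sketch only: PixelOffset and PixelOffsetShift are illustrative names. }
const
  ScreenW = 320;

{ Offset of pixel (x, y) from the start of the mode 13h buffer. }
function PixelOffset(x, y: word): word;
begin
  PixelOffset := y * ScreenW + x;
end;

{ Same result without a multiply, since 320 = 256 + 64. }
function PixelOffsetShift(x, y: word): word;
begin
  PixelOffsetShift := (y shl 8) + (y shl 6) + x;
end;

begin
  writeln(PixelOffset(0, 0));           { prints 0 }
  writeln(PixelOffset(10, 1));          { prints 330 }
  writeln(PixelOffset(319, 199));       { prints 63999, the last pixel }
  writeln(PixelOffsetShift(319, 199));  { prints 63999 }
  { Under real-mode DOS in Turbo Pascal the write itself would be
    Mem[$A000:PixelOffset(x, y)] := colour; }
end.
```

The largest offset is 63999, so the whole screen fits inside a single 64K segment, which is a big part of why this mode is so easy to work with.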

Given this information it was no problem at all to code up a basic graphics unit. I didn’t use much in the way of assembly code to implement the unit, partly out of laziness, instead opting to implement it using Pascal code mostly. I haven’t implemented all the graphical functions in the BGI, simply because there are way too many.

Here are the results when tested under Dosbox with 3000 cycles. I used pretty much the same code to perform the measurement to ensure as much consistency as possible. It’s quite interesting to see that implementing your own graphics unit doesn’t really provide that much extra performance for most functions, and in this case blitting sprites is actually slower using my code! I suspect this is because I used a built-in Pascal function for copying memory that may not be super fast. I did note it is still faster than both the VESA and SVGA256M driver in the same mode.

So is it worth implementing your own graphics driver instead of using the BGI? The answer is a sorta, maybe. I haven’t optimised my graphics code in this case, so it surely could be a bit faster, but I did manage to use less memory for storing the sprites, and my code was much smaller. However the Graph unit and VGA256 actually seem to have decent performance by comparison, so if you need compatibility with other cards that are more difficult to code for, or simply don’t have time to code a graphics unit of your own, then the Graph/BGI implementation isn’t too bad.

Code and DOS binary are available here.


Arctic Adventure for DOS

The title screen

Arctic Adventure was released in 1991, when the author George Broussard had just merged his company with Apogee. It is a sequel to the first game: Pharaoh’s Tomb, and shares the same game engine that was originally developed by Todd Replogle for Monuments of Mars. It shares most of its technical aspects with both of these games, as it uses exactly the same technologies.

Map Screen

Again CGA graphics and PC Speaker sound were used, with about the same level of technical skill as the other two games. The only really big change is using the white, cyan, and magenta CGA palette instead, which is quite appropriate given the Arctic theme. I noted that this time there was no performance warning for older machines, but I haven’t noticed any significant improvement, so it’s best to avoid the slowest 8088 and PCjr machines.

Game Screen

Unlike the other two games you start in an over-world style map which allows you to choose which level you wish to attempt. You need to gather keys and a boat to gain access to many of the levels, but you can attempt them in any order otherwise. Whilst you can only save at this screen, it’s quite nice being able to return to this map screen without penalty so you can save your game, or choose another level if one is vexing you too much.

Not as easy as it looks

Entering a level you’ll find similar collision issues to those the other games suffered. The spikes in particular feel the most unfair as they will kill you without even touching your character. However overall it suffers from this much less than Pharaoh’s Tomb as you no longer have a limited number of lives. You simply return to the start of the level with everything you brought with you when you first arrived. This makes death much less annoying as you can still progress even if you die many times, and you can choose another level when you get frustrated.

Looks simple enough

The levels themselves are a mix of easier and harder puzzles, some of which are more a test of your platforming skills. They contain the same types of enemies and hazards as Pharaoh’s Tomb, just re-skinned. It seems that the designer has made better use of these features as I didn’t run into the same problems as much, and the levels are much more enjoyable to play.

Like the other games Arctic Adventure was made freeware back in 2009, and is the better game of the three. It isn’t as frustrating as Pharaoh’s Tomb, but is more challenging than Monuments of Mars. Unfortunately it still suffers from some issues with the collision detection making some levels extra hard. If I had to pick a favourite, I’d probably favour Monuments of Mars, but Arctic Adventure is still quite enjoyable.


Creating my own Benchmark program

Looking at benchmarking programs last week got me thinking about creating my own testing software. I’m a Pascal programmer as far as DOS goes and I have read in many programming forums about how slow the Borland Graphics Interface is. I decided to test this claim and find out just how slow or fast it is, and what effect the BGI driver and graphics mode have on the performance.

The Borland Graphics Interface is a library that Borland supplied with the Turbo C and Turbo Pascal products. It was used by many because it simplified drawing graphics and meant you didn’t need to write the code for drawing to the screen. This could save a lot of time for the individual programmer, and many shareware programmers used it in their simple games.

As I’ve said before, many have claimed that it is quite slow. So I have written a simple program called BGIbench to test the speed of any BGI driver. I test some of the more important graphics functions that will work on all the drivers. Some, like page flipping, only work with specific drivers, so they are left out.

The drivers I’ve tested here are the standard Borland CGA and EGAVGA ones that came with Turbo Pascal 6 as well as VGA256 (a mode 13h driver), VESA (uses VESA compatible modes) and SVGA256M (another VESA driver) that I found when I started writing my platform game all that time ago.

BGIbench results for Dosbox @3000 cycles

Here are the results for testing performed under Dosbox. The results that matter the most for games are the sprites, and notably the VGA256 driver is the best in this category. All types of lines and circles are about the same between the drivers, although I have noted that drawing circles is pretty slow, slower than even Qbasic if I remember correctly.

Of interest are the filled boxes, which in theory should be about as fast as the sprites, but the EGA, VESA and SVGA256M drivers seem more capable at drawing filled boxes than blitting bitmaps. This doesn’t make sense, so there must be something slowing down bitmaps with these drivers. Sprites on these drivers are even slower than the pattern filled boxes!

BGIbench on a real machine, Pentium MMX 200MHz

On actual hardware things are much different. Notably the VESA driver is nearly as good as the VGA256 one at drawing sprites in the same resolution, and is in fact significantly faster at drawing filled boxes, which makes me think there is still some lost potential in that driver.

The SVGA256M driver has poor performance unfortunately where it counts, blitting sprites. In fact it’s almost as slow as the EGA driver, which is the worst performer. Again there must be significant lost potential as drawing filled boxes is significantly faster, but it suffers worse performance in most line drawing.

Circles and text rendering again perform poorly across the board, so these types of primitives should be avoided if possible when writing BGI based software.

In summary the CGA and EGA drivers offer modes on older hardware that aren’t offered by the others, so they remain useful even though there are faster drivers. The VGA256 driver is the best for 320x200x256 sprite based graphics, but if you wanted to do vector/line drawing the VESA driver performs better in those areas. The VESA driver also offers higher resolution modes, although at a performance hit as it needs to switch memory banks for drawing. The SVGA256M driver is probably the least useful as others out-perform it in all areas and resolutions.
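The bank switching cost is easier to see with some numbers. This is a sketch under stated assumptions, not how the VESA driver is actually written: it assumes a 640x480x256 mode seen through a 64K window with 64K granularity, and BankAndOffset is a made-up name. A higher resolution screen no longer fits in one segment, so every pixel address has to be split into a bank number and an offset inside the window.

```pascal
program BankDemo;
{ Sketch only: assumes a 640x480x256 mode and a 64K window with 64K
  granularity. Real cards report their own granularity via VESA. }
const
  ScreenW = 640;

procedure BankAndOffset(x, y: word; var bank, ofs: word);
var
  linear: longint;
begin
  linear := longint(y) * ScreenW + x;  { offset as if memory were flat }
  bank := linear div 65536;            { which 64K bank holds the pixel }
  ofs := linear mod 65536;             { position inside the window }
end;

var
  b, o: word;
begin
  BankAndOffset(0, 0, b, o);
  writeln(b, ' ', o);      { prints 0 0 }
  BankAndOffset(255, 102, b, o);
  writeln(b, ' ', o);      { prints 0 65535 }
  BankAndOffset(256, 102, b, o);
  writeln(b, ' ', o);      { prints 1 0 }
end.
```

Notice that the bank changes in the middle of row 102, so a sprite or filled box crossing that point needs a bank switch part way through a row, which is extra work the 320x200 mode never has to do.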

The question remains how fast is the BGI compared to writing a graphics driver yourself? This will be answered another day.

UPDATE: A Download of the results and the program can be found here.


Benchmark Software for DOS

Comparing the speed of computers is not a new thing. It has been done as long as there have been computers. Picking a good benchmark can be problematic, as many output meaningless metrics, do not give consistent results, or simply measure the wrong things. Today I’m looking at a small collection of benchmarking utilities.

I am not equipped to capture from my DOS machine, so as a standard machine I used Dosbox with a setting of 3000 cycles. This isn’t ideal as it doesn’t mimic hardware exactly so it may cause some funny results. I’ve noted in particular that Dosbox’s video memory is faster than in real machines.

Topbench screen

First up is a relatively new utility called TopBench. I’ve found this one handy for adjusting the emulation speed of Dosbox to approximate an actual machine. It benchmarks the system in real-time so any adjustments you make will be reflected on the screen straight away. The built-in database of machines gives you a good idea of roughly where your machine fits.

However the metrics provided aren’t really clear about what they are measuring. We get a measure of how long each type of test takes, but don’t get any indication of what each test does. So this is good for comparison, but isn’t going to tell you much about the various parts. Still I like this one a lot for tuning performance in Dosbox.



Chips and Technologies made a simple one that is supposed to measure the MIPS a system is capable of.  It would be good for comparing CPU speed and not much else, except that the MIPS rating doesn’t seem accurate for the speed. I thought Dosbox may have been the cause, but others have noted this as well on actual hardware.

Landmark Speed Test

Back in the day the Landmark Speed Test was frequently used to compare performance. It compares your system with an IBM AT and reports the clock speed in MHz that an AT would need to achieve the same performance. It doesn’t tell you how fast the AT machine was, or what processor it was running. I guess you can still use this to compare between systems, but the metric is less useful.

It measures the speed of your graphics card in characters per second, which I thought was a bit odd. Very little performance-limited software used text mode all that much, so I think the metric isn’t very useful. Also we don’t know if their test uses the BIOS routines or draws to the hardware directly, so this may not indicate anything about the hardware. Additionally this metric appears to be inaccurate.

Checkit CPU and FPU performance

Checkit isn’t just a benchmarking utility, it also provided technical information about the machine and offered some basic hard disk and floppy disk utilities. The benchmarks are better than some in that they provide the raw data in the form of Dhrystone and Whetstone loop counts.

Dhrystone and Whetstone are basic benchmarking algorithms developed specifically for testing integer and floating-point instructions respectively. Both were synthetically designed for benchmarking machines, but will suffer inaccuracy due to compiler optimisations, and differences in the languages used to implement the benchmark. Despite the shortcomings these are still widely used benchmarks.

Checkit Character Through-put

Notably Checkit also measures the graphics memory through-put using characters per second. Except they have separate measurements for rendering using the BIOS and directly handling the video buffer. You can see just how different the results are here. Whilst the results are a little more meaningful, it’s still measuring the text mode performance.

SpeedSys Results

Lastly a commonly used tool amongst retro PC enthusiasts is SpeedSys. It is good for benchmarking faster DOS machines. I ran it on my Pentium MMX based MS-DOS 6.22 machine here, and you can see that it provides a lot of information about your hardware. The memory and hard disk graphs are perhaps the most interesting.

The memory speed graph shows the speed versus the data size (in KB). You can see several drops in speed on the graph, which roughly correspond to the L1 and L2 cache sizes. You’ll also notice how both write graphs don’t seem to enjoy any speed boost from the cache. I can only assume this is because of the cache policy being write-through, but I can’t be certain.

The Speedsys test is probably the best one as it provides the most detail. The memory and hard disk tests are quite good as they give measurements that mean something outside of the benchmark. The only thing I would have liked is more detail about the graphics card, but there isn’t really any more room on screen.

Whichever benchmarking utility you use, remember to only compare your results to those produced by the same program, even where the same metric is used, as in the case of the Dhrystone and Whetstone tests. Otherwise you’re really comparing apples and oranges.


Some Maintenance on a Canon A-200 20HD

Quite some time ago when I was back at my parents’ place I had a look at a Canon XT clone, the A-200 20HD. There were a few basic repairs needed, so when I recently saw my parents I spent some time on it.

Yuasa Battery

The first thing I did was remove the Ni-Cad battery from the memory expansion board, to prevent any future problems with leaks. Leaky batteries are very hazardous, and can destroy whatever board they happen to leak upon. This can often damage a machine beyond repair. This battery had not leaked yet, but was not really holding a charge. It is a 3.6V 50mAh battery made by Yuasa from Japan, so it’s likely a quality cell, but still best to remove it. I wonder if given its small capacity I could use a super capacitor in its place.

The battery was the main reason I looked at the machine, but I remembered the MFM hard disk wasn’t spinning up. I had to test the machine anyway, to ensure it was working correctly after removing the battery. So I thought I’d try to get the drive spinning again. I tapped and banged it strategically a few times and that didn’t really help. So I rotated the stepper motor that drives the heads one step and it worked pretty much immediately. I wouldn’t normally do that given the risks, but I would have had to open the drive otherwise.

ADMAST Menu software

I rebooted the machine and MS-DOS 3.3 booted up off the hard disk without issue. Strangely it seemed to be only formatted as a 10MB disk. I had a look around the machine and found we had put a few games on it for testing. Here we see the menu software installed on this machine, called admast. Before MS-DOS 4.01 there was no shell or menu system included, so simple menu systems like this one filled the role.

The three games I had installed were Megapede, Nyet, and Cyrus Chess. All these games support using the MDA card’s text mode for the simple graphics that they have. They also all run surprisingly well. Nyet runs pretty much the same as it does on faster machines. Megapede runs well, as its speed is CPU dependent and this machine seems to be about ideal. Cyrus Chess works about the same as on a newer machine, minus the graphics, and it obviously takes longer to work out a good move.

I’m quite pleased to have gotten the machine working again as it would have to be the oldest machine anyone in the family owns. I’ll have to try some more software on it (if I can find any that is compatible) and see if I can get a Hercules compatible graphics card. It would be cool if I could get an old version of SimCity working on it.
