31
Jan
17

Creating a benchmark: part 6 problem solved!

Quite some time ago, in fact more than a year ago now, I was working on a basic series of benchmarks to test the comparative performance of the Borland Graphics Interface (BGI) versus a hand coded graphics library. I ran into a problem with my hand coded library not working on some of my real hardware. I ran the tests on a Pentium MMX @ 200Mhz and a 386sx @ 20mhz, it ran fine on the newer machine whilst failing on the older one. I figured perhaps I was overloading the older graphics chip.

So coming back to the problem today with a renewed sense of determination, I did some reading. There happened to be a book in the university library about programming VGA graphics, and I noted that all of their code only copies single bytes to graphic memory at a time, I was copying 2 bytes (a 16 bit word) at a time so wondered if that might be the issue. I changed it and the program still crashed almost immediately.

After some tinkering and some basic math I worked out that sprites being drawn near to the bottom of the screen were the cause of the problem. The test algorithm actually allows sprites to be drawn partially obscured by the right or bottom edge of the screen. On most VGA cards this isn’t a problem, but the trident chip in the 386sx didn’t like any pixels being drawn outside of the visible frame buffer. After writing some basic clipping into the sprite routines the program worked all the way through.

I’ve tested 3 different programs on two different real machines (as opposed to emulation). BGIbench is the original program that uses the provided graphics libraries that came with Turbo Pascal, I’ve used it in conjunction with the VGA256 BGI driver. VGABench is the initial lazy implementation of a hand made graphics library, it’s implemented entirely in pascal and isn’t optimised at all. Lastly is VGABench2 which is an optimised version of VGABench using assembly where necessary, and in the case of line drawing a better algorithm. All three programs were coded and compiled with Turbo Pascal 6.0 and use the same basic test code.

Each program performs 7 tests based around primitive functions found in the BGI. Each test counts the number of primitives that can be drawn over a period of 30 seconds. VGABench doesn’t have a function for drawing circles so it has no results for that test. The tests in order: put-pixels simply draws individual pixels a random colour. Filled Boxes draws solid coloured rectangles in a pre-defined pattern. Circles draws a number of concentric circles in a per-determined way. Random lines does what you’d expect, drawing lines randomly. Horizontal and vertical lines are similarly obvious. Finally the sprite test draws a 10×10 bitmap at random locations on the screen.

At first glance it’s pretty obvious that optimised, hand written code is significantly faster on the Pentium, particularly for filled boxes and sprites where VGABench2 achieves roughly twice the output to the screen. I managed a five-fold increase in the rate of drawing circles, which is impressive as circles are the hardest to draw. The first VGA Bench however is not really much better than the BGI and in some tests actually performs worse. You’ll note that the put-pixel tests all come out fairly close in terms of results, this is because there is little to optimise there.

I suspected that the reason VGABench2 performed so well is because the code copies 16 bits at a time instead of 8bits. I tested this out by changing it to copy 8bits at a time and found it was still faster than the BGI, but by a much smaller margin. VGABench2 copying 16bits yields about 156k sprites, but modifying it to copy 8bits yielded about 80k compared to BGI which achieves around 73k sprites. The first VGABench demonstrates how important optimisation is. It copies data 16bits at a time, but doesn’t even achieve the performance that BGI does, managing around 68k sprites.

386sx-20

The picture looks quite different on the 386sx machine, with the performance looking much more even with a few exceptions. The BGI seems to perform comparatively well across most categories only lagging behind in drawing circles, filled boxes and sprites. My first lazy implementation, VGABench seems to lag behind in pretty much everything except drawing filled boxes, which barely outperforms the BGI.

The results for VGABench2 are good, but not as good as on the Pentium machine. Line drawing is basically the same speed as the BGI. Filled boxes achieves about twice the speed, and circles about 5 times the speed, but the sprites are comparatively slower at about 1.5 times the speed. The explanation for the performance of sprites and filled boxes is interesting and is related to how the test is implemented. The filled boxes are drawn in a deterministic way, a grid of 10×10 sized boxes, the sprites are distributed randomly. Filled boxes end up being drawn pretty much always on even addresses, and sprites will be drawn on even and odd addresses around the same amount. This affects speed because of something called word alignment.

The 386sx, 286 and 8086 processors have a 16 bit data bus, which means the processor can access 16bits at a time. The memory is organised as a bunch of 16bit words, so when accessing 16 bits on an even address only a single memory word is accessed, and when a 16 bit access needs an odd address, two memory words are required. This means doing 16bit reads/writes on odd addresses are half as fast as on even ones, in fact they are about the same speed as an 8 bit transfer.

In beginning to sum up the series, it’s important to remember why I started it at all. Basically I remembered hearing many people discouraging use of the BGI mostly because it is slow compared to hand crafted code. The tests I’ve done have confirmed this, but I feel that it’s also shown how a lazy (or poor) implementation can be even slower. My optimised code is faster, but took a lot of time and effort to create and is missing many features that the BGI provides, such as support for other graphics cards, clipping, and other graphics primitives that are more complicated. I can see why many people without the time and know-how would have found it easier to simply use the graphics library provided.

That being said, hand written optimised code certainly has an important place as well. It was pretty fun and challenging to try and make something faster, even though I almost certainly didn’t get my code anywhere near as fast as more proficient assembly programmers. Also it’s hand optimised code that made many PC action games possible at all. Gaming on the PC would be very different without the hardware guru’s that could squeeze amazing things out of basic hardware. Writing this has made me reconsider my stance on sticking with the BGI for my platform game, I probably won’t gain a lot of performance, but I may be able to get it to work on older hardware as a consequence.

Code and binaries are available from the pascal downloads.

Advertisements

0 Responses to “Creating a benchmark: part 6 problem solved!”



  1. Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Blogs I Follow

Enter your email address to follow this blog and receive notifications of new posts by email.


Mister G Kids

A daily comic about real stuff little kids say in school. By Matt Gajdoš

Random Battles: my life long level grind

completing every RPG, ever.

Gough's Tech Zone

Reversing the mindless enslavement of humans by technology.

Retrocosm's Vintage Computing, Tech & Scale RC Blog

Random mutterings on retro computing, old technology, some new, plus radio controlled scale modelling.

ancientelectronics

retro computing and gaming plus a little more

Retrocomputing with 90's SPARC

21st-Century computing, the hard way

lazygamereviews

MS-DOS game reviews, retro ramblings and more...

%d bloggers like this: