Posts Tagged ‘java’

27 Oct 14

A Guide to Memory/Object Pooling

I was recently listening to a vlog by a games developer who wanted a new programming language specifically for games, and one of his big gripes with C++ was the memory allocation process. He had a number of arguments, none of which resonated with me, but it did spark my interest in the memory allocation/de-allocation problem. I’ll briefly discuss the problem from the perspective of a manually managed language (in this case Pascal) and one that uses garbage collection (in this case Java).

Most languages dynamically allocate and free memory in very much the same way as even very old languages such as Pascal did. You have to explicitly ask the API to allocate memory for you, and then explicitly free it when you are finished with it. In an OO language like C++ this results in the need for destructors to free the memory the object uses when it is destroyed.

Either way memory is allocated and freed, and if this is done on a regular basis it can lead to something called heap fragmentation. Heap fragmentation is not quite the same as disk fragmentation, in that an allocated block will not be split up. Pascal, for instance, allocates memory one block after another, simply creating each new block at the end of the last. However, when a block of memory is freed it leaves a hole, and these holes are effectively heap fragmentation.

Pascal deals with these holes by maintaining a free list, which is a linked list containing all the free blocks available. When an application allocates more memory the list is traversed to find the first available memory big enough for the new allocation, hopefully filling the hole but possibly leaving a small amount still available. Clearly the holes will likely get smaller over time until they are unusable, and the more holes there are the longer any memory allocation will take.
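As an illustrative sketch (not Pascal’s actual implementation), a first-fit free list might look like this in Java. The class and method names are my own, and it manages offsets into an imaginary heap rather than real memory; a real allocator would also coalesce adjacent holes.

```java
import java.util.Iterator;
import java.util.LinkedList;

public class FreeList {
    static final class Block {
        int offset, size;
        Block(int offset, int size) { this.offset = offset; this.size = size; }
    }

    private final LinkedList<Block> free = new LinkedList<>();

    public FreeList(int heapSize) {
        free.add(new Block(0, heapSize));   // start with one big hole
    }

    /** First fit: returns the offset of the allocated region, or -1 if no hole fits. */
    public int allocate(int size) {
        Iterator<Block> it = free.iterator();
        while (it.hasNext()) {              // traverse the holes in order
            Block b = it.next();
            if (b.size >= size) {           // first hole big enough wins
                int offset = b.offset;
                b.offset += size;
                b.size -= size;             // a smaller hole may remain
                if (b.size == 0) it.remove();
                return offset;
            }
        }
        return -1;                          // no hole large enough
    }

    /** Freeing just adds a new hole; no coalescing of neighbours here. */
    public void release(int offset, int size) {
        free.add(new Block(offset, size));
    }
}
```

Note how every allocation has to walk the list, so the more holes there are, the slower allocation gets, exactly as described above.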

Other allocation schemes are available in other languages, but they have similar issues. Whatever scheme you use, the more often you allocate and free memory, the more you pay in overhead.

Garbage collected languages such as Java have problems of their own. Whilst it is claimed that allocating memory in Java is practically free in terms of complexity, freeing memory is known to have performance issues as the garbage collector (GC for short) kicks in to clean up. The mechanics of the memory allocation mechanism are largely hidden in a language like Java, so that shouldn’t concern you much, but the performance impact a garbage collector can have should concern you.

So why is the GC a potential performance hazard? In order for the GC to know which memory to free it must work out which memory is still being used. One way of doing that is to maintain a reference count for each object in use. Obviously any object with a count of zero can be freed, but this leads to another problem: determining which collections of objects can be freed.

Imagine a data structure like a tree in which every node holds a reference not only to its children but to its parent as well. If you were to use the tree, holding only a reference to the root node, and later released that reference, reference counting would not allow the tree’s memory to be freed: every node is still referenced by its parent or children, even though no external object can reach it. A GC therefore also needs an algorithm that checks the reachability of memory.
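Such a tree might be declared like this in Java (illustrative names). Every parent/child pair references each other, so under pure reference counting no node’s count would ever reach zero:

```java
import java.util.ArrayList;
import java.util.List;

public class Node {
    final Node parent;                       // back-reference to the parent
    final List<Node> children = new ArrayList<>();

    Node(Node parent) {
        this.parent = parent;
        if (parent != null)
            parent.children.add(this);       // parent and child now reference each other
    }
}
```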

That’s a rather simplistic view of garbage collection, but it’s enough to see that running the GC can be expensive.

So how can you deal with these issues? It seems it doesn’t matter which kind of system/language you use, memory management has real implications for performance. One answer is to use a technique commonly called object or memory pooling.

A simple way to think of a memory pool is that it’s a lot like a cupboard. When you need something it’s quite quick to go to the cupboard and get something that’s available there, and when you’re done with it you return it. If the cupboard turned out to be empty, you’d have to go down to the shops in order to buy more of whatever you need. An advantage being you can stock your cupboard with resources you plan on using later.

Memory and object pools are effectively exactly the same thing. You can preallocate memory, use it, return it to the pool, then use it again. This reduces the amount of stress on the memory allocation system, leading to fewer GCs and less heap fragmentation. You’d be amazed how much difference something like this can make on a system, especially something like Android.
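A minimal generic pool along these lines might look like the following sketch (the class and method names are my own, not from any particular library):

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

public class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    public ObjectPool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++)
            free.push(factory.get());       // stock the cupboard up front
    }

    public T obtain() {
        // Reuse a pooled instance if one is available, otherwise allocate a new one.
        return free.isEmpty() ? factory.get() : free.pop();
    }

    public void release(T obj) {
        free.push(obj);                     // return it to the cupboard for reuse
    }
}
```

A game would pre-stock the pool with, say, particle objects, obtain() one when a particle spawns, and release() it when the particle dies, so the allocator and GC rarely see any new work.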

As an example, I implemented a path finding algorithm in Java for a maze game I am writing. I could have used a dynamically sized data stack that grows and shrinks as the algorithm runs. That would however trigger the GC much more frequently as it uses and releases memory on the data stack. It is much better to preallocate the maximum size of the stack once, and reuse the same memory every time the algorithm is used. This only allows one instance of the algorithm to run at once, but it can be adjusted to allow multi-threading by creating a shared pool that each instance can use.

So what kinds of objects or memory should you use pooling for? Ideally anything that is short-lived should have some kind of pooling scheme in place as they put the most stress on the memory allocation system. This would be objects like particles and bullets in a game engine. Long-lived and infrequently used objects that pose less of a problem to the system don’t really need to be pooled as they cause much less stress.

Pooling can also be used for other resources that are expensive to create and destroy, such as database or network connections. This can help alleviate stress on other systems such as the network stack, database servers, and file systems. Pools are an important tool in any programmer’s toolbox for improving performance.

16 Jun 14

Recursive versus Iterative Algorithm Results Follow-up

A few weeks ago I did some preliminary investigation into the performance difference between recursive and iterative procedures/algorithms. The problem I chose to solve was path finding in an imperfect maze, as there are recursive and iterative algorithms with the same theoretical complexity. I also already had some code lying around for both generating mazes and finding paths. I created iterative and recursive versions of the depth first search algorithm to compare with each other, and included breadth first search for comparison with a known “slower” algorithm.

The first results I got were interesting: both the dfs (short for depth first search) algorithms were fairly close in terms of rates until the size of the mazes got larger. The results, however, were inconsistent because of the small number of mazes used for testing and differences in the behaviour of the two dfs algorithms. Clearly I needed to test across a larger set of mazes and fix the behavioural difference before I’d get good results. It wasn’t a waste of time though, as it did illustrate that it is possible for each type of algorithm to out-perform the other given even minor differences in behaviour.

So I carefully adjusted the code so that both implementations of the dfs algorithm behaved in the same manner. Again I ran some initial tests, and this time the iterative version of dfs out-performed the others in general. Satisfied that the code was behaving correctly, I up-scaled the tests to 1000 different mazes across many sizes so I could graph the results.

[Figure: alg-rate — search rate for each algorithm, by maze size]

You can see the rate for all the algorithms drops off very quickly! It is quite difficult to see much detail in this graph, so I did some number crunching to determine the average time per search for each algorithm and size of maze.

[Figure: alg-msec-per-search — average time per search for each algorithm, by maze size]

This graph makes the difference much clearer. Note the parabola shape: the worst case for all the algorithms is to search the entire maze, and the number of squares is the square of the maze size on the X axis. The interesting part is how they compare with each other. The iterative dfs curve is very shallow compared to the others; it is in fact quite difficult to see much of an increase. It is increasing in a similar fashion, just nowhere near as dramatically as both recursive dfs and iterative bfs (breadth first search).

So why the huge difference between iterative dfs and the others? The big difference between the two dfs algorithms is the cost of recursion. The extra cost is paid per square visited in the maze, and the number of squares visited in the worst case grows with the square of the maze’s side length, so doubling the maze size quadruples the total overhead.

To get a good idea of why the overhead is so comparably large in the recursive case we need to think about how it is implemented at the machine code level. Basically for every square visited in the maze there is a function call to the recursive function and a return at the end. A function call and return involves a number of relatively expensive operations. Firstly information from the calling code needs to be pushed onto the stack (usually important register information), then the processor can “call” the function code which basically pushes a return address onto the stack and jumps to the function code. The first thing the newly called function must do is allocate space on the stack for its local variables, then it can do whatever processing it requires. Once finished the function will place the return value in either the stack or a register, remove local variables from the stack and then return control to the code that called it. This involves restoring the stack to remove the local variable information and then using the return address stored there to return to the calling code. The calling code can then retrieve anything it pushed on the stack before the call and continue operation. Quite the involved process!

Let’s contrast that with what an iterative algorithm needs to do differently. Firstly, any iterative algorithm needs to set up storage space for the body of the algorithm; in this case a stack (in the data segment or heap) for storing the current search, and some variables for tracking the current location. Then the body can begin. Usually this will be a loop of some kind, either a for loop or a while loop. These loops involve checking a condition and jumping at either the beginning or end, but these are only short-range jumps within the function. The body of the code simply does the search, updating the stack as it goes.
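To make that concrete, here is a sketch of an iterative depth first search over a grid maze using a preallocated explicit stack. The maze encoding (0 = open, 1 = wall, row-major int array) is an assumption for illustration, not the code from my game:

```java
public final class IterativeDfs {
    /** Returns true if any path exists from start to goal (cell indices). */
    public static boolean pathExists(int[] maze, int width, int start, int goal) {
        boolean[] visited = new boolean[maze.length];
        int[] stack = new int[maze.length];      // preallocated once: worst case visits every cell
        int top = 0;
        stack[top++] = start;
        visited[start] = true;
        while (top > 0) {                        // per iteration: a condition check and a short jump
            int cell = stack[--top];
            if (cell == goal) return true;
            int col = cell % width;
            // Candidate neighbours: up, down, left, right (-1 marks an invalid edge move).
            int[] next = { cell - width, cell + width,
                           col > 0 ? cell - 1 : -1, col < width - 1 ? cell + 1 : -1 };
            for (int n : next) {
                if (n >= 0 && n < maze.length && maze[n] == 0 && !visited[n]) {
                    visited[n] = true;           // each cell is pushed at most once
                    stack[top++] = n;
                }
            }
        }
        return false;                            // stack emptied without reaching the goal
    }
}
```

Because each cell is pushed at most once, the int array can never overflow, unlike the program stack in the recursive version.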

So for every square of the maze visited the recursive version has to deal with function call overhead (stack operations and far calls) where the iterative version has to only check a condition and make a short jump. There is more to it with the effects of the L1 and L2 caches, but I’ll leave that out because it’s x86 specific.

So the cost of recursion is high, in this case almost as bad as iterative breadth first search, and it doesn’t guarantee the shortest path. Also, because of limited program stack space, the recursive function cannot search as large a maze lest there be a stack overflow. So in general I’d recommend people stick with writing iterative code except where a recursive implementation is significantly simpler. Your code should be faster and put less strain on the limited stack space.

21 May 14

Recursive and Iterative procedures

Today I’m looking at two different types of procedures: recursive and iterative. Many problems can be implemented using both techniques, each of which has benefits and drawbacks.

For those not acquainted with the terminology, an iterative procedure accomplishes computation with a block of code that repeats many times until a terminating condition is met. The repeating block of code is commonly called a loop and can take many different forms. One of the main drawbacks of an iterative design is that it can become quite complicated, making it harder to understand its function. However, this is a trade-off to get higher performance and to keep the size of the program stack small.

Recursive procedures, however, are quite different as they infrequently make use of loops. Instead they achieve repetition by invoking themselves. The Fibonacci function is a classic case: Fib(x) = Fib(x-1) + Fib(x-2), where the first two entries are hard coded to 1. It’s important that recursive functions have a basis case or a maximum depth to the number of calls, otherwise an error called a stack overflow can occur. A stack overflow is simply when the program stack literally overflows with too much data. The program stack has to store the local variables and return addresses of functions in progress, and recursive functions typically use up a lot of stack space. Another disadvantage is that function calls in many modern OO languages are relatively expensive in terms of time required, as some languages require a pointer or two to be dereferenced to access the function.
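Written recursively in Java, the function is a direct translation of that definition:

```java
public class Fib {
    // Each call spawns two more until the basis cases stop the recursion,
    // so the call count grows exponentially and every call in progress
    // consumes program stack space.
    public static int fib(int z) {
        if (z <= 2) return 1;   // basis cases: the first two entries are 1
        return fib(z - 1) + fib(z - 2);
    }
}
```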

Fortunately all recursive procedures can be converted into iterative ones, the Fibonacci function again being a classic and trivial case. The code might look something like this…

public int fib(int z)
{
  if (z <= 2)
    return 1; // basis cases; also avoids indexing past the array below
  int[] series = new int[z];
  series[0] = 1;
  series[1] = 1;
  for (int i = 2; i < z; i++)
    series[i] = series[i-1] + series[i-2];
  return series[z-1]; // end of the array contains the result
}

Normally I’d say iterative functions are usually faster, and in the case of the Fibonacci function that can be demonstrated, but with more complicated problems and algorithms it is much harder to prove. It’s simpler to construct the code and measure the performance! So to get an idea of the impact of function call overhead I decided to implement a problem in Java using both a recursive and an iterative algorithm.

The problem I have chosen is finding a path within a maze. I chose it because I already had some maze generation code, and the depth first search algorithm should have similar performance when implemented both ways. The maze generated is supposed to have loops in it (i.e. it is not perfect), so I had to code the two procedures carefully to ensure they would pick the same path. I included a breadth first search algorithm (implemented iteratively) as well for comparison; it is guaranteed to find the shortest path whereas both the depth first search implementations are not.

The testing procedure involved generating a maze and determining the rate at which each algorithm could compute paths in the maze. The start and end points were set randomly, but the random number generator seed was reset to be the same at the start of each algorithm’s run. This means they should all compute the same paths. I set the program to repeat this process 10 times to get some results for today, and I have a more powerful machine repeating the process 1000 times, but I have to wait several more days for those results.

In implementing depth first search with both techniques there was a minor difference in the algorithm that I couldn’t avoid at the time. The recursive algorithm is smarter than the iterative one in a subtle way: when moving back up the tree after hitting a dead end, the recursive algorithm intrinsically remembers which directions it has tried at nodes it returns to. The iterative algorithm isn’t as clever; it checks all the directions again (in the same order). This should make the iterative algorithm slower, as it does more processing when moving back up the maze after hitting a dead end. How much slower? I don’t know, but hopefully not much.

[Slideshow: preliminary rate results for each algorithm]

The results are interesting but unfortunately not definitive. You can see that the two depth first search algorithms are fairly close in terms of rates, although the recursive algorithm tends to be faster until the maze size gets larger. I have tried it a number of times to see how consistent the results are. I found that the breadth first search generally gives fairly consistent results, but the two depth first search algorithms can vary depending on the mazes generated. More often the recursive algorithm is faster but in some data sets the iterative one is faster.

I did some other small scale tests and it seems the processor has an effect on the results. I’m guessing the JIT compiler has different optimisations for different processors, which is changing the results. It is entirely possible that the JIT is optimising out some of the expense of the recursive calls, although I think this is unlikely.

This leads me to believe I need more data, so I’ll wait until I get results from the machine testing 1000 different mazes. I did these smaller tests on my Macbook and was unable to test with mazes larger than 512×512 because of a stack overflow. The machine I’m doing the larger tests on manages a maze size of 1024×1024 because it has a different Java run-time (running under Linux). I might have a go at making the iterative depth first search function more like the recursive function in terms of smarts, but we’ll see if the larger tests reveal anything more.

08 Oct 13

Multi-threading Comparison between Python and Java

I find myself unprepared this week for what I was going to write about, partly because of playing Dwarf Fortress after having had a long respite, and having been away for a good portion of the weekend. So I don’t have any screen captures done for the game I was planning on writing about.

Instead, I’ll be talking today about multi-threading on both Java and Python and why in general Java is better at it.

The story begins with a project I had at work, where I was asked to speed up a program written in python. The code was written in a single thread and I was told that it was a sequential type of algorithm that the coder thought couldn’t be multi-threaded. This sounded like a challenge to me so I got a copy of the code to look at and began optimising.

The code was designed to solve a problem within an area of mathematics, something related to topological groups. I didn’t really understand the complete idea behind what he was trying to do; after all, the person that wrote the code was an honours student and, being an engineer, I didn’t do anything harder than second year maths, and not in this field. The basic gist is that he was trying to work out the usual growth rate and geodesic growth rate of a particular word problem, and whether there is a pattern to the growth rates. If you understand this you probably know more than I do!

So I had some code and began investigating. I couldn’t see any obvious way to make a huge improvement to his algorithm, which isn’t surprising, but I did find some areas of implementation that I could fix to make the program faster.

The code used string variables, and it used them _a lot_. After profiling the code I found that it spent a good portion of its time in calls to the string length function (len). Changing the code to call it less often (by keeping track of the length in a variable) sped the program up by quite a bit. To be fair to Python, CPython actually stores a string’s length, so len itself is cheap; the saving comes from avoiding the overhead of all those interpreted function calls, rather than from avoiding the character-by-character traversal that something like C’s strlen would need.

I found a few other implementation details relating to strings, but quickly ran into a wall of not being able to make the program much faster. So my attention turned to changing the code to take advantage of multiple cores. Now having become more acquainted with the code I noted that there were situations where one function was called many times on many strings in a data set. This data set unfortunately grows rather quickly (exponential I think) so later iterations of the algorithm could take a long time single threaded. So I decided to go about processing more than one string at the same time in many threads.

This unfortunately led me to discover a problem with using Python for any kind of number crunching. After doing some research I found out about something called the Global Interpreter Lock. This lock basically means that within one instance of Python the interpreter will only be active on one thread. Because the code runs almost exclusively in the interpreter, it wasn’t really going to be possible to do all the work within one instance of Python. The threading model in Python works well for things like I/O, where a thread may not need the interpreter for a while and it can be used by another thread.

So being a Unix user I thought of using multiple instances of python and using OS pipes to communicate between all the different instances. This is called multi-processing rather than multi-threading, but it still allowed me to make use of many more processors and in theory it could have even been extended to multiple machines, but as it turned out this wasn’t necessary and wouldn’t have helped.

The problem I faced after much testing was that the main Python process was spending far more time on I/O than anything else. Part of the problem was the large volume of data that needed to traverse the pipes in order for each worker to run. I tried several different techniques but couldn’t get an improvement, which is when I decided a port to Java would be a good idea.

The port to Java was pretty straightforward for the code that makes up the main part of the algorithm, so I took the opportunity to do a simple test of which language has better single threaded performance. I ran the Python version of the code using pipes to both Java and Python processing instances in turn. I found with equivalent code that Java was about 50% faster just for the function I had put into the separate process. This looked promising, and was useful in debugging some of the Java code. Given that Python is interpreted, this shouldn’t be surprising.

I moved to java because of the threading model being much better. Firstly it actually supports having more than one VM thread running at a time, so it makes it possible to actually do some processing on multiple threads, and secondly it has a bunch of handy mechanisms to help allow you to get the maximum processing done. Synchronized blocks, locks, and the notify/wait system allows for thread safe programming whilst not having to poll for events, thus saving processing time. The main advantage over python in this case is not having to do the I/O to get data to processing threads, and being able to multi-thread more sections of the code.
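As a small illustration of the wait/notify mechanism mentioned above, here is a sketch of a work queue where processing threads sleep until data arrives instead of polling (the class and method names are illustrative, not from the project):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class WorkQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();

    public synchronized void put(T item) {
        items.add(item);
        notify();                    // wake one waiting worker thread
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty())
            wait();                  // release the lock and sleep; no polling
        return items.remove();
    }
}
```

A pool of worker threads can call take() in a loop: they consume no CPU while idle, and because everything stays in one VM there is no pipe I/O to move the data to them.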

Now, having completed the Java port, the performance stats speak for themselves: it is significantly faster than the Python implementation, taking about 4 days to do what would take weeks even with the multi-process Python implementation. I do have to note that I was using the 2.7 version of Python, so this does not reflect the performance or programming model of the newer Python 3 implementation.

20 Aug 13

Android and Singleton Classes

Recently in my Android programming escapades I found something interesting and bothersome in an API I had written. I had created a singleton class for the main renderer of my graphics API. With the first app I made I had no problem; in fact the singleton was quite helpful and easy to write, since each app using this API would only ever have one renderer.

The problem arose out of some strangeness with the Android VM. It turns out that static fields of a class can (though not necessarily will) cross from one app to another, effectively sharing one singleton across more than one activity. This might not sound too bad, but it can be a problem if you use an API in multiple apps and they happen to run on the same device. This can result in undesirable effects.

So I removed the singleton property from the renderer object, and thus had to pass it around in method parameters like a cheap whore. It’s not an ideal solution but it worked, and if you believe some people on the internet it’s probably a better design. Whilst there was minimal impact on the API, an app I was writing needed to be hacked a little to make it work properly. The funny thing is, if there were a separate JVM or classloader for every instance of every app, it would have been fine to have the renderer as a singleton and it would have worked.

So why are singletons a problem on Android? Android apps are not each given a completely fresh VM in the way you might expect: app processes are forked from a shared Zygote process, and a single process and classloader can host multiple activities, or even multiple apps that share a user ID. This means static variables (including singleton instances) can end up hanging around long after an activity ends, and if a class is shared with another app they can end up using the same singleton instance. The other half of the problem is that the Android OS will sometimes destroy an app’s process or use a different classloader for a new activity, thus making the static/singleton unreliable for any kind of storage or communication across activities or apps.
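For reference, the classic singleton shape under discussion looks like this; the static field is exactly what ties the instance’s lifetime to the classloader rather than to any activity (the class name echoes my renderer, but the code is just a sketch):

```java
public final class Renderer {
    private static Renderer instance;   // lives as long as the classloader does

    private Renderer() { }              // no outside construction allowed

    public static synchronized Renderer getInstance() {
        if (instance == null)
            instance = new Renderer();  // lazily created on first use
        return instance;
    }
}
```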

So it’s best to avoid the use of singletons on Android.

There seems to have been some debate about singletons and their use. Whilst it is true that many programmers overuse the pattern and it can be a problem, singletons do have a place in a developer’s toolbox. There are common interfaces, such as OpenGL, which cannot be accessed in a nice OO way.

OpenGL in particular is a good example: it is not thread safe and basically requires that all rendering be done in a single thread. It also is not object-oriented, and as such there is at best one instance of it per application. By its nature it is essentially an interface to the hardware acceleration of a machine, which is better suited to a singleton/procedural style interface.

The thing with OO design is it isn’t suited 100% to all programming problems. There are in fact many instances where using an OO design is a bad idea, such as embedded systems and applications that require high performance. The trouble is all the OO loveliness comes at a performance and memory cost that luckily isn’t a problem for most applications. This leads to situations where something OO needs to interface with something that isn’t, and this is where singletons are a useful construct.

Something my Dad always said is “a bad tradesman blames his tools”. It seems there are many people out there blaming the singleton as if it is at fault for being used incorrectly. Sure, there may be many cases of people using it poorly, but that just means they are poor designers/coders. The singleton pattern is just a tool; used appropriately it can make your coding life easier, used badly it can make your life harder. This is true for all the basic design patterns: it’s up to the designers and coders how good their designs and code are.

23 Jun 13

The Day the Earth got Mooned

The Day the Earth got Mooned is a game recently made by a youtuber that I frequently watch called JimPlaysGames, a British guy called Jim Baker. Jim has previously also made a game called Shitty Quest which was an interesting and funny adventure game with themes of what it is like to be a software developer fighting his own laziness. I played Shitty Quest and quite enjoyed the humour.

Attack!

Jim clearly won his battle with laziness, as his new game is quite impressive. You play as the sole alien invader coming to attack Earth for no reason. Earth’s military forces have gathered to defend themselves, and your goal is to get to the moon, destroy the Earth forces, and deploy a “SECRET WEAPON”. The story is masterfully narrated by Jim himself, who adds some subtle humour to how the story is presented.

People who helped…

The graphics and art style are very simple and elegant. The sprites are nicely detailed and easy to identify, which can be important as the different enemies have very different behaviours. There is a nice particle engine which adds lovely bits of smoke and explosions to the destruction of missiles and weapons. The sound is again well crafted; in particular the music has a nice pseudo-retro feel that complements the graphic style and type of game.

Menu Screen

There is a variety of different weapons which each have their own use, and may be upgraded during play. You can upgrade or build new weapons at any time throughout the game by going to the build screen by pressing space. You can also recharge your shields and build a spare ship which can make the game a bit easier than it would otherwise be, but given how hard it can get this is a good thing. I found the anti-matter spray could cause some slow down of the game when fully upgraded, but this usually only happens when the screen is really busy.

Weapons

The gameplay is like an older retro shooter but different in many important aspects. You can aim your weapons so you can make better use of them without putting your ship in danger. The AI for the enemies actually aims at you with intent to destroy as opposed to the blizzard of bullets used by many of the older games. The control scheme has you use both the mouse and keyboard, and I found it was accurate when shooting and allows you to dodge the enemy effectively.

The Day the Earth got Mooned is a solid shooter with plenty of challenge. The only criticism I really have of it (and Shitty Quest as well) is that it is too short: you can complete the game in as little as half an hour to an hour. I understand that being a sole developer, Jim may not have had the time to make the game longer, and may have spent time polishing the game as opposed to lengthening it. The game is multi-platform as it is written in Java (using LibGDX), so it will run on pretty much anything including Linux, Windows and Mac OS X. The game is free, so there’s little reason not to give it a go; you can find it on his website www.jimmakesgames.com.

21 Jan 13

Code Optimisation

Recently I started reading a book about code optimisation that focuses on the x86 architecture and assembly language. The book is the Graphics Programming Black Book, written by Michael Abrash, who is well known for his technical articles and his coding work on games such as Doom and Quake. Whilst the book has a graphics focus, the skills you can learn from it can be applied to any high level language or assembly language you happen to be using. I’ve dabbled with assembly code in the past and have found some parts of the book quite interesting. The book is available on-line in HTML and as a PDF download; you can find it pretty easily by googling Michael Abrash and the name of the book.

After my recent testing of data structure performance and beginning to read this book, I decided that I should look over all the Java code I had in order to try and optimise it. I have been for some time working on writing some android games, and in the process have built for myself a nice graphics library and the basics of some games. In testing and tuning my code I’ve found some good rules of thumb for making fast code in Java not just for the android platform but in general.

The first thing I’d recommend is not to use Iterator classes anywhere in your performance-critical code, but to use for loops instead. Iterators are objects and add overhead to your applications in multiple ways. Firstly, they occupy heap space and need to be garbage collected when finished with. This may not seem all that bad, but if you have several nested loops it could mean thousands of Iterator objects being created and destroyed on a regular basis. Secondly, an Iterator adds extra method call overhead every time you call hasNext() or next(), slowing down your loops. Again, for a small number of loops this seems insignificant, but added up over a large number of loops the time taken to make the calls can become quite significant. For these reasons, even when using the API data structures such as ArrayList, I would avoid using Iterators.
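To illustrate, here are the two loop styles side by side; both sum a list, but the indexed version allocates no Iterator object and makes no hasNext()/next() calls per element:

```java
import java.util.Iterator;
import java.util.List;

public class LoopStyles {
    // Iterator version: allocates an Iterator on the heap and pays two
    // method calls (hasNext and next) for every element.
    public static int sumWithIterator(List<Integer> xs) {
        int total = 0;
        for (Iterator<Integer> it = xs.iterator(); it.hasNext(); )
            total += it.next();
        return total;
    }

    // Indexed version: no Iterator allocation, one get() call per element,
    // and the size is hoisted out of the loop condition.
    public static int sumIndexed(List<Integer> xs) {
        int total = 0;
        for (int i = 0, n = xs.size(); i < n; i++)
            total += xs.get(i);
        return total;
    }
}
```

Note that Java’s enhanced for loop over a collection compiles to the Iterator form, so writing the indexed loop explicitly is what avoids the overhead.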

In very time critical code it is best to build your own data structures out of primitive arrays rather than relying on the API code. The JIT compiler seems quite adept at optimising simple array based code to run a lot faster than pretty much any API or object based solution. It’s also best to minimise the number of method calls in your important loops to keep overhead down.

An interesting problem I had recently was path finding within a maze in one of the games I am writing. I had a randomly generated maze that, whilst perfect to start with, has cycles (or loops) introduced to it to make the game more playable. When I first wrote the path finder I used a depth first approach, which works quite well on perfect mazes as there is only one path from each point to any other point. The problem I had with it was twofold: the algorithm was slow in the larger mazes, and with cycles introduced it often chose longer paths than it had to.

This perplexed me no end, so I went in search of the problem, optimising some of the code to get better speed and writing extra code to try to get a better path. After trying without any luck, I decided to test the breadth first search algorithm implemented iteratively. It turned out to be a whole bunch faster! I investigated why this might be, given that depth first search should be quicker to find a path (even if it’s not optimal). Upon comparing the code of the two algorithms I came to a simple conclusion: the depth first search code took a lot longer to execute a single iteration because it had many more method calls and comparisons than the simple breadth first search. So whilst on paper the depth first search looked good, its performance was poor in comparison, because the simpler breadth first search could get through more iterations in the same amount of time. The breadth first search is so much better that I was able to generate larger mazes and still find paths within a reasonable response time, even with worst case paths that searched most of the maze!

It’s interesting how sometimes the unexpected solution turns out to be the best. It shows you should always test your code, and spend a bit of time optimising your critical code. It will in the end make the experience of the end user all that much better, and enable you to do more with less.



