Archive for the 'Java' Category

27 Oct 14

A Guide to Memory/Object Pooling

I was recently listening to a vlog by a games developer who wanted a new programming language specifically for games, and one of his big gripes with C++ was the memory allocation process. He had a number of arguments, none of which resonated with me, but it did spark my interest in the memory allocation/de-allocation problem. I’ll briefly discuss the problem from the perspective of a manually managed language (in this case Pascal) and one that uses garbage collection (in this case Java).

Most languages dynamically allocate and free memory in very much the same way as even very old languages such as Pascal did. You have to explicitly ask the API to allocate memory for you, and then explicitly free it when you are finished with it. In an OO language like C++ this results in the need for destructors to free the memory the object uses when it is destroyed.

Either way memory is allocated and freed, and if this is done on a regular basis it can lead to something called heap fragmentation. Heap fragmentation is not quite the same as disk fragmentation, in that an allocated block will not be split up. Pascal, for instance, allocates memory one block after another, simply creating the new block at the end of the last one. When a block of memory is freed, however, it leaves a hole, and these holes are effectively heap fragmentation.

Pascal deals with these holes by maintaining a free list, which is a linked list containing all the free blocks available. When an application allocates more memory the list is traversed to find the first free block big enough for the new allocation, hopefully filling the hole but possibly leaving a small amount still available. Clearly the holes will tend to get smaller over time until they are unusable, and the more holes there are the longer any memory allocation will take.
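
To make the idea concrete, here is a toy first-fit free list sketched in Java (the names and structure are my own invention purely for illustration, not how Pascal's run-time is actually written):

// Purely illustrative: a toy first-fit free list over a pretend heap.
// A real allocator would also merge adjacent holes when blocks are freed.
import java.util.ArrayList;
import java.util.List;

class ToyFreeList {
  private static class Hole {
    int offset, size;
    Hole(int offset, int size) { this.offset = offset; this.size = size; }
  }

  private final List<Hole> holes = new ArrayList<>();

  ToyFreeList(int heapSize) {
    holes.add(new Hole(0, heapSize));        // one big hole to begin with
  }

  // First fit: walk the list until a hole is big enough, leaving any remainder behind.
  int allocate(int size) {
    for (int i = 0; i < holes.size(); i++) {
      Hole h = holes.get(i);
      if (h.size >= size) {
        int offset = h.offset;
        h.offset += size;
        h.size -= size;
        if (h.size == 0) holes.remove(i);    // hole exactly filled
        return offset;
      }
    }
    return -1;                               // no hole big enough
  }

  // Freeing a block just creates another hole.
  void free(int offset, int size) {
    holes.add(new Hole(offset, size));
  }
}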

Other allocation schemes are used in other languages, but they have similar issues; take a look here to see what I mean. Whatever scheme you use, the more often you allocate and free memory, the more it costs you.

Garbage collected languages such as Java have problems of their own. Whilst it is claimed that allocating memory in Java is practically free in terms of complexity, freeing memory is known to have performance issues as the garbage collector (GC for short) kicks in to clean up. The mechanics of allocation are largely hidden in a language like Java, so that shouldn’t concern you much, but the performance impact a garbage collector can have should.

So why is the GC a potential performance hazard? In order for the GC to know which memory to free it must work out which memory is still being used. One way of doing that is to maintain a reference count for each object in use. Obviously any object with a count of zero can be freed, but this leads to another problem: determining which groups of objects that only refer to each other can be freed.

Imagine a data structure like a tree in which every node holds a reference not only to its children but to its parent as well. If you were to use the tree while holding only a reference to the root node, and later released that reference, reference counting alone would never free the memory the tree uses, even though no external object can reach it, because every node is still referenced by its neighbours. A GC therefore also needs an algorithm that checks the reachability of memory.
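
A tiny sketch of that situation in Java (the classes are purely illustrative):

// Illustrative only: a two-node cycle. After build() returns, nothing outside
// can reach either node, yet each still has a reference count of one because
// they point at each other. A tracing collector frees them; pure reference
// counting never would.
class Node {
  Node parent;
  Node child;
}

class CycleDemo {
  static void build() {
    Node root = new Node();
    root.child = new Node();
    root.child.parent = root;   // the cycle that keeps both counts above zero
  }                             // 'root' goes out of scope here
}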

That’s a rather simplistic view of garbage collection, but it’s enough to see that running the GC can be expensive.

So how can you deal with these issues? It seems that no matter which kind of system or language you use, memory management has real implications for performance. One answer is to use a technique commonly called object or memory pooling.

A simple way to think of a memory pool is that it’s a lot like a cupboard. When you need something it’s quite quick to go to the cupboard and get something that’s available there, and when you’re done with it you return it. If the cupboard turned out to be empty, you’d have to go down to the shops to buy more of whatever you need. One advantage is that you can stock your cupboard in advance with resources you plan on using later.

Memory and object pools work in exactly the same way. You can preallocate memory, use it, return it to the pool, then use it again. This reduces the amount of stress on the memory allocation system, leading to fewer GCs and less heap fragmentation. You’d be amazed how much difference something like this can make, especially on a constrained platform like Android.
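
Here’s a minimal sketch of what a generic object pool might look like in Java (the class and method names are my own, not from any particular library):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// A minimal object pool: objects are preallocated up front, handed out by
// acquire() and put back by release(), so steady-state use allocates nothing.
class ObjectPool<T> {
  private final Deque<T> available = new ArrayDeque<>();
  private final Supplier<T> factory;

  ObjectPool(Supplier<T> factory, int initialSize) {
    this.factory = factory;
    for (int i = 0; i < initialSize; i++)
      available.push(factory.get());     // stock the cupboard in advance
  }

  T acquire() {
    // Empty cupboard: we have to "go to the shops" and allocate a new one.
    return available.isEmpty() ? factory.get() : available.pop();
  }

  void release(T item) {
    available.push(item);                // back on the shelf for next time
  }
}

// Typical use in a game loop (Particle is a placeholder for your own class):
//   ObjectPool<Particle> particles = new ObjectPool<>(Particle::new, 256);
//   Particle p = particles.acquire();
//   ... use p ...
//   particles.release(p);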

As an example, I implemented a path finding algorithm in Java for a maze game I am writing. I could have used a dynamically sized data stack that grows and shrinks as the algorithm runs. That would however trigger the GC much more frequently as it uses and releases memory on the data stack. It is much better to preallocate the maximum size of the stack once, and reuse the same memory every time the algorithm is used. This only allows one instance of the algorithm to run at a time, but it can be adjusted to allow multi-threading by creating a shared pool that each instance can use.
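
A rough sketch of that idea (the names are illustrative, not the actual game code):

// The search stack is allocated once at the worst-case size and simply reset
// for each search, so path finding itself never touches the allocator or GC.
class MazePathFinder {
  private final int[] stack;   // one cell index per entry, preallocated once
  private int top;

  MazePathFinder(int width, int height) {
    stack = new int[width * height];   // worst case: every cell is visited
  }

  boolean search(int startCell, int goalCell) {
    top = 0;                           // reuse the same array every time
    stack[top++] = startCell;
    while (top > 0) {
      int cell = stack[--top];
      if (cell == goalCell)
        return true;
      // push the unvisited neighbours of 'cell' here...
    }
    return false;
  }
}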

So what kinds of objects or memory should you use pooling for? Ideally anything that is short-lived should have some kind of pooling scheme in place as they put the most stress on the memory allocation system. This would be objects like particles and bullets in a game engine. Long-lived and infrequently used objects that pose less of a problem to the system don’t really need to be pooled as they cause much less stress.

Pooling can also be used for other resources that are expensive to create and destroy, such as database or network connections. This can help alleviate stress on other systems such as the network stack, database servers, and file systems. Pools are a really important tool in any programmer’s toolbox for improving performance.

16 Jun 14

Recursive versus Iterative Algorithm Results Follow-up

A few weeks ago I did some preliminary investigation into the performance difference between recursive and iterative procedures/algorithms. The problem I chose to solve was path finding in an imperfect maze, as the same search can be implemented both ways with the same theoretical complexity, and I already had some code lying around for both generating mazes and finding paths. I created iterative and recursive versions of the depth first search algorithm to compare with each other, and included breadth first search for comparison with a known “slower” algorithm.

The first results I got were interesting: both the dfs (short for depth first search) algorithms were fairly close in terms of rates until the size of the mazes got larger. The results however were inconsistent because of the small number of mazes used for testing and differences in the behavior of the two dfs algorithms. Clearly I needed to test across a larger set of mazes and fix the behavior difference before I’d get good results. It wasn’t a waste of time though, as it did illustrate that either style of algorithm can out-perform the other given even minor differences in behavior.

So I carefully adjusted the code so that both implementations of the dfs algorithm behaved in the same manner. Again I ran some initial tests, and this time the iterative version of dfs out-performed the others in general. Satisfied that the code was behaving correctly, I scaled the tests up to 1000 different mazes across many sizes so I could graph the results.

[Graph: search rate for each algorithm versus maze size.]

You can see the rate for all the algorithms drops off very quickly! It is quite difficult to see much detail in this graph, so I did some number crunching to determine the average time per search for each algorithm and size of maze.

[Graph: average milliseconds per search for each algorithm versus maze size.]

This graph makes the difference much clearer. Note the parabola shape: this is because the worst case for all the algorithms is to search the entire maze, and the number of cells is the square of the maze side shown on the X axis. The interesting part is how they compare with each other. The iterative dfs curve is very shallow compared to the others; it is in fact quite difficult to see much of an increase. It is increasing in a similar fashion, just nowhere near as dramatically as both recursive dfs and iterative bfs (breadth first search).

So why the huge difference between iterative dfs and the others? The big difference between the two dfs algorithms is the cost of recursion. That extra cost is paid for every square visited in the maze, and because the number of squares grows with the square of the maze dimensions, doubling the size of the maze roughly quadruples the total overhead.

To get a good idea of why the overhead is so comparably large in the recursive case we need to think about how it is implemented at the machine code level. Basically for every square visited in the maze there is a function call to the recursive function and a return at the end. A function call and return involves a number of relatively expensive operations. Firstly information from the calling code needs to be pushed onto the stack (usually important register information), then the processor can “call” the function code which basically pushes a return address onto the stack and jumps to the function code. The first thing the newly called function must do is allocate space on the stack for its local variables, then it can do whatever processing it requires. Once finished the function will place the return value in either the stack or a register, remove local variables from the stack and then return control to the code that called it. This involves restoring the stack to remove the local variable information and then using the return address stored there to return to the calling code. The calling code can then retrieve anything it pushed on the stack before the call and continue operation. Quite the involved process!

Let’s contrast that with what an iterative algorithm needs to do differently. Firstly, any iterative algorithm needs to set up storage space for the body of the algorithm: in this case a stack (in the data segment or on the heap) for storing the current search, and some variables for tracking the current location. Then the body can begin. Usually this will be a loop of some kind, either a for loop or a while loop. These loops involve checking a condition and jumping at either the beginning or the end, but they are only short-range jumps within the function. The body of the code simply does the search, updating the stack as it goes.

So for every square of the maze visited the recursive version has to deal with function call overhead (stack operations and far calls) where the iterative version has to only check a condition and make a short jump. There is more to it with the effects of the L1 and L2 caches, but I’ll leave that out because it’s x86 specific.

So the cost of recursion is high, in this case almost as bad as iterative breadth first search and it doesn’t guarantee the shortest path. Also because of limited program stack space, the recursive function cannot search as large a maze lest there be a stack overflow. So in general I’d recommend people stick with writing iterative code except where the complexity of a recursive function is significantly better. Your code should be faster and put less strain on the limited stack space.

21 May 14

Recursive and Iterative procedures

Today I’m looking at two different types of procedures, recursive and iterative. Many problems can be implemented using either technique, and each has its benefits and drawbacks.

For those not acquainted with the terminology, an iterative procedure accomplishes computation with a block of code that repeats many times until a terminating condition is met. The repeating block of code is commonly called a loop and can take many different forms. One of the main drawbacks of an iterative design is that it can become quite complicated, making it harder to understand its function. However this is a trade-off to get higher performance and to keep the size of the program stack small.

Recursive procedures however are quite different, as they infrequently make use of loops. Instead they achieve repetition by invoking themselves. The Fibonacci function is a classic case: Fib(x) = Fib(x-1) + Fib(x-2), where the first two entries are hard coded to 1. It’s important that recursive functions have a base case or a maximum depth to the number of calls, otherwise an error called a stack overflow can occur. A stack overflow is simply when the program stack overflows because it is asked to hold too much data. The program stack has to store the local variables and return addresses for every call that has not yet returned, and recursive functions typically use up a lot of stack space. Another disadvantage is that function calls in many modern OO languages are relatively expensive in terms of time required, as some languages require a pointer or two to be dereferenced to access the function.
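
Written directly from that definition, a recursive version might look like this (a naive sketch that recomputes the same values over and over):

public int fibRecursive(int z)
{
  if (z <= 2)
    return 1;    // base case: the first two entries are hard coded to 1
  return fibRecursive(z - 1) + fibRecursive(z - 2);
}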

Fortunately all recursive procedures can be converted into iterative ones, and the Fibonacci function is again a classic and trivial case. The iterative code might look something like this…

public int fib(int z)
{
  if (z <= 2)
    return 1;                            // base cases, and avoids overrunning the array below
  int[] series = new int[z];
  series[0] = 1;
  series[1] = 1;
  for (int i = 2; i < z; i++)
    series[i] = series[i-1] + series[i-2];
  return series[z-1];                    // the end of the array contains the result
}

Normally I’d say iterative functions are usually faster, and in the case of the Fibonacci function that can be demonstrated, but with more complicated problems and algorithms it is much harder to prove. It’s simpler to construct the code and measure the performance! So to get an idea of the impact of function call overhead I decided to implement a problem in Java using both a recursive and an iterative algorithm.

The problem I have chosen is finding a path within a maze. I chose it because I already had some maze generation code, and the depth first search algorithm should have similar performance when implemented both ways. The maze generated is supposed to have loops in it (i.e. it is not perfect), so I had to code the two procedures carefully to ensure they would pick the same path. I included a breadth first search algorithm (implemented iteratively) as well for comparison; it is guaranteed to find the shortest path, whereas both the depth first search implementations are not.

The testing procedure involved generating a maze and determining the rate at which each algorithm could compute paths in it. The start and end points were set randomly, but the random number generator seed was reset to be the same at the start of each algorithm’s run. This means that they should have to compute the same paths. I set the program to repeat this process 10 times to get some results for today, but I have a more powerful machine repeating the process 1000 times; I just have to wait several more days for those results.

In implementing depth first search in both techniques there was a minor difference in the algorithm that I couldn’t avoid at the time. The recursive algorithm is smarter than the iterative one in a subtle way. When moving back up the tree after hitting a dead end, the recursive algorithm intrinsically remembers which directions it has already tried at the nodes it returns to. The iterative algorithm isn’t as clever: it checks all the directions again (in the same order). This should make the iterative algorithm slower as it does more processing when moving back up the maze after hitting a dead end. How much slower? I don’t know, but hopefully not much.

[Slideshow: graphs of the preliminary results.]

The results are interesting but unfortunately not definitive. You can see that the two depth first search algorithms are fairly close in terms of rates, although the recursive algorithm tends to be faster until the maze size gets larger. I have tried it a number of times to see how consistent the results are. I found that the breadth first search generally gives fairly consistent results, but the two depth first search algorithms can vary depending on the mazes generated. More often the recursive algorithm is faster but in some data sets the iterative one is faster.

I did some other small scale tests and it seems the processor has an effect on the results. I’m guessing the JIT compiler has different optimisations for different processors, which is changing the results. It is entirely possible that the JIT is optimising away the expense of the recursive calls, although I think this is unlikely.

This leads me to believe I need more data, so I’ll wait until I get results from the machine testing 1000 different mazes. I did these smaller tests on my MacBook and was unable to test with mazes larger than 512×512 because of a stack overflow. The machine I’m doing the larger tests on is managing to run at a maze size of 1024×1024 because it has a different Java run-time (running under Linux). I might have a go at making the iterative depth first search function more like the recursive function in terms of smarts, but we’ll see if the larger tests reveal anything more.

08 Oct 13

Multi-threading Comparison between Python and Java

I find myself unprepared this week for what I was going to write about, partly because of playing Dwarf Fortress after having had a long respite, and having been away for a good portion of the weekend. So I don’t have any screen captures done for the game I was planning on writing about.

Instead, I’ll be talking today about multi-threading on both Java and Python and why in general Java is better at it.

The story begins with a project I had at work, where I was asked to speed up a program written in python. The code was written in a single thread and I was told that it was a sequential type of algorithm that the coder thought couldn’t be multi-threaded. This sounded like a challenge to me so I got a copy of the code to look at and began optimising.

The code was designed to solve a problem within an area of mathematics, something related to topological groups. I didn’t really understand the complete idea behind what he was trying to do; after all, the person that wrote the code was an honours student, and being an engineer I didn’t do anything harder than second year maths, and not in this field. The basic gist is that he was trying to work out the rate of usual growth and geodesic growth of a particular word problem and whether there is a pattern to the growth rates. If you understand this you probably know more than I do!

So I had some code and began investigating. I couldn’t see any obvious way to make a huge improvement to his algorithm, which isn’t surprising, but I did find some areas of implementation that I could fix to make the program faster.

The code used string variables, and it used them a lot. After profiling the code I found that it spent a good portion of its time in calls to the string length function (len). By keeping track of the length in a variable instead of calling len over and over I managed to speed up the code by quite a bit. The length of a Python string is actually stored rather than computed by traversing the string (unlike C, where strlen really does scan the whole string), but making the call millions of times in the hot loops still carries real function call overhead.

I found a few other implementation details relating to strings, but quickly ran into a wall and couldn’t make the program much faster. So my attention turned to changing the code to take advantage of multiple cores. Having become more acquainted with the code, I noted that there were situations where one function was called many times on many strings in a data set. This data set unfortunately grows rather quickly (exponentially, I think), so later iterations of the algorithm could take a long time single threaded. So I decided to go about processing more than one string at the same time in many threads.

This unfortunately led me to find a problem with using Python for any kind of number crunching. After doing some research I found out about something called the Global Interpreter Lock. This lock basically means that within one instance of Python the interpreter will only be active on one thread at a time. Because the code runs almost exclusively in the interpreter, it wasn’t really going to be possible to do all the work within one instance of Python. The threading model in Python works well for things like I/O, where a thread may not need the interpreter for a while and it can be used by another thread.

So being a Unix user I thought of using multiple instances of Python and using OS pipes to communicate between the different instances. This is called multi-processing rather than multi-threading, but it still allowed me to make use of many more processors. In theory it could even have been extended to multiple machines, but as it turned out this wasn’t necessary and wouldn’t have helped.

The problem I faced after much testing was that the main Python process was spending far more time on I/O than on pretty much anything else. Part of the problem was the large volume of data that needed to traverse the pipes in order for each thread to run. I tried several different techniques but couldn’t get an improvement, and this is when I decided a port to Java would be a good idea.

The port to Java was pretty straightforward for the code that makes up the main part of the algorithm, so I took the opportunity to do a simple test of which one has better single threaded performance. I ran the Python version of the code using pipes to both Java and Python processing instances in turn. I found with equivalent code that Java was about 50% faster just for the function I had put into the separate process. This looked promising and was useful in debugging some of the Java code. Given that Python is interpreted, this shouldn’t be surprising.

I moved to Java because its threading model is much better. Firstly, it actually supports having more than one VM thread running at a time, so it makes it possible to do processing on multiple threads; and secondly, it has a bunch of handy mechanisms to help you get the maximum processing done. Synchronized blocks, locks, and the notify/wait system allow for thread safe programming whilst not having to poll for events, thus saving processing time. The main advantage over Python in this case is not having to do the I/O to get data to processing threads, and being able to multi-thread more sections of the code.
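
As a minimal sketch of the notify/wait idea (the class and method names are hypothetical, not taken from the actual project), a worker thread can sleep until work arrives rather than polling:

import java.util.ArrayDeque;
import java.util.Deque;

// A tiny hand-off queue: submit() is called by the producer, take() by worker
// threads. wait() releases the lock and sleeps, so nobody burns CPU polling.
class WorkQueue {
  private final Deque<String> pending = new ArrayDeque<>();

  public synchronized void submit(String word) {
    pending.add(word);
    notify();                     // wake one waiting worker
  }

  public synchronized String take() throws InterruptedException {
    while (pending.isEmpty())
      wait();                     // re-checked in a loop to guard against spurious wake-ups
    return pending.remove();
  }
}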

Now that the Java port is complete the performance stats speak for themselves: it is significantly faster than the Python implementation, taking about 4 days to do what would take weeks even with the multi-process Python implementation. I do have to note that I was using the 2.7 version of Python, so this does not reflect the performance or programming model of the newer Python 3 implementation.

21 Jan 13

Code Optimisation

Recently I started reading a book about code optimisation that focuses on the x86 architecture and assembly language. The book is called the Graphics Programming Black Book and is written by Michael Abrash, who is well known for his technical articles and for his coding on games such as Doom and Quake. Whilst the book has a graphics focus, the skills you can learn from it can be applied to any high level language or assembly language you may happen to be using. I’ve dabbled with assembly code in the past and have found some parts of the book quite interesting. You can find the book on-line to read via HTML or PDF download; you can find it pretty easily if you just google Michael Abrash and the name of the book.

After my recent testing of data structure performance and beginning to read this book, I decided that I should look over all the Java code I had in order to try and optimise it. I have for some time been working on writing some Android games, and in the process have built myself a nice graphics library and the basics of some games. In testing and tuning my code I’ve found some good rules of thumb for making fast code in Java, not just for the Android platform but in general.

The first thing I’d recommend is not to use Iterator classes anywhere in your code, but to use for loops instead. Iterators are objects and add overhead to your application in multiple ways. Firstly, they occupy heap space and need to be garbage collected when finished with. This may not seem all that bad, but if you have several nested loops it could mean thousands of Iterator objects being created and destroyed on a regular basis in your code. Secondly, they add extra method call overhead every time you call hasNext() or next(), slowing down your loops. Again, for small numbers of loops it seems insignificant, but added up over a large number of loops the time taken to make the calls can become quite significant. For these reasons, even when using the API data structures such as ArrayList, I would avoid using Iterators.
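
To illustrate the difference (Particle here is just a placeholder type):

import java.util.ArrayList;

class LoopStyles {
  static class Particle { void update() { /* ... */ } }

  // Enhanced for loop: creates an Iterator object behind the scenes and calls
  // hasNext()/next() on every pass.
  static void updateWithIterator(ArrayList<Particle> particles) {
    for (Particle p : particles)
      p.update();
  }

  // Indexed loop: no Iterator allocation, just a bounds check and a get() call.
  static void updateWithIndex(ArrayList<Particle> particles) {
    int n = particles.size();          // hoist size() out of the loop as well
    for (int i = 0; i < n; i++)
      particles.get(i).update();
  }
}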

In very time critical code, it is best to build your own data structures out of primitive arrays rather than to rely on the API classes. The JIT compiler seems quite adept at optimising simple array based code to run a lot faster than pretty much any API or object based solution. It’s also best to try to minimise the number of method calls in your important loops to keep overhead down.

An interesting problem I recently had was to do with path finding within a maze in one of the games I am writing. I had a randomly generated maze that, whilst being perfect to start with, has cycles (or loops) introduced to it to make the game more playable. When I first wrote the path finder I used a depth first approach, which works quite well on perfect mazes as there is only one path from each point to any other point. The problem I had with it was twofold: the algorithm was slow in the larger mazes, and with cycles being introduced it often chose longer paths than it had to. This perplexed me no end, so I went in search of what the problem might be, optimising some of the code to get better speed, and writing some extra code to try and get a better path. After having tried without any luck I decided to try using the breadth first search algorithm implemented iteratively. It turned out to be a whole bunch faster! I investigated why this might happen, given that depth first search should be faster to find a path (even if it’s not optimal). Upon comparing the code of the two algorithms I came to a simple conclusion: the depth first search code took a lot longer to execute a single iteration because it had many more method calls and comparisons than the simple breadth first search does. So whilst on paper the depth first search looked good, its performance was poor in comparison, because the simpler breadth first search could get through more iterations in the same amount of time. The breadth first search is so much better I was able to generate larger mazes and still find paths within a reasonable response time, even with worst case paths that searched most of the maze!

It’s interesting how sometimes the unexpected solution turns out to be the best. It shows you should always test your code, and spend a bit of time optimising your critical code. It will in the end make the experience of the end user all that much better, and enable you to do more with less.

06 Jan 13

Memory Allocation – Heap and Stack

A question I recently saw online was about how memory is allocated and used within the Java programming language. There was a little confusion about what the heap and stack are. Both the stack and the heap are common to many languages such as Pascal, C, and many more.

Stack space is a special construct in hardware used for storing temporary variables, return values and return addresses for functions. Most languages use the stack to store any local variables that will fall out of scope when a procedure (or method in OO language) finishes or returns. Java has the same style of stack construct for the virtual machine used for the same purpose. The Java language will however only store primitives (char, int, byte etc) and references to objects on the stack.

Heap space is a software construct and will be different depending on the language. It’s used for dynamically allocating memory, usually for longer term storage of larger chunks of data. In languages like Pascal and C, data on the heap is usually handled and referred to via a primitive called a pointer. A pointer basically holds the memory location of whatever it is pointing to, and pointers can be difficult to program with.

Java doesn’t have a pointer primitive, so it uses a different mechanism. Instead of pointers, Java has references, which are subtly different. A reference differs in that the memory location information is not available to the programmer, whereas it is with a pointer. This means you can’t do pointer arithmetic or try to access memory directly. The intent of this is to protect programmers from themselves and to make Java more secure. Java references are used for storing objects and arrays on the heap. The garbage collector keeps track of which objects on the heap are still reachable through references, and disposes of those that are no longer in use.

Older languages that target DOS and other older systems allocate global variables to what is called the data segment. DOS and the processors of that era used segmented memory addressing, which comes in part from the 8086 machine’s assembly language. There are three types of segment: code, stack, and data. Code segments are for storing the machine code of the program and are pretty straightforward; there can be many of them, but on DOS and 8086 machines the maximum size of each is 64Kb. The stack segment is for the processor stack; there can be only one, again with a maximum size of 64Kb. The data segment is, as I said, for statically defined data, and is also 64Kb in size. Borland Turbo Pascal only allowed one data segment and I suspect most other languages did the same. The heap sits outside the segments and uses the remaining memory; it was typically managed by a system library that was part of the language used.

Java is fairly similar to native languages but does not use a segmented model, as most modern languages don’t. This is partly because hardware and software have changed in such a way that segments are no longer needed.

19 Dec 12

Java Data Structure Performance

Recently I was browsing Stack Overflow looking at questions asked about Java. One theme that kept on recurring is the performance of various data structures within the Java API; in particular the Vector and ArrayList classes were frequently compared. So I decided that I would have a go at working out which data structures perform well.

The first thing I needed to decide upon was what kind of test was fair, and under what circumstances I would run the tests. I made a small test framework in Java so I could apply exactly the same algorithm to each data structure. I built the program to perform tests one after another with increasing data sizes to see how the data size affected the results. I wasn’t worried about factors such as the garbage collector and other tasks running on the OS, as these would affect all the tests and I could take some precautions to minimise their effect. I’ll make my code available in the code section when I’ve finished tidying it up a bit.

I wrote code to test several different major aspects of a data structure. I recorded how long (in milliseconds) it took to create the structure, insert into it, remove an item by index, and remove an item by its value. I was just using System.currentTimeMillis() to work out how much time was used, so small-scale accuracy can’t be assured; if you want to be more accurate you could use a profiler on the same code. I chose not to require the data be ordered within the collections, as the tests are not dependent upon ordering within the data structure.
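
The timing itself was nothing fancy; roughly this pattern (a sketch, not the actual test framework):

import java.util.List;

class InsertTimer {
  // Times how long it takes to add itemCount integers to the given list.
  // System.currentTimeMillis() only gives millisecond resolution, so small
  // runs need to be repeated to mean anything.
  static long timeInserts(List<Integer> list, int itemCount) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < itemCount; i++)
      list.add(i);                         // the operation being measured
    return System.currentTimeMillis() - start;
  }
}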

I tested 4 different data structures and had a dummy test with no data store to check that the test algorithm itself wasn’t causing extra overhead. I tested an implementation based on a standard primitive array and a LinkedList, as well as the Vector and ArrayList classes. I’ve graphed some of the results at the bottom of this post. The first result that I found very interesting is that with 1000 or fewer records there are no significant differences between the various data structure types. So if you aren’t storing large amounts of data there is little to gain by picking one data type over another.

The LinkedList data type interestingly had some of the worst performance, getting comparatively worse as the data size increased, particularly for the remove by index and remove by value operations. In theory it should have similar performance to the others, so this is a bit confusing; perhaps it is all the extra reference handling that makes it significantly slower. It’s no surprise that it isn’t faster than the other data types, as the operations it does well are adding and removing at either end of the list. This means that the LinkedList data type is best suited to implementing structures such as a stack or queue, as it allows for dynamically sized structures that are relatively fast.

Vector and ArrayList showed some slight difference in performance but are for the most part pretty similar. This is in part because they use exactly the same underlying data storage mechanism; the only difference is that Vector is synchronized and therefore thread safe. The results from testing seem to indicate that the cost of acquiring the synchronization lock is a smaller factor the larger the data set you have. Even on smaller data sizes the difference wasn’t incredibly large. On a faster machine with a different JVM the difference between the two was even smaller, looking almost identical. So differences between these classes will largely depend on what kind of system you are running on, and will in most cases not be huge. Again, if the data size is small it doesn’t really matter which one you use.

Interestingly, the data type that did the best was the standard primitive array implementation. It was significantly faster than all the other data structures! The simple answer is that primitive arrays in Java require fewer operations to manage when converted into machine code by the JVM compared to other methods of storage. In this case it is also faster for removal by index because the array did not worry about keeping items in order when removing an item: I simply moved the item at the end to occupy the space to be deleted and then decremented the size. This is a common practice for lists of items in games where the order in the array is not important.

Classes such as Vector, ArrayList and LinkedList preserve the order of items within them whether you require it or not. This carries an intrinsic cost when removing items, and at worst can be O(n) to remove an item. When order of the items isn’t a concern any item can be removed in constant time as I have done in this implementation of the primitive array.
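
The constant-time removal trick described above looks roughly like this (the field names are illustrative, not the original test code):

// When order doesn't matter, removal is O(1): overwrite the removed slot with
// the last element and shrink the logical size, instead of shifting everything down.
class IntBag {
  private int[] items = new int[16];
  private int size;

  void add(int value) {
    if (size == items.length) {                        // grow when full
      int[] bigger = new int[items.length * 2];
      System.arraycopy(items, 0, bigger, 0, size);
      items = bigger;
    }
    items[size++] = value;
  }

  void removeAt(int index) {
    items[index] = items[size - 1];                    // move the last item into the gap
    size--;                                            // order is not preserved
  }
}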

So which data storage technique should you favour? There isn’t a simple answer to this question as it will depend on each particular problem. However it does seem that you will get the best results most often when you create a data structure tailored to your problem. In this particular instance a simple array was best. You can construct pretty much any type of data structure with a primitive array, so many different solutions can be implemented with better performance than the classes in the API. In my experience the Vector class is fine for most things and easy to use, but when I’ve required speed I often use a primitive array, or another type of data structure altogether such as a HashMap which is useful for creating a cache.

So why do the data store classes in the API exist in the first place? Simply put: convenience and generality. When performance or data size are not going to cause problems, it’s easier to use a pre-constructed class to store data in. The Java storage classes are also implemented in a generic way so that they can be used for any purpose; this has a performance cost, but you benefit by having data structures you can use for many different data types and situations. If you want some of that convenience and better performance, you could use the Trove library for data storage. Trove is faster than the Java API but does not really have the generality of the classes in the API. Check it out here.

17 Jun 12

The Java Programming language

The first language I learned at university was Java. Java is a high level object-oriented language, and it is used in web pages in the form of applets. Some features of Java were quite unusual at the time it was created. The compiler does not generate native code, but rather generates something called byte code. Byte code is basically a special form of assembly language for a machine that does not exist, in essence a virtual machine. Of course this special code will not run directly on normal hardware, so special interpreter software is used to run it. Interpreters exist for many platforms: Windows, Mac OS, and all the Unix platforms. This in and of itself isn’t that remarkable, as many BASIC interpreters also generate a binary format to represent instructions in a similar way. Java code is typically very portable and is in use across many platforms today; many programs can run across multiple platforms without modification.

Java was the major language I learned for object-oriented programming, and I found that it was pretty easy to get the hang of. All the major important aspects of object-oriented languages are represented. Inheritance and abstraction are implemented through extending classes and implementing interfaces. Something that helped me learn the language more easily was its simpler form of memory management. You can hold something called a reference to an object, which is simply Java speak for having a pointer to an object. A reference is created by the constructor implicitly and can be used freely. When the reference is no longer needed you can let it go, and the Java garbage collector will free the object at an appropriate time. You can’t perform pointer arithmetic on references (or change them manually), which means that memory references can’t be tampered with, which protects Java programs from themselves.

The API is very well documented, which has been very handy to me over the years. You can generate your own documentation using the javadoc tool, which I would definitely recommend you do. The API is full of very useful classes; in particular the util, nio and swing packages contain classes you will use on a regular basis. I’ve found many of the data storage classes in util are good, especially the Vector and HashMap classes.

Over the course of my university degree I have written many Java programs for assignments. The language made implementing and studying computer science algorithms, data structures and software designs much simpler. It also helped my coding style and documentation improve and become more readable. I still use Java in much of my programming today as I can write simpler code without worrying about some of the specifics of the platform I’m writing for. I still like to write code in native programming languages partly for the challenge and partly for speed.

13 May 12

Profiling an Android Application

As you may or may not know, I have been writing an app for the Android platform using OpenGL. Anyone writing for OpenGL on the Android platform should know that it is important to make sure your code is memory efficient, and most importantly fast. In order to achieve this it is important that you test and analyse your code on a regular basis. There are some useful tools that come with the Android SDK, and some that can be downloaded with Eclipse, that you should use.

For profiling your application you should use Traceview. In order to make effective use of this tool you need to make a trace file covering the time frame that you are interested in. You can do this with Eclipse, but the length of the trace is limited, and it can be inaccurate. I suggest that you make use of the android.os.Debug class to make the trace file, because you can control the size of the trace buffer. I have used Traceview to analyse the lengths of time it takes to execute elements of my rendering code, and to monitor the changes in timings when I have changed my code.
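
Driving a trace from code is only a couple of calls; something along these lines (the file name and buffer size are just example values):

import android.os.Debug;

public class FrameTracer {
  private static final int TRACE_BUFFER_BYTES = 32 * 1024 * 1024;   // example size

  // Wrap the code you want to measure; the resulting .trace file is then
  // opened in Traceview.
  void traceOneFrame(Runnable frame) {
    Debug.startMethodTracing("frame", TRACE_BUFFER_BYTES);
    frame.run();
    Debug.stopMethodTracing();
  }
}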

For analysing how much memory you are using there are several things you should do. Firstly, take a look at the output of logcat and see how often the garbage collector runs and how much memory it frees each time it does. If your app creates and releases objects on a regular basis, the garbage collector will spend a lot of time cleaning up memory, which will slow your app down. The garbage collector will also tell you how much heap memory you are using. You need to be aware that there is limited heap space available to apps on Android phones; some have a limit of 24Mb of heap, but it differs from device to device. So you need to be careful of the amount of RAM used by your app, even if the target device has a lot of RAM. A good way to find out how your app uses memory is to use the Eclipse memory analysis tool. It is fairly simple to get working in Eclipse, and like method tracing it can also be activated by the Debug class. I found that I had to limit the results of all my searches to my project classes to get useful results, as there are a number of other classes belonging to the Android API that take up significant memory space in a small app. Either way the analysis tool is very good, but can be a little complex to use if you don’t know the terminology.

The tools have been useful to me in finding some areas of my code which could be improved, and quantified the improvement. I found a section where I had created a String object implicitly in a method call. When the method finished the String was no longer needed and the garbage collector would kick in once every two seconds. I changed my code to use a different data type and not create objects implicitly and got the relevant code to run twice as fast in addition to not requiring the garbage collector as often. I have also been able to determine the amount of memory my objects require and tune them appropriately.

The analysis tools are only half the solution. Your application needs to have a good solid design, to make it easier to diagnose and correct problems. Lastly it is important to know how to interpret the information from the tools. Many parts of the analysis tools cause slow down or extra memory use, so you need to be aware that the statistics presented are relative to each other and don’t always represent the result on real hardware. If used effectively profiling and memory analysis will allow you to significantly improve the performance of your applications.

08 Apr 12

OpenGL on Android tips

I’ve been working on a 2D rendering engine for OpenGL on the Android platform using the standard Java API. In my ramblings around the internet there seems to be a misunderstanding about how some portions of the API were intended to be used. In particular, vertex, texture and colour arrays and the draw arrays function seem to be commonly misused, judging from forum posts around the net. Here I will list a few short tips for making your OpenGL apps on the Android platform run a bit smoother.

The first mistake that seems to happen a lot is the constant creation and destruction of objects in the rendering thread. Android devices and other mobile or embedded systems are very limited in nature: they do not have much RAM and the processor is not as powerful as your standard PC or Mac. Because of this it is important to keep the number of objects pretty constant within your games, otherwise the garbage collector will slow things down constantly recycling objects that you create and destroy on a regular basis.

The second mistake I’ve seen has to do with how the buffers are misused. It seems many people are regularly changing the contents of the buffers to move or change the location, size or even rotation of things on-screen. My mind boggles as to why you would do this rather than using the standard GL model view matrix operations. It’s my belief that you should create the buffers that hold the actual vertex information once, then in your renderer use glTranslate, glRotate and glScale to achieve any transforms required. The reasons for this are two-fold. Firstly, if you modify your buffers all the time you will need to modify all the vertices to achieve simple things like a translation, and some people have reported that functions on the buffer objects can be slow on some devices, so it is best to use them sparingly. Secondly, the GL functions and matrix operations should be optimised in native code for each device, and so they should be faster. This should also apply to using the OpenGL library on other platforms with Java, mostly because of its interpreted nature. This would be different in some cases if you were writing code in a native language, as I’m told state changes to the GL matrices can be slower than changing the vertices, but this is only true for a small number of vertices where the computation time to change them is trivial.
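
Sketched out, that approach looks something like this (everything apart from the GL10 calls is illustrative, and it assumes the vertex array client state is already enabled):

import java.nio.FloatBuffer;
import javax.microedition.khronos.opengles.GL10;

// The vertex buffer for a sprite is filled once when it is loaded; per-frame
// position, rotation and scale are applied with the model view matrix instead
// of rewriting the buffer contents.
class Sprite {
  private final FloatBuffer vertices;      // built once, never modified per frame

  Sprite(FloatBuffer vertices) {
    this.vertices = vertices;
  }

  void draw(GL10 gl, float x, float y, float angleDegrees, float scale) {
    gl.glPushMatrix();
    gl.glTranslatef(x, y, 0f);                        // move
    gl.glRotatef(angleDegrees, 0f, 0f, 1f);           // rotate about the z axis
    gl.glScalef(scale, scale, 1f);                    // resize
    gl.glVertexPointer(2, GL10.GL_FLOAT, 0, vertices);
    gl.glDrawArrays(GL10.GL_TRIANGLE_STRIP, 0, 4);    // a simple quad
    gl.glPopMatrix();
  }
}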

The final tip I have for you is to make use of multiple threads in your app. You have a very limited amount of processing power, so taking advantage of the fact that many of the more modern devices are dual core is imperative. The simplest way to do this is to create a separate thread for the main logic of your app, leaving the main rendering thread to perform the task you want it to: updating the display. Writing multithreaded applications is more complicated in some instances, but Java provides some nice mechanisms that you should take advantage of. Declaring some methods on a shared object as synchronized is a good way to achieve this. Java only allows one thread at a time to execute the synchronized methods of an object, and the lock is held at the object level, not the method level. This means you can effectively make an interface object between the two threads that is what we call thread safe, so you are less likely to run into race condition bugs, and hopefully you’ll also avoid deadlock. Please note you should try to keep synchronized methods short to minimise the amount of time another thread is held up; this will help improve the responsiveness of your app. For more information on Java synchronization click here.
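
A minimal sketch of such an interface object (the names are illustrative):

// The logic thread writes the latest state, the render thread reads it; the
// synchronized methods make the hand-off thread safe, and both are kept short
// so neither thread is held up for long.
class SharedState {
  private float playerX, playerY;

  public synchronized void update(float x, float y) {   // called by the logic thread
    playerX = x;
    playerY = y;
  }

  public synchronized void snapshot(float[] out) {      // called by the render thread
    out[0] = playerX;                                    // the caller supplies the array,
    out[1] = playerY;                                    // so nothing is allocated here
  }
}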

I hope this information helps some of you out there trying to build your own apps for Android. Many of these tips are also pretty good advice for building Applications using Java and OpenGL on any platform. If you still run into performance problems you should run a debugging tool called a profiler to find out which code is causing the most slow down. It’s quite important to remember the limited nature of resources on mobile devices, and that disposing of objects can come at quite a price in processing power when the garbage collector kicks in.



