Posts Tagged ‘profiler’


Java Data Structure Performance

Recently I was browsing Stack Overflow looking at questions asked about Java. One theme that kept recurring was the performance of various data structures in the Java API; in particular, the Vector and ArrayList classes were frequently compared. So I decided I would have a go at working out which data structures perform well.

The first thing I needed to decide was what kind of test was fair, and under what circumstances I would run the tests. I built a small test framework in Java so I could apply exactly the same algorithm to each data structure, and had the program run tests one after another with increasing data sizes to see how data size affected the results. I wasn’t worried about factors such as the garbage collector and other tasks running on the OS, as these would affect all the tests equally, and I could take some precautions to minimise their effect. I’ll make my code available in the code section when I’ve finished tidying it up a bit.

I wrote code to test several major aspects of a data structure: how long (in milliseconds) it took to create the structure, insert items, remove an item by index, and remove an item by value. I used System.currentTimeMillis() to measure elapsed time, so small-scale accuracy can’t be assured; if you want more accurate figures, you could run a profiler on the same code. I chose not to require that the data be ordered within the collections, as the tests do not depend on ordering within the data structure.
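As a rough sketch of the kind of timing harness I mean (the class and method names here are my own, not the actual test code):

```java
import java.util.ArrayList;
import java.util.List;

public class TimingHarness {
    // Time how long it takes to insert n items into a list, in milliseconds.
    // currentTimeMillis() is coarse, so small results are noisy.
    static long timeInserts(List<Integer> list, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Run the same test at increasing data sizes.
        for (int size : new int[] {1_000, 10_000, 100_000}) {
            long ms = timeInserts(new ArrayList<>(), size);
            System.out.println(size + " inserts: " + ms + " ms");
        }
    }
}
```

The same `timeInserts` method can be handed a Vector, a LinkedList, or any other List implementation, which is what makes the comparison fair.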

I tested four different data structures, plus a dummy test with no data store at all, to check that the test algorithm itself wasn’t adding overhead. I tested an implementation based on a plain primitive array and a LinkedList, as well as the Vector and ArrayList classes. I’ve graphed some of the results at the bottom of this post. The first result I found very interesting is that with 1000 or fewer records there are no significant differences between the data structures. So if you aren’t storing large amounts of data, there is little to gain by picking one data type over another.

Interestingly, the LinkedList data type had some of the worst performance, getting comparatively worse as the data size increased, particularly for the remove by index and remove by value operations. This makes sense once you consider that both operations must traverse the list node by node to reach the target element, and all the extra reference handling adds overhead of its own. The operations a linked list is actually good at are adding and removing at either end of the list. This means the LinkedList data type is best suited to implementing structures such as a stack or a queue, where it gives you a dynamically sized structure that is relatively fast.
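To illustrate the case where LinkedList does fit well, here is a minimal sketch of using it as a FIFO queue via the Deque interface, where all the work happens at the ends of the list:

```java
import java.util.Deque;
import java.util.LinkedList;

public class QueueDemo {
    public static void main(String[] args) {
        // LinkedList is fast at its ends, so it works well as a queue:
        // addLast and pollFirst are both constant time.
        Deque<String> queue = new LinkedList<>();
        queue.addLast("first");
        queue.addLast("second");
        System.out.println(queue.pollFirst()); // → first
        System.out.println(queue.pollFirst()); // → second
    }
}
```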

Vector and ArrayList showed slight differences in performance but were for the most part pretty similar. This is partly because they use the same underlying storage mechanism, a resizable array; the main difference is that Vector’s methods are synchronized and therefore thread safe. The test results suggest that the cost of acquiring the synchronization lock becomes a smaller factor the larger the data set, and even at small data sizes the difference wasn’t large. On a faster machine with a different JVM the difference between the two was even smaller, looking almost identical. So the difference between these classes will largely depend on the system you are running on, and in most cases it won’t be huge. Again, if the data size is small it doesn’t really matter which one you use.
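Since synchronization is the main practical difference, a thread-safe list can also be obtained by wrapping an ArrayList, which is roughly equivalent to what Vector gives you. A small sketch of the two options:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class SyncListDemo {
    public static void main(String[] args) {
        // Vector synchronizes every method internally...
        List<Integer> legacy = new Vector<>();
        legacy.add(1);

        // ...which is roughly what you get by wrapping an ArrayList
        // in a synchronized view.
        List<Integer> wrapped = Collections.synchronizedList(new ArrayList<>());
        wrapped.add(1);

        System.out.println(legacy.get(0) + " " + wrapped.get(0)); // → 1 1
    }
}
```

Both pay the lock-acquisition cost on every call; an unwrapped ArrayList skips it entirely, which is where the small single-threaded difference comes from.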

Interestingly, the data type that did best was the plain primitive array implementation: it was significantly faster than all the other data structures. The simple answer is that primitive arrays in Java require fewer operations to manage once the JVM compiles the code down to machine code. In this case removal by index was also faster because my array implementation did not keep items in order when removing one: I simply moved the item at the end into the vacated slot and decremented the size. This is a common practice for lists of items in games, where the order in the array is not important.

Classes such as Vector, ArrayList and LinkedList preserve the order of items within them whether you require it or not. This carries an intrinsic cost when removing items, at worst O(n) to remove a single item. When the order of items isn’t a concern, any item can be removed in constant time, as I have done in this implementation of the primitive array.
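The swap-with-last removal described above can be sketched like this (this is my own minimal version, not the exact test code; the class and method names are illustrative):

```java
import java.util.Arrays;

public class SwapRemove {
    private int[] items = new int[16];
    private int size = 0;

    void add(int value) {
        // Grow the backing array when full, like ArrayList does.
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }

    // O(1) removal when order doesn't matter: move the last item
    // into the gap instead of shifting everything down.
    void removeAt(int index) {
        items[index] = items[size - 1];
        size--;
    }

    int get(int index) { return items[index]; }

    int size() { return size; }

    public static void main(String[] args) {
        SwapRemove list = new SwapRemove();
        list.add(1);
        list.add(2);
        list.add(3);
        list.removeAt(0); // last item (3) takes slot 0
        System.out.println(list.size() + " items, first is " + list.get(0)); // → 2 items, first is 3
    }
}
```

Note how removing index 0 leaves the remaining items out of their insertion order; that is the trade the ordered API classes refuse to make on your behalf.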

So which data storage technique should you favour? There isn’t a simple answer, as it depends on the particular problem. However, you will most often get the best results by creating a data structure tailored to your problem; in this particular instance a simple array was best. You can construct almost any type of data structure on top of a primitive array, so many different solutions can be implemented with better performance than the classes in the API. In my experience the Vector class is fine for most things and easy to use, but when I’ve needed speed I often use a primitive array, or another type of data structure altogether, such as a HashMap, which is useful for building a cache.

So why do the data store classes in the API exist in the first place? Simply put: convenience and generality. When performance and data size are not going to cause problems, it’s easier to use a pre-built class to store data. The Java storage classes are also implemented generically so they can be used for any purpose; this has a performance cost, but you benefit from data structures that work for many different data types and situations. If you want some of that convenience with better performance, you could look at the Trove library for data storage. Trove is faster than the Java API classes, but doesn’t have quite the same generality.


Profiling an Android Application

As you may or may not know, I have been writing an app for the Android platform using OpenGL. Anyone writing OpenGL code on Android should know that it is important to make sure your code is memory efficient and, most importantly, fast. To achieve this you need to test and analyse your code on a regular basis. The Android SDK ships with some useful tools for this, and others can be installed into Eclipse.

For profiling your application you should use Traceview. To make effective use of this tool you need to record a trace file covering the time frame you are interested in. You can do this from Eclipse, but the length of the trace is limited and it can be inaccurate. I suggest using the android.os.Debug class to create the trace file instead, because it lets you control the size of the trace buffer. I have used Traceview to analyse how long elements of my rendering code take to execute, and to monitor how the timings change as I change the code.

For analysing how much memory you are using, there are several things you should do. First, look at the output of logcat and see how often the garbage collector runs and how much memory it frees each time. If your app creates and releases objects constantly, the garbage collector will spend a lot of time cleaning up, which will slow your app down. The collector’s log lines also tell you how much heap memory you are using, and you need to be aware that heap space for apps on Android phones is limited: some devices cap it at around 24 MB, but the limit differs from device to device. So you need to be careful about the amount of RAM your app uses, even if the target device has a lot of RAM.

A good way to find out how your app uses memory is the Eclipse memory analysis tool. It is fairly simple to get working in Eclipse and, like method tracing, can also be triggered from the Debug class. I found that I had to limit the results of all my searches to my own project’s classes to get useful results, because a number of Android API classes take up significant memory in a small app. Either way, the analysis tool is very good, but it can be a little complex to use if you don’t know the terminology.

The tools have been useful in finding areas of my code that could be improved, and in quantifying the improvement. I found a section where I was implicitly creating a String object in a method call; once the method finished the String was no longer needed, so the garbage collector was kicking in about once every two seconds. I changed the code to use a different data type and avoid the implicit allocation, which made the relevant code run twice as fast while triggering the garbage collector far less often. I have also been able to determine how much memory my objects require and tune them appropriately.
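The general pattern behind that fix can be sketched in plain Java: instead of building a new String on every call (for example when drawing a score each frame), reuse a single buffer. The class and method names here are hypothetical, not my actual app code:

```java
public class ScoreLabel {
    // Reused buffer: no new String is allocated per call,
    // so the garbage collector has nothing to clean up.
    private final StringBuilder buffer = new StringBuilder("Score: ");
    private final int prefixLength = buffer.length();

    CharSequence format(int score) {
        buffer.setLength(prefixLength); // rewind past the "Score: " prefix
        buffer.append(score);
        return buffer;
    }

    public static void main(String[] args) {
        ScoreLabel label = new ScoreLabel();
        // "Score: " + score here would allocate a fresh String each frame;
        // the reusable buffer does not.
        System.out.println(label.format(42)); // → Score: 42
        System.out.println(label.format(7));  // → Score: 7
    }
}
```

On a desktop JVM this barely matters, but on a memory-constrained Android device, avoiding a per-frame allocation keeps the garbage collector quiet.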

The analysis tools are only half the solution. Your application also needs a good, solid design, to make problems easier to diagnose and correct. Lastly, it is important to know how to interpret the information the tools give you. Many parts of the analysis tools cause slowdown or extra memory use themselves, so the statistics presented are best read relative to each other and don’t always represent the result on real hardware. Used effectively, profiling and memory analysis will allow you to significantly improve the performance of your applications.
