Posts Tagged ‘garbage collector’


Memory Allocation – Heap and Stack

A question I recently saw online was about how memory is allocated and used within the Java programming language. There was a little confusion about what the heap and stack are. Both the stack and heap are common to many languages, such as Pascal and C.

Stack space is a special construct, supported in hardware, used for storing temporary variables, return values, and return addresses for functions. Most languages use the stack to store any local variables that will fall out of scope when a procedure (or method, in object-oriented languages) finishes or returns. The Java virtual machine has the same style of stack construct, used for the same purpose. However, Java will only store primitives (char, int, byte, etc.) and references to objects on the stack.
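As a small sketch of this distinction (the class and variable names here are my own illustration, not from the original post), a primitive local lives in the method's stack frame, while an object created with `new` lives on the heap:

```java
public class StackHeapDemo {
    public static void main(String[] args) {
        // A primitive local: the value itself sits in this stack frame.
        int count = 42;

        // 'data' is a reference stored on the stack; the array object
        // it points to is allocated on the heap.
        int[] data = new int[10];
        data[0] = count;

        System.out.println(data[0]);
    }
}
```

When `main` returns, both `count` and the reference `data` fall out of scope; the array itself remains on the heap until the garbage collector reclaims it.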

Heap space is a software construct and differs from language to language. It’s used for dynamically allocating memory, usually for longer-term storage of larger chunks of data. In languages like Pascal and C, data on the heap is usually handled and referred to through a primitive type called a pointer. A pointer holds the memory location of whatever it points to, which can make pointers difficult to program with correctly.

Java doesn’t have a pointer primitive, so it uses a different mechanism. Instead of pointers, Java has references, which are subtly different: the memory location is not available to the programmer, whereas it is with a pointer. This means you can’t do pointer arithmetic or try to access memory directly. The intent is to protect programmers from themselves and to make Java more secure. References are how Java programs refer to objects and arrays stored on the heap. The garbage collector tracks which objects on the heap are still reachable through references, and disposes of those that are no longer in use.
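A short sketch of how references behave (names are illustrative only): assigning one reference to another copies the reference, not the object, and once no references remain the object becomes eligible for collection.

```java
public class ReferenceDemo {
    static class Node { int value = 7; }

    public static void main(String[] args) {
        Node a = new Node();   // object allocated on the heap; 'a' is a reference
        Node b = a;            // copies the reference, not the object
        b.value = 99;

        System.out.println(a.value); // both references point at one object
        System.out.println(a == b);  // same heap object

        a = null;              // dropping every reference makes the object
        b = null;              // eligible for garbage collection
        // There is no way to read the address held in 'a', offset it,
        // or forge a new reference, unlike a C pointer.
    }
}
```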

Older languages targeting DOS and similar systems allocate global variables to what is called the data segment. DOS and the processors of that era used segmented memory addressing, which comes in part from 8086 assembly language. There are three types of segment: code, stack, and data. Code segments store the machine code for the program and are fairly straightforward; there can be many of them, but on DOS and 8086 machines the maximum size of each is 64KB. The stack segment holds the processor stack; there can be only one, again with a maximum size of 64KB. The data segment holds statically defined data and is also limited to 64KB. Borland Turbo Pascal only allowed one data segment, and I suspect most other languages did the same. The heap sat outside the segments and used the remaining memory; it was typically managed by a system library that was part of the language used.

Java is fairly similar to native languages but does not use a segmented model, as most modern languages don’t. This is partly because hardware and software have changed in ways that make segments unnecessary.


Java Data Structure Performance

Recently I was browsing Stack Overflow looking at questions asked about Java. One theme that kept recurring was the performance of various data structures within the Java API; in particular, the Vector and ArrayList classes were frequently compared. So I decided I would have a go at working out which data structures perform well.

The first thing I needed to decide upon was what kind of test was fair, and under what circumstances I would run the tests. I made a small test framework in Java so I could apply exactly the same algorithm to each data structure. I built the program to perform tests one after another with increasing data sizes, to see how the data size affected the results. I wasn’t worried about factors such as the garbage collector and other tasks running on the OS, as these would affect all the tests and I could take some precautions to minimise their effect. I’ll make my code available in the code section when I’ve finished tidying it up a bit.

I wrote code to test several major aspects of a data structure. I recorded how long (in milliseconds) it took to create the structure, insert into it, remove an item by index, and remove an item by value. I was just using System.currentTimeMillis() to work out how much time was used, so small-scale accuracy can’t be assured; if you want to be more accurate, you could use a profiler on the same code. I chose not to require the data to be ordered within the collections, as the tests are not dependent upon ordering within the data structure.
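The timing approach can be sketched roughly as follows. This is my own minimal reconstruction, not the author's actual framework; the helper name `timeInserts` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TimingDemo {
    // Hypothetical helper mirroring the approach described: apply the
    // same operation to any List and time it with currentTimeMillis().
    static long timeInserts(List<Integer> list, int n) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeInserts(new ArrayList<>(), 100_000);
        // Millisecond resolution only: small workloads may report 0 ms,
        // which is why small-scale accuracy can't be assured.
        System.out.println(elapsed >= 0);
    }
}
```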

I tested four different data structures, and had a dummy test with no data store to check that the test algorithm itself wasn’t causing extra overhead. I tested an implementation backed by a standard primitive array and a LinkedList, as well as the Vector and ArrayList classes. I’ve graphed some of the results at the bottom of this post. The first result I found very interesting is that with 1000 or fewer records there are no significant differences between the various data structure types. So if you aren’t storing large amounts of data, there is little to gain by picking one data type over another.

The LinkedList data type, interestingly, had some of the worst performance figures, getting comparatively worse as the data size increased, particularly for the remove-by-index and remove-by-value operations. In theory it should have similar performance to the others, so this is a bit confusing; perhaps it is all the extra reference handling that makes it significantly slower. It’s no surprise that it isn’t faster than the other data types, as the operations it performs well are adding and removing at either end of the list. This means the LinkedList data type is best suited to implementing structures such as a stack or queue, as it allows for dynamically sized structures that are relatively fast.
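Those end-of-list operations are exactly what LinkedList's Deque-style methods expose; a small illustration (my own example):

```java
import java.util.LinkedList;

public class EndsDemo {
    public static void main(String[] args) {
        // Used as a stack: push/pop work on the head in O(1).
        LinkedList<String> stack = new LinkedList<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop()); // last in, first out

        // Used as a queue: offer appends to the tail, poll takes the head.
        LinkedList<String> queue = new LinkedList<>();
        queue.offer("x");
        queue.offer("y");
        System.out.println(queue.poll()); // first in, first out
    }
}
```

Removal by index or value, by contrast, forces a node-by-node walk through the chain, which fits the poor results reported above.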

Vector and ArrayList showed some slight differences in performance but are for the most part pretty similar. This is in part because they use exactly the same underlying data storage mechanism; the main difference is that Vector is synchronized and therefore thread safe. The results from testing seem to indicate that the cost of acquiring the synchronization lock becomes a smaller factor the larger the data set. Even on smaller data sizes the difference wasn’t incredibly large, and on a faster machine with a different JVM the difference between the two was even smaller, looking almost identical. So the difference between these classes will largely depend on what kind of system you are running on, and will in most cases not be huge. Again, if the data size is small it doesn’t really matter which one you use.
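Through the List interface the two are interchangeable, which is why the comparison is so direct; a quick sketch (my own example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class ListCompareDemo {
    public static void main(String[] args) {
        List<Integer> vector = new Vector<>();       // synchronized methods
        List<Integer> arrayList = new ArrayList<>(); // unsynchronized

        for (int i = 0; i < 5; i++) {
            vector.add(i);
            arrayList.add(i);
        }

        // Identical contents and behaviour through the List interface;
        // Vector just pays a lock acquisition on every call.
        System.out.println(vector.equals(arrayList));
    }
}
```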

Interestingly, the data type that did the best was the standard primitive array implementation. It was significantly faster than all the other data structures! The simple answer is that primitive arrays in Java require fewer operations to manage when converted into machine code by the JVM compared to the other methods of storage. In this case it was also faster for removal by index because my implementation did not worry about keeping items in order when removing one: I simply moved the item at the end to occupy the space being deleted and then decremented the size. This is a common practice for lists of items in games where the order in the array is not important.
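The swap-with-last removal described above can be sketched like this (my own reconstruction of the technique, not the author's test code):

```java
public class SwapRemoveDemo {
    static int size = 0;
    static int[] items = new int[8];

    static void add(int value) {
        items[size++] = value;
    }

    // Unordered remove: overwrite the removed slot with the last
    // element and shrink the size. O(1), but ordering is lost.
    static void removeAt(int index) {
        items[index] = items[size - 1];
        size--;
    }

    public static void main(String[] args) {
        for (int v : new int[] {10, 20, 30, 40}) add(v);
        removeAt(1); // removes 20; 40 is moved into its slot
        for (int i = 0; i < size; i++) System.out.println(items[i]);
    }
}
```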

Classes such as Vector, ArrayList and LinkedList preserve the order of items within them whether you require it or not. This carries an intrinsic cost when removing items, and at worst can be O(n) to remove an item. When order of the items isn’t a concern any item can be removed in constant time as I have done in this implementation of the primitive array.

So which data storage technique should you favour? There isn’t a simple answer, as it will depend on each particular problem. However, it does seem that you will most often get the best results when you create a data structure tailored to your problem. In this particular instance a simple array was best. You can construct pretty much any type of data structure on top of a primitive array, so many solutions can be implemented with better performance than the classes in the API. In my experience the Vector class is fine for most things and easy to use, but when I’ve required speed I often use a primitive array, or another type of data structure altogether, such as a HashMap, which is useful for creating a cache.

So why do the data store classes in the API exist in the first place? Simply put: convenience and generality. When performance or data size are not going to cause problems, it’s easier to use a pre-built class to store data. The Java storage classes are also implemented generically so that they can be used for any purpose; this has a performance cost, but you benefit by having data structures you can use for many different data types and situations. If you want some of that convenience with better performance, you could use the Trove library for data storage. Trove is faster than the Java API but is not as general as the API classes. Check it out here.


The Java Programming Language

The first language I learned at university was Java. Java is a high-level object-oriented language; it is used in web pages in the form of applets. Some features of Java were quite novel at the time it was created. The compiler does not generate native code, but rather something called byte code. Byte code is basically a special form of assembly language for a machine that does not exist: in essence, a virtual machine. Of course this special code will not run directly on normal hardware, so special interpreter software is used to execute it. Interpreters exist for many platforms: Windows, Mac OS, and the various Unix platforms. This in and of itself isn’t that remarkable, as many BASIC interpreters also generate a binary format to represent instructions in a similar way. Java code is typically very portable and is in use across many platforms today; many programs run across multiple platforms without modification.

Java was the major language I learned for object-oriented programming, and I found it pretty easy to get the hang of. All the important aspects of object-oriented languages are represented: inheritance and abstraction are implemented through extending classes and implementing interfaces. Something that helped me learn the language more easily was the simpler form of memory management. You can hold something called a reference to an object, which is simply Java-speak for having a pointer to an object. A reference is created implicitly when you call a constructor and can be used freely. When the reference is no longer needed you can let it go, and the Java garbage collector will free the object at an appropriate time. You can’t perform pointer arithmetic on references (or change them manually), which means that memory references can’t be tampered with, protecting Java programs from themselves.

The API is very well documented, which has been very handy to me over the years. You can generate the same style of documentation for your own code using the javadoc tool, which I would definitely recommend. The API is full of very useful classes; in particular, the util, nio, and swing packages contain classes you will use on a regular basis. I’ve found many of the data storage classes in util to be good, especially the Vector and HashMap classes.

Over the course of my university degree I have written many Java programs for assignments. The language made implementing and studying computer science algorithms, data structures and software designs much simpler. It also helped my coding style and documentation improve and become more readable. I still use Java in much of my programming today as I can write simpler code without worrying about some of the specifics of the platform I’m writing for. I still like to write code in native programming languages partly for the challenge and partly for speed.


Profiling an Android Application

As you may or may not know, I have been writing an app for the Android platform using OpenGL. Anyone writing OpenGL code for Android should know that it is important to make sure your code is memory efficient and, most importantly, fast. To achieve this you need to test and analyse your code on a regular basis. There are some useful tools that come with the Android SDK, and some that can be installed into Eclipse, that you should use.

For profiling your application you should use Traceview. To make effective use of this tool you need to record a trace file covering the time frame you are interested in. You can do this from Eclipse, but the length of the trace is limited and it can be inaccurate. I suggest you make use of the android.os.Debug class to create the trace file, because it lets you control the size of the trace buffer. I have used Traceview to analyse how long elements of my rendering code take to execute, and to monitor how the timings change when I change my code.
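Using the Debug class for this looks roughly like the following sketch (Android-only, so it won't compile outside the SDK; `drawFrame` is a hypothetical stand-in for whatever code you want to measure):

```java
import android.os.Debug;

public class TraceExample {
    public void profileFrame() {
        // Second argument is the trace buffer size in bytes; a larger
        // buffer lets the trace cover a longer time frame.
        Debug.startMethodTracing("render_trace", 16 * 1024 * 1024);

        drawFrame(); // hypothetical: the code under measurement

        Debug.stopMethodTracing();
        // The resulting .trace file is written to the device's storage
        // and can be pulled off and opened in Traceview.
    }

    private void drawFrame() { /* rendering work goes here */ }
}
```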

For analysing how much memory you are using, there are several things you should do. Firstly, take a look at the output of logcat and see how often the garbage collector runs and how much memory it frees each time. If your app creates and releases objects on a regular basis, the garbage collector will spend a lot of time cleaning up memory, which will slow your app down. The garbage collector log lines also tell you how much heap memory you are using. Be aware that there is limited heap space available to apps on Android phones: some devices have a limit of 24MB of heap, but it differs from device to device. So you need to be careful about the amount of RAM used by your app, even if the target device has a lot of RAM. A good way to find out how your app uses memory is the Eclipse memory analysis tool. It is fairly simple to get working in Eclipse and, like method tracing, can also be activated via the Debug class. I found that I had to limit the results of all my searches to my project’s classes to get useful results, as a number of classes belonging to the Android API take up significant memory in a small app. Either way, the analysis tool is very good, but can be a little complex to use if you don’t know the terminology.

The tools have been useful in finding areas of my code that could be improved, and in quantifying the improvement. I found a section where I had implicitly created a String object in a method call. When the method finished, the String was no longer needed, and the garbage collector would kick in once every two seconds. I changed my code to use a different data type and avoid creating objects implicitly, and the relevant code ran twice as fast, in addition to not requiring the garbage collector as often. I have also been able to determine the amount of memory my objects require and tune them appropriately.
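The kind of fix described above can be sketched like this; the method names and the score label are my own illustration, not the author's actual code:

```java
public class AllocationDemo {
    // Version 1: concatenation allocates a fresh String on every call,
    // giving the garbage collector regular work in a tight render loop.
    static String labelWithGarbage(int score) {
        return "score: " + score;
    }

    // Version 2: one long-lived StringBuilder is reset and reused,
    // avoiding the per-call allocation.
    static final StringBuilder buffer = new StringBuilder();

    static CharSequence labelReused(int score) {
        buffer.setLength(0); // reset without allocating a new object
        buffer.append("score: ").append(score);
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(labelWithGarbage(5));
        System.out.println(labelReused(5));
    }
}
```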

The analysis tools are only half the solution. Your application needs a good solid design to make it easier to diagnose and correct problems. Lastly, it is important to know how to interpret the information from the tools. Many parts of the analysis tools cause slowdown or extra memory use themselves, so be aware that the statistics presented are relative to each other and don’t always represent the results on real hardware. Used effectively, profiling and memory analysis will allow you to significantly improve the performance of your applications.
