I find myself unprepared this week for what I was going to write about, partly because of playing Dwarf Fortress after having had a long respite, and having been away for a good portion of the weekend. So I don’t have any screen captures done for the game I was planning on writing about.
Instead, I’ll be talking today about multi-threading on both Java and Python and why in general Java is better at it.
The story begins with a project I had at work, where I was asked to speed up a program written in python. The code was written in a single thread and I was told that it was a sequential type of algorithm that the coder thought couldn’t be multi-threaded. This sounded like a challenge to me so I got a copy of the code to look at and began optimising.
The code was designed to solve a problem within an area of mathematics, something related to topological groups. I didn’t really understand the complete idea behind what he was trying to do, after all the person that wrote the code was an honours student and being a engineer I didn’t do anything harder than second year maths, and not in this field. The basic gist is that he was trying to work out the rate of usual growth and geodesic growth of a particular word problem and if there is a pattern to the growth rates. If you understand this you probably know more than I do!
So I had some code and began investigating. I couldn’t see any obvious way to make a huge improvement to his algorithm, which isn’t surprising, but I did find some areas of implementation that I could fix to make the program faster.
The code used string variables, it used them _a lot_. After profiling the code I found that it spent a good portion of its time doing the string length function (len). Changing the code to use this function less (by keeping track of the length in a variable) I managed to speed up the code by quite a bit. This is because measuring the length of a python string is an order N operation, as the length isn’t stored but computed each time by traversing the string. This is pretty much the same as is done in many languages such as C.
I found a few other implementation details relating to strings, but quickly ran into a wall of not being able to make the program much faster. So my attention turned to changing the code to take advantage of multiple cores. Now having become more acquainted with the code I noted that there were situations where one function was called many times on many strings in a data set. This data set unfortunately grows rather quickly (exponential I think) so later iterations of the algorithm could take a long time single threaded. So I decided to go about processing more than one string at the same time in many threads.
This unfortunately led to me find a problem with using python for any kind of number crunching. After doing some research I found out about something called the Global Interpreter Lock. This lock basically means that within one instance of Python the interpreter will only be active on one thread. Because the code runs almost exclusively in the interpreter it wasn’t really going to be possible to do all the work within one instance of python. The threading model in python works well for stuff like I/O where a thread may not need the interpreter for a while and it can be used by someone else.
So being a Unix user I thought of using multiple instances of python and using OS pipes to communicate between all the different instances. This is called multi-processing rather than multi-threading, but it still allowed me to make use of many more processors and in theory it could have even been extended to multiple machines, but as it turned out this wasn’t necessary and wouldn’t have helped.
The problem I faced after much testing was that the main python process was spending way more time on I/O than pretty much anything else. Part of the problem being the large volume of data that needed to traverse the pipes in order for each thread to run. I tried several different techniques but couldn’t get an Improvement, this is when I decided a port to Java would be a good idea.
The port to Java was pretty straight forward for the code that makes up the main part of the algorithm, so I took the opportunity to do a simple test as to which one has better single threaded performance. I ran the python version of the code using pipes to both java and python processing instances in turn. I found with equivalent code that Java was about 50% faster just for the function I had put into the separate process, this looked promising and was useful in debugging some of the java code. Given that python is interpreted this shouldn’t be surprising.
I moved to java because of the threading model being much better. Firstly it actually supports having more than one VM thread running at a time, so it makes it possible to actually do some processing on multiple threads, and secondly it has a bunch of handy mechanisms to help allow you to get the maximum processing done. Synchronized blocks, locks, and the notify/wait system allows for thread safe programming whilst not having to poll for events, thus saving processing time. The main advantage over python in this case is not having to do the I/O to get data to processing threads, and being able to multi-thread more sections of the code.
Now having completed the java port the performance stats speak for themselves, it is significantly faster than the python implementation, taking about 4 days to do what would take weeks even with the multi-process python implementation. I do have to note that I was using the 2.7 version of python, so this does not reflect the performance or programming model in the new python 3 implementation.
Recent Comments