
February 20, 2009

The multi-core processor quandary for software development - can and will you take advantage of it?

The era of multi-core processors -- dual-core (two processors in one), quad-core (four processors in one) -- is now in full swing. You might think to yourself, 'My software will be blazing on a machine running a hexa-core processor, which is six processors in one!' It just might, but if your software isn't designed to take advantage of those cores, it might as well be running on a single-core processor. This post is a high-level view of multi-core processors, concurrency and the programming languages that have recently made headlines as concurrency champions, and how they fit this multi-core paradigm.


Multi-core processors are a great achievement, not just technologically since a single processor can offer the processing power of two, four or 'n' processors, but also because what was once considered low-end hardware will gradually be equipped with this same CPU capacity.

Prior to the appearance of multi-core processors, the only way to harness the power of multiple CPUs was through multi-processor machines, which to this day are limited to high-end servers and workstations charged with things like video editing and scientific research (e.g. DNA analysis), areas that require huge amounts of CPU cycles to crunch data and obtain results in a reasonable amount of time.

But suddenly, with multi-core processors, that new laptop you bought will have the same computing capacity as these high-end workstations or servers that require special boards to accommodate multiple processors. All this is great of course, more 'bang for the buck' as they say, but can an application take advantage of these two, four or even six cores? Yes, if it's designed to do so.

Concurrency, threading and multi-threading

On a machine with a single processor (single-core), every program/application vies for the same CPU. So to speak, each program/application needs to 'stand in line' and wait its turn to access the CPU. This takes place so fast, though, that in the great majority of cases it's undetectable. There are cases, however, where the performance implications are noticeable, in which case you have a concurrency problem.

These concurrency problems tend to present themselves when a certain section of an application needs to be executed over and over again (e.g. processing millions of strains in a DNA sample or thousands of frames on a video clip). However, having multiple processors or cores isn't sufficient for making a single application use more than one processor.

Having multiple cores may in fact improve the overall performance of a machine's applications -- since they will now have 2, 4 or 6 places to 'stand in line' for CPU processing -- but if a single application is not designed to exploit the presence of multiple processors or cores, that single application will run just the same. So what do you need to do? Parallelize the section of an application that is CPU intensive, so it can be run concurrently on whatever processors or cores are available.

The typical way to achieve parallelism is by relying on 'threads'. A 'thread' effectively partitions an application into multiple execution paths, where each one can independently vie for CPU cycles. Hence, if a CPU intensive section of an application is threaded, each 'thread' can use a different processor or core simultaneously, resulting in better performance for that single application.
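To make the idea of partitioning a CPU intensive section a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: counting primes stands in for the CPU intensive work, and the names `count_primes`, `split_range` and `count_primes_parallel` are hypothetical. One caveat specific to this language: in standard CPython, 'threads' take turns on a global interpreter lock, so the sketch defaults to a pool of worker processes to actually occupy several cores. The partitioning idea is the same either way.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def count_primes(bounds):
    """CPU intensive work on one slice: count the primes in [start, stop)."""
    start, stop = bounds
    total = 0
    for n in range(max(start, 2), stop):
        # Trial division: n is prime if no d in 2..sqrt(n) divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total


def split_range(stop, parts):
    """Cut [0, stop) into at most `parts` contiguous slices of near-equal size."""
    step = max(1, -(-stop // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, stop)) for lo in range(0, stop, step)]


def count_primes_parallel(stop, workers=None, executor_cls=ProcessPoolExecutor):
    """Hand one slice to each worker and add up the partial results."""
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(stop, workers)))
```

When run as a script, the call to `count_primes_parallel` should sit under an `if __name__ == "__main__":` guard so worker processes can start cleanly. Passing `executor_cls=ThreadPoolExecutor` runs the same partitioning on 'threads' instead, which in CPython is mostly useful for comparing the two.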

But don't 'threads' have more to do with graphical applications? Not necessarily; 'threads' are simply about splitting an application's execution paths. In graphical applications -- which are often the classical threading example -- an application's execution paths are split so each one is performed asynchronously (one task doesn't have to wait for the other to finish). This allows an application's mouse movements, interface and business logic to be decoupled. Were it not for threading, a main interface would blink or the mouse pointer would sputter due to one action awaiting completion of another.

In such cases, it's not that each 'thread' is vying for CPU cycles, but rather isolating itself from depending on other actions. However, other scenarios often involve a constant need for CPU cycles, where 'threads' are used not so much because certain actions cannot wait for another, but rather because performing actions in parallel can yield substantial performance gains, with each 'thread' executed on whatever processor or core is available.

So rest assured that in addition to graphical applications, all other applications targeting multiple-processor machines -- like video editing, scientific research (e.g. DNA analysis) and web servers -- are multi-threaded to the fullest extent possible in order to unload 'threads' onto whatever processors are available on the hardware and execute them in parallel.

But with the advent of multi-core processors, all but the simplest applications can now have a series of 'threads' execute in parallel, something that wasn't possible earlier given the limited availability of multiple-processor machines.

All this of course leads to the very deep topic of multi-threaded programming, which is considered hard due to the design considerations it entails. You may be able to split an application's logic into multiple 'threads' just fine, but an entire application still relies on the same data and in-memory values, which can lead to all sorts of complications when multiple 'threads' of execution operate on this same data and values: things like synchronization, deadlocks, resource starvation and all the other fun stuff that goes with multi-threaded programming.
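As a small illustration of what 'synchronization' means in practice, the following Python sketch has several 'threads' update one shared value. The `Counter` class and the `hammer` and `run_workers` helpers are hypothetical names invented for this example. The lock is the important part: incrementing is a read-modify-write sequence, and without the lock two 'threads' can read the same old value so that one of the updates is silently lost.

```python
import threading


class Counter:
    """A value shared by every thread, guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run this read-modify-write step.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    """Work done by each thread: bump the shared counter `times` times."""
    for _ in range(times):
        counter.increment()


def run_workers(counter, threads=4, increments=10_000):
    """Start `threads` threads against one counter and wait for all of them."""
    workers = [
        threading.Thread(target=hammer, args=(counter, increments))
        for _ in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

With the lock in place, `run_workers(Counter(), threads=4, increments=10_000)` always ends at 40,000. The price is that every increment now waits its turn, and with more than one lock in play, acquiring them in inconsistent order is how deadlocks come about.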

So are 'threads' the only way to truly use multi-core processors? The short and conceptual answer is yes, but given the sudden availability of multi-core processors on a wide array of hardware, alternatives for parallelizing applications have surged to deal with the concurrency problems of spreading workloads across multiple CPUs (cores or processors).

The surging alternatives: Erlang, Scala, Clojure and Co.

Among this sudden surge are a series of programming languages, a thing I can only attribute to the difficulty of multi-threaded programming in other languages. So what is different about these programming languages?

I'll choose Erlang as an example, since it seems to be the one getting the most attention and is also the one that differs most from other languages. For starters, Erlang doesn't have 'threads' as other languages do. You might say, "Wait a minute, if it doesn't even support 'threads', how come it's making the news as a concurrent language and the best option for multi-core programming?" The reason it doesn't support 'threads' is that it instead uses 'processes' that communicate with each other using message passing.

The biggest problem faced with 'threads' is that since they all form part of the same application (e.g. same data and memory structures), they often need to communicate with each other to report what each one is doing, which leads to the problems cited earlier (synchronization, deadlocks, etc.). These issues are non-factors when using message passing and 'processes'.

I won't go into more specifics (which you can find here), but message passing alleviates many of the pitfalls that often qualify multi-threaded programming as hard. In Erlang, an application can be partitioned into different 'processes' that are able to exploit multiple cores in parallel, and though its syntax and style are vastly different from popular object-oriented languages, it achieves the purpose of making concurrent programming easier, or at minimum clearing the error minefield faced by many a multi-threaded developer.
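The message passing style can be imitated in other languages, which may help if you have never seen Erlang. The Python sketch below is only an analogy and not how Erlang itself is implemented: a 'process' is played by a thread with a private inbox, and the names `tally_process`, `start_tally` and `STOP`, as well as the message shapes, are hypothetical. The point to notice is that the `totals` dictionary is owned by exactly one execution path, so nobody else needs a lock to touch it. Everyone else just sends messages.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel message asking the process to exit


def tally_process(inbox):
    """Owns `totals`; no other thread ever reads or writes it directly."""
    totals = {}
    while True:
        message = inbox.get()  # block until the next message arrives
        if message is STOP:
            break
        kind, payload = message
        if kind == "add":
            key, amount = payload
            totals[key] = totals.get(key, 0) + amount
        elif kind == "get":
            key, reply_to = payload
            reply_to.put(totals.get(key, 0))  # answer with a message, too


def start_tally():
    """Spawn the process-like worker and return its inbox plus a handle."""
    inbox = queue.Queue()
    worker = threading.Thread(target=tally_process, args=(inbox,), daemon=True)
    worker.start()
    return inbox, worker
```

A sender does `inbox.put(("add", ("hits", 1)))` and never sees the dictionary. To read a value it sends `("get", ("hits", reply_queue))` and waits on `reply_queue.get()`. Because the inbox handles messages one at a time in arrival order, the lost-update problem of shared data does not arise in this design.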

In response to this popularity of making applications capable of leveraging multiple cores (processors) simultaneously, other variations of this "message passing" style based on the Actor Model have emerged.

One of them is Scala, a Java Virtual Machine (JVM) compatible language with its own Scala Actors project that allows this "message passing" style to be used for applications running on Java's runtime.

Another option to emerge has been Clojure, another JVM compatible language based on LISP, which due in part to LISP's functional programming roots avoids shared state and mutable data, the same principle that underlies "message passing".

Likewise, the Java language itself has taken cues from the difficulty of using 'threads' to achieve parallelism and introduced projects like the fork-join framework that ease the development of applications that take advantage of multiple cores.

Still, for all the benefits these initiatives bring to developing applications targeting multi-core processors, you need to ask yourself: is it even worth it to use 'threads' or one of these other 'message passing' approaches in your applications?

Can an application gain from being parallelized or even be parallelized?

This may come as a shock to many who've jumped on the concurrency bandwagon and started using Erlang or Scala because they are 'concurrent languages', expecting their applications to run 2, 4 or 6 times faster on a dual-, quad- or hexa-core processor just because they are using a 'concurrent language'. But if you don't think your application through with parallelization in mind, you might as well have used a single processor and Fortran.

Not every application can benefit from parallelism and in turn from multiple cores. Providing a more formal discourse on this statement is Amdahl's Law, which basically states that there is a limit to how much an application can gain from multiple processors or cores, a limit set by the portion of the application that must still run serially. As a consequence, using 'threads' or 'message passing' for the purpose of exploiting multiple cores or processors can become a moot point.
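Amdahl's Law is easy to play with in a few lines. The sketch below uses the usual statement of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of cores. The function names and the example fractions are my own choices for illustration, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when `parallel_fraction` of the work runs on `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_limit(parallel_fraction):
    """Ceiling on the speedup no matter how many cores are added."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)
```

For an application where only half the run time can be parallelized, `amdahl_speedup(0.5, 6)` comes out at roughly 1.7, nowhere near 6, and `amdahl_limit(0.5)` says it can never exceed 2 however many cores you buy. Even at 90% parallelizable, six cores give a speedup of about 4.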

In addition, it should also be clear that only certain parts of an application can gain from parallelism, parts which need to be clearly identified in order to incorporate 'threading' or the 'message passing' approach between 'processes' that has enjoyed broader appeal in recent times.

In this sense, experienced multi-threaded developers in languages like C++ have the upper hand at identifying these patterns, and hence at multi-core development, over any newbies making their way into Erlang or Scala. Granted -- as already mentioned -- these 'message-passing' style languages make it easier for newcomers to experiment with multi-core development without being exposed to the low tolerance for error that is characteristic of multi-threaded programming (e.g. it takes just one subtle assumption to introduce a bug in a multi-threaded application, whereas 'message-passing' is much more forgiving since its style prevents many such bugs).

So besides one approach being less error prone than another for developing multi-core applications, is there any other difference to using 'threads' and 'message-passing' for tackling concurrency? There apparently is, albeit in a very contentious area: Performance.

Finally, a word on performance with 'threads' and 'message-passing'

Performance metrics are always seen with skepticism, especially since many can be set-up to favor a particular option or compare 'apples with oranges', but I will just present what I've seen in terms of concurrency and let you make up your own mind.

Web servers are an excellent topic to explore concurrency performance on, given they often need to handle hundreds or thousands of users with each one vying for CPU time. In such cases, the availability of more processors or cores leads to 'threads' being executed in parallel, resulting in increased performance. (See this basic intro on a Simple Multithreaded Web Server, which illustrates the concept of a web server using 'threads'.)

In the web server market one of the most popular options is Apache, which as you would expect relies on multi-threading. More recently, though, there's also been work done on an Erlang based web server named Yaws, which given its foundations relies on the 'message passing' paradigm to exploit multiple cores/processors.

According to this Apache vs. Yaws benchmark, the Erlang based Yaws web server is capable of handling over 80,000 parallel connections, whereas the more popular multi-threaded Apache web server is capable of handling a mere 4,000 parallel sessions. An important piece of this benchmark is that those performing the study attribute the difference to how the operating system handles threads and processes, further noting that Erlang does not make use of the underlying operating system threads and processes for managing its own process pool.

If verifiable, this is telling not only because it illustrates that concurrency performance depends as much on the operating system as on the application, but also because it shows Erlang's superior 'process'-centric approach to concurrency, which is based on 'message passing'.
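The remark about not relying on operating system threads deserves a quick illustration, with a big caveat: the sketch below uses Python's asyncio, which is a different mechanism from Erlang's own scheduler, and the 10,000 figure and the function names are hypothetical. It only shows the general idea that a runtime can juggle a large number of lightweight tasks itself, without asking the operating system for one thread per connection.

```python
import asyncio
import threading


async def handle_connection(conn_id, seen_threads):
    """Stand-in for serving one client; records which OS thread ran it."""
    await asyncio.sleep(0)  # yield to the scheduler, as waiting on I/O would
    seen_threads.add(threading.get_ident())
    return conn_id


async def serve_many(connections):
    """Run `connections` lightweight tasks and report how many OS threads ran them."""
    seen_threads = set()
    results = await asyncio.gather(
        *(handle_connection(i, seen_threads) for i in range(connections))
    )
    return len(results), len(seen_threads)
```

Calling `asyncio.run(serve_many(10_000))` handles all ten thousand stand-in connections on a single operating system thread. Note that this sketch says nothing about spreading work over several cores, which is a separate concern from how cheap each task is to create and switch.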

In some informal talks I've had with colleagues, we've partially agreed that besides an issue with the operating system itself, another factor may be due to multi-threading itself. Since multi-threaded programming can have so many pitfalls, experts in this area treat 'threads' very carefully in an application, often erring on the safe-side instead of the performance side, which in the end may hurt the overall performance and execution of threads. This of course is not a problem present in the 'message-passing'/'processes' principles of Erlang.

In the end, whether you choose to rely on 'threads' or 'message-passing' using some of the languages that support this last technique, it's equally important that you identify which parts of an application can gain from using parallelism. For it will only be then that a single application will be able to gain from the presence of multiple cores.

Update: Stumbled upon this article over at JavaWorld, a good read. It covers more concepts, 'functional programming' in particular, including how this style makes it easier to develop multi-core applications: Building cloud-ready, multicore-friendly applications.


Posted by Daniel at February 20, 2009 9:25 AM

