The other day I saw that Sun released JDK 1.6 Update 14, which includes version 14 of their HotSpot Virtual Machine. I think for the first time the release notes mention that the VM accepts a -XX:+DoEscapeAnalysis flag. Escape analysis has been a much-talked-about optimization technique for a while, but this was the first time I've seen Sun officially document it.
My coworker then pointed me to an article that questioned whether escape analysis actually did anything (bing "did escape analysis escape Java"). The article is quite informative, but I was skeptical of the benchmark test and its conclusions. I mean, there are really smart people at Sun (not just guys like me with a blog who try to sound smart), and I found it hard to believe that -XX:+DoEscapeAnalysis did nothing. In fairness, there's a second part to the blog, also quite informative, that digs a little deeper into the benchmarks and clarifies some things.
Uneducated Guesses
Playing around a bit (with a Java scientific benchmark) I think -XX:+DoEscapeAnalysis does offer a performance gain, but it's not as much as some of us had hoped (some of us are always hoping the next "breakthrough" will solve all our problems). Also, escape analysis seems less predictable than the other concurrency optimization techniques. One run saw little gain, the next saw more substantial gains; whereas biased locking and lock coarsening predictably and consistently delivered big performance improvements. Sometimes escape analysis just doesn't seem to kick in. This is all guessing on my part. I have no idea, at a low level, how these techniques are implemented.
At this point some might be asking: what are escape analysis, biased locking, and lock coarsening, anyway? I wrote a little bit about biased locking here last year. But now that I have OmniGraffle Pro I'll explain things better with diagrams.
Lock Coarsening
Let's start with lock coarsening. You enable lock coarsening with -XX:+EliminateLocks. This does not mean locks are completely removed/eliminated. Rather, instead of a thread repeatedly acquiring and releasing the same lock, the VM combines the adjacent acquire/release calls into one.
In figure 1 you can see two critical sections guarded by the same lock. Lock coarsening merges the two critical sections, along with the non-critical section between them, into one larger synchronized block. Instead of two acquire/release pairs there's only one. Moreover, with a larger block of code to work with, other optimizations can be applied more effectively. While this reduces locking overhead, it should be noted that larger critical sections may make the application less responsive. Some may notice the irony that we often try to break up large synchronized blocks.
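As a hypothetical sketch of the pattern the VM looks for (the class and field names are mine, not from any real benchmark), here is a method with two back-to-back synchronized blocks on the same lock that the JIT may coarsen into a single acquire/release:

```java
public class CoarseningDemo {
    private final Object lock = new Object();
    private int a, b;

    // Two adjacent critical sections guarded by the same lock.
    // With -XX:+EliminateLocks the JIT may merge them (plus the
    // short non-critical code between) into one synchronized region.
    public void update() {
        synchronized (lock) {
            a++;
        }
        int t = a * 2; // non-critical section
        synchronized (lock) {
            b = t;
        }
    }

    public int sum() {
        synchronized (lock) {
            return a + b;
        }
    }

    public static void main(String[] args) {
        CoarseningDemo d = new CoarseningDemo();
        d.update();
        System.out.println(d.sum()); // prints 3
    }
}
```

The coarsening is invisible at the source level; the program behaves identically either way, only the number of lock operations changes.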
Biased Locking
Up next is biased locking. Biased locking stems from the real-world observation that most locks are uncontended: at most one thread ever tries to acquire the lock. Here's how two threads locking two different objects work without biased locking.
As each thread enters the critical section it tries to acquire the (different, uncontended) lock. This involves expensive operations: operating system mutexes, condition variables, compare-and-swaps, etc. Given the real-world observations, we can optimize by biasing objects toward threads based on some heuristic (e.g., which thread created the object). The biased thread can then lock/unlock without the expensive operations.
The downside is that if another thread comes along and attempts to acquire the same lock, the bias has to be revoked, which is costly. If I remember correctly, the VM does some sort of bulk rebiasing to amortize that overhead.
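A minimal sketch of the kind of code biased locking targets (the class is hypothetical; the flag name is the HotSpot option of that era, though its behavior varies by JVM version): a single thread repeatedly acquiring the same uncontended monitor, so after the first acquire the lock can stay biased to that thread and later acquires skip the atomic operations.

```java
// Run with: java -XX:+UseBiasedLocking BiasDemo
public class BiasDemo {
    private int count;

    // Only one thread ever locks 'this', so the monitor is a
    // candidate for biasing: repeated acquires by the owning
    // thread can avoid the compare-and-swap entirely.
    public synchronized void increment() {
        count++;
    }

    public synchronized int count() {
        return count;
    }

    public static void main(String[] args) {
        BiasDemo d = new BiasDemo();
        for (int i = 0; i < 1_000_000; i++) {
            d.increment();
        }
        System.out.println(d.count()); // prints 1000000
    }
}
```

If a second thread ever called increment() on the same object, the bias would be revoked and the lock would fall back to the normal path.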
Escape Analysis
Finally, escape analysis. First, the virtual machine analyzes execution to see which objects escape into the unknown and which are confined to a thread/stack frame. Objects that are confined can be optimized. The VM does two kinds of optimization on these confined objects. First, since the objects are confined to the thread, it is safe to remove locking (no other thread can possibly come along and use the confined objects). The fancy term is lock elision. The most basic example is a local StringBuffer, but there are actually some really interesting examples that aren't so obvious, i.e., examples that can't be optimized statically at compile time.
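The classic StringBuffer case looks like this (a sketch; whether the JIT actually elides the locks depends on the VM and flags): every append() on StringBuffer is synchronized, but since the buffer never escapes the method, no other thread can observe it, and the monitor operations can be removed.

```java
public class ElisionDemo {
    // 'sb' is created here, never stored in a field, and never
    // passed anywhere that retains it -- it does not escape.
    // Escape analysis can therefore elide the lock acquire/release
    // inside StringBuffer's synchronized append() and toString().
    public static String greet(String name) {
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ");
        sb.append(name);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(greet("world")); // prints Hello, world
    }
}
```

Note that returning sb.toString() is fine: the String escapes, but the StringBuffer itself does not.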
The second optimization that can be done after escape analysis is allocation optimization. In Java, when you do a new FooBar() the object is allocated on the heap. Heap allocation is actually really fast, contrary to what many might say, but then you have to deal with garbage collection (the de-allocation part). There's also an interesting issue with latency from cache misses due to the fact that memory is hierarchical. (Google Brian Goetz; he explains it better.)
However, since the VM knows these objects won't escape, it can allocate them on the stack or keep them in registers, known as stack allocation and scalar replacement, respectively. For example, if FooBar were a simple class containing just a single int, the VM could avoid the allocation entirely and place the int field in a register. This avoids allocation, garbage collection, and cache misses.
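A sketch of the FooBar scenario from the text (the class body and the loop are my invention): each FooBar is created and discarded inside the loop and never escapes, so with -XX:+DoEscapeAnalysis the JIT can scalar-replace it, keeping the int in a register instead of allocating on the heap.

```java
public class ScalarDemo {
    // A tiny wrapper class like the FooBar in the text.
    static final class FooBar {
        final int value;
        FooBar(int value) { this.value = value; }
    }

    // Each FooBar is confined to one loop iteration and never
    // escapes sumTo(), so the allocation is a candidate for
    // scalar replacement: no heap object, no GC work.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            FooBar f = new FooBar(i);
            sum += f.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10)); // prints 55
    }
}
```

The program's result is identical with or without the optimization; what changes is the allocation rate, which is why it only shows up in benchmarks and GC logs.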
Just as with lock coarsening, escape analysis leads to further optimization opportunities.
But as I've said before, my dream is to one day never have to deal with locks. I'm still waiting for transactional memory; hardware transactional memory might even be coming soon.
This is interesting stuff. I wonder how long these features have been hiding out on the backchannel.
Sun has been talking about allowing stack allocation for some time now. I'm glad to see they are internalizing that in optimizations.
What I was mostly worried about is that they would make it a developer decision to annotate fields or whatever as stack allocated.
Seems simple and straightforward, but I have met very few people in my career who understand the difference between stack and heap allocation, and unleashing that choice on developers would turn Java into a mini C++, which is the last thing anyone wants.
Anyway nice rundown.
Posted by: bob | June 08, 2009 at 01:02 AM
bing? Oh dear oh dear...
Posted by: Nobbin | June 08, 2009 at 05:21 AM
You can get software transactional memory today. Clojure comes with STM built-in. Pure Java people can use Multiverse, although I'm not sure how production ready it is.
Posted by: Christian Vest Hansen | June 08, 2009 at 08:00 AM
Hopefully someone reads these comments....
Can you use both Biased Locking and Escape Analysis? From your descriptions, it doesn't seem like they are mutually exclusive, but I wouldn't want to completely befuddle my JVM by having two optimizations working against each other.
Thanks for a great article.
Brian
Posted by: www.facebook.com/profile.php?id=667417047 | September 29, 2009 at 07:48 AM
Nice article and explanation of the topics. You saved me some work in not having to explain it all again :-)
http://www.javaspecialists.eu/archive/Issue179.html
Heinz
Posted by: Heinz Kabutz | December 30, 2009 at 03:52 PM
I helped write the article on "Does Escape Analysis Really Work" and I will tell you that we spent quite a bit of time and involved JVM/HotSpot engineers in the effort to validate results. At the time of writing Escape Analysis did not offer much improvement. But with all questions of performance, time moves on and such is the case with Escape Analysis. It now works much better.
Regards,
Kirk Pepperdine
Posted by: Kcpeppe | January 16, 2011 at 10:45 AM