Just in the last few months I've been seeing an increase in the number of people seeking advice in the forums on how to tune garbage collection and/or size the Java heap. While it is encouraging to see that more people are aware that JVM configuration is important to good performance, I get the sense that people are still struggling to sort out how all of this automated memory management stuff actually works.
They get the idea that things are kept in heap and that heap is subdivided. They get the idea that the garbage collector works to free up memory that is no longer being "used". They go looking at lists that go on and on... and on... and on, with every possible switch and configuration setting, and are suddenly overwhelmed by the array of choices. Having no time to build a complete understanding, which is an understandable situation, they go off to a forum and ask a question. That question, of course, draws dozens of responses, most of them different, thus crystallizing the only thing we know about GC and heap sizing: your guess is as good as mine. And so, we guess.
When it comes to performance and tuning, it has been my experience that guessing is not a very good way of getting the desired results consistently. This knowledge led Jack and me (I forget who said it first, so I'll say Jack) to coin the mantra: Measure, don't guess. The question is, how do we take the guessing out of heap sizing and collector selection? In other words, what do we need to measure?
It is my opinion that the primary goal of every performance tuning exercise should be to maximize the end user experience given the resource constraints we are required to work under. A less formal expression of this would be: minimize response times to the end user (though low latency may be a more important target). It follows that we need to be measuring user response times. If the user response times are within tolerance, then we are done! There is no need to touch anything. However, if you are asking the question "is memory management a bottleneck in my system?", then it is most likely because user response times are not within tolerance, and we need to start asking deeper questions, beginning with GC efficiency (also known as GC throughput).
GC efficiency is defined here as the percentage of the application's run time during which the collector is running to the exclusion of your application. You may find it useful to narrow this definition from the time the JVM has been running to the time your application has been active. Whichever definition you use, the best way to calculate this value is to collect the logs produced by the -verbose:gc switch (-Xloggc:logfile.name is preferred for the Sun JVM) and feed them through a tool such as HPJTune (a free download). If the value produced by the tool is greater than 10%, you have a case for proceeding with the tuning process. If it is less than 10% but greater than 5%, tuning may help, but it might not give you the boost you were hoping for. Anything less than 5% and you're most likely wasting your time. Again, that is latency concerns aside, which is why the hedge words "most likely" appear in the preceding statement.
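If you want a quick in-process cross-check of the number the log analysis gives you, the standard java.lang.management API exposes the cumulative collection time of each collector. The sketch below (the class name GcOverhead is my own invention) divides that total by JVM uptime and applies the 5% and 10% rules of thumb above. Treat it as an approximation: it measures from JVM start rather than from when your application became active, and for concurrent collectors the reported time may include work done alongside the application, so the -verbose:gc log remains the better measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    // Percentage of elapsed JVM time spent in collection.
    static double percentInGc(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Rough verdict using the 10% and 5% rules of thumb (latency concerns aside).
    static String verdict(double percent) {
        if (percent > 10.0) return "worth tuning";
        if (percent > 5.0) return "may help, modest gains";
        return "probably not worth it";
    }

    // Sums collection time over every collector in this JVM.
    // getCollectionTime() returns -1 when a collector does not support it, so skip those.
    static double sampleCurrentJvm() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return percentInGc(gcMillis, uptime);
    }

    public static void main(String[] args) {
        double pct = sampleCurrentJvm();
        System.out.printf("GC overhead: %.2f%% -> %s%n", pct, verdict(pct));
    }
}
```

In practice you would call sampleCurrentJvm() periodically from inside the application under realistic load, not from a fresh JVM that has barely allocated anything.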
OK, you take the measurement and the value is way above 10%. Now what? The answer depends on which version of the JVM you are using, as over time things have gotten better and more options have opened up. But let's focus on the "things have gotten better" part.
Garbage collection ergonomics are to memory management what the just-in-time (JIT) compiler is to execution speed. Information is collected via dynamic profiling (think HotSpot), and that information is used to control a number of aspects of the Java heap, the behavior of the collectors and, in some cases, the choice of collector. A simple example is what happens when a 1.6 JVM starts up.
On startup, the 1.6 JVM surveys the environment in which it is executing and uses that information to decide whether it should behave as a server or a client JVM. This choice affects the number of GC helper threads, the sizes of the various spaces within the heap, and a whole raft of other configuration values. But configuration doesn't stop there. Dynamic profiling directed by GC ergonomics is then used to further refine how much heap is allocated, how it is proportioned, and whether we need to change from the less efficient throughput-focused collector to the more efficient implementations. Now here is the takeaway: the more switches you set, the more parameters you fix, the fewer options ergonomics has to adjust dynamically to the situation on the ground (or in the JVM, as it may be). Short story: the less you fiddle, the better things will be for you in the long run.
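To see which way ergonomics went, you can check the java.vm.name system property; HotSpot reports a "Client VM" or "Server VM" there. The rule encoded in looksServerClass below is an assumption based on the commonly documented Java SE 6 definition of a server-class machine (at least two processors and at least 2 GB of physical memory). The sketch ignores the special case of 32-bit Windows, which is always treated as client, and the real JVM applies its own tolerances. The class and method names are hypothetical, and physical memory is passed in because it cannot be read portably from standard Java.

```java
final class ServerClassCheck {
    static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    // Assumed Java SE 6 rule: 2+ processors and 2+ GB of physical memory.
    static boolean looksServerClass(int processors, long physicalMemoryBytes) {
        return processors >= 2 && physicalMemoryBytes >= TWO_GB;
    }

    public static void main(String[] args) {
        // What the running JVM actually chose, as opposed to what the rule predicts.
        System.out.println("processors visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("java.vm.name: " + System.getProperty("java.vm.name"));
    }
}
```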
So with that in mind, you need to define the goals of the tuning exercise. Most likely they are going to be: improve GC throughput and decrease GC pause times. Most likely the answer to both of these problems is going to be to configure max memory using the -Xmx flag. If you do this carefully, you should see GC efficiency improve (and there will be notable exceptions to this). Hopefully you will also see GC pause times fall along with the improvement. So, if a little is good, a lot must be really good... right? No. If you give the system too much memory, GC frequency will fall and GC efficiency will improve, but you will start to experience long GC pause times as the collector has to manage the much-too-large heap. In other words, GC pause times will bottom out at some optimal heap size. If you move away from that point, GC pause times will start to increase.
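Once you have set -Xmx, it is worth confirming that the running JVM actually picked it up rather than assuming it did. Here is a minimal sketch using java.lang.Runtime (the class name HeapCeiling is hypothetical). maxMemory() reports the ceiling the heap may grow to. On some collectors it comes out a little below the -Xmx value, and it can be Long.MAX_VALUE if there is no limit.

```java
final class HeapCeiling {
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to: -Xmx, or the ergonomic default if -Xmx is absent.
        System.out.println("max heap (MiB):          " + toMiB(rt.maxMemory()));
        // Heap currently reserved from the OS, which may be well below the max.
        System.out.println("committed heap (MiB):    " + toMiB(rt.totalMemory()));
        System.out.println("free in committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, java -Xmx512m HeapCeiling and then without the flag shows the difference between your setting and the ergonomic default.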
The trouble with the above explanation is that it is too simple. For one, it assumes that the optimal heap size is a constant for your application. In fact it isn't. This is why we don't want to get hung up on finding the optimal size: there simply isn't one. It all depends on time-local rates of object creation, object churn, and other factors. Fortunately, GC ergonomics will adjust things to best cope with the flux. That is, unless you pin it down to specific values using command line settings.
One of the advantages (of which there are many) of using generational spaces is that you can play tricks to reduce GC pause time. Here is how it works. Under most circumstances, objects are created in young space, or more specifically in Eden. When Eden is full, a collection is triggered, during which live objects are either concentrated into one of the two survivor spaces or tenured into old space. Now there are two important points to consider.
Point 1: paradoxically, the cost of collecting young is determined by the number of objects that survive, not the number that die. This is because to clear Eden, all you have to do is copy the live objects to a survivor space. Then all of the memory in Eden can be returned to the free list in one shot. The expensive operation, aside from finding the live objects, is copying them. Note that we are not finding dead objects; we are finding live ones. Ditto for the occupied survivor space, whose live objects are copied into the newly activated one.
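Point 1 is easier to believe with a toy model. The sketch below is not how HotSpot is implemented. ToyEden is a hypothetical class with bump-pointer allocation, and an array of live indices stands in for what tracing from the roots would find. The work done by a collection is the number of objects copied, and Eden is reclaimed by resetting a single pointer, so a million allocations with three survivors costs three copies.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a copying young collection (not HotSpot internals).
final class ToyEden {
    private final Object[] slots;
    private int top = 0; // bump-pointer allocation

    ToyEden(int capacity) {
        slots = new Object[capacity];
    }

    // Returns false when Eden is full, i.e. when a minor GC would be triggered.
    boolean allocate(Object o) {
        if (top == slots.length) return false;
        slots[top++] = o;
        return true;
    }

    int used() {
        return top;
    }

    // liveIndices stands in for the result of tracing from the roots:
    // only those objects are touched and copied; dead ones are never visited.
    int collectInto(int[] liveIndices, List<Object> survivor) {
        for (int idx : liveIndices) {
            survivor.add(slots[idx]);
        }
        top = 0; // all of Eden reclaimed in one shot
        return liveIndices.length; // work done == survivors copied
    }

    public static void main(String[] args) {
        ToyEden eden = new ToyEden(1_000_000);
        for (int i = 0; i < 1_000_000; i++) eden.allocate(i);
        List<Object> survivor = new ArrayList<>();
        int copies = eden.collectInto(new int[] {7, 42, 999_999}, survivor);
        System.out.println("allocated 1,000,000, copied " + copies);
    }
}
```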
Point 2: objects that reach a certain age will be tenured into old space. Objects that won't fit into either a survivor space or Eden will be created or copied directly into old space. In the latter case, this is known as premature promotion. The reason we don't want objects to be promoted prematurely is that they are very likely to die very quickly, and removing them from old requires a mark, sweep, and compaction (read: an in-place copy, much like disk defragmentation). In fact, we don't want old to fill up at all, because that results in more full GCs, and full GCs are always very expensive in terms of pause time. Moreover, as old fills, it becomes more difficult for it to meet its young generation guarantee. If your head is spinning from the details, don't despair; things do get easier from here on in.
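The same style of toy model shows how survivor space size and the tenuring threshold interact. It is again hypothetical and counts objects rather than bytes. Objects that have lived through enough collections are tenured by age. Objects that find no room in survivor space go straight to old, which is the premature promotion described above. For simplicity, the model assumes everything already in survivor space stays live, and it ignores the way HotSpot can adjust the threshold at run time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of aging and promotion (not HotSpot internals).
final class ToyTenuring {
    static final class Obj {
        int age;
    }

    private final int survivorCapacity; // in objects
    private final int tenuringThreshold;
    private final List<Obj> survivor = new ArrayList<>();
    int promotedByAge = 0;
    int promotedPrematurely = 0;

    ToyTenuring(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    static List<Obj> fresh(int n) {
        List<Obj> objs = new ArrayList<>();
        for (int i = 0; i < n; i++) objs.add(new Obj());
        return objs;
    }

    int survivorCount() {
        return survivor.size();
    }

    // One minor GC; liveFromEden are the objects found live in Eden.
    void minorGc(List<Obj> liveFromEden) {
        List<Obj> next = new ArrayList<>();
        // Survivor residents (assumed still live) age; old enough ones are tenured.
        for (Obj o : survivor) {
            o.age++;
            if (o.age >= tenuringThreshold) {
                promotedByAge++;
            } else {
                next.add(o);
            }
        }
        // Eden survivors go to survivor space if there is room, otherwise straight to old.
        for (Obj o : liveFromEden) {
            o.age = 1;
            if (next.size() < survivorCapacity) {
                next.add(o);
            } else {
                promotedPrematurely++;
            }
        }
        survivor.clear();
        survivor.addAll(next);
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {2, 10}) {
            ToyTenuring gen = new ToyTenuring(capacity, 3);
            for (int gc = 0; gc < 3; gc++) gen.minorGc(fresh(4));
            System.out.println("survivor capacity " + capacity
                    + ": tenured by age=" + gen.promotedByAge
                    + ", premature=" + gen.promotedPrematurely);
        }
    }
}
```

With four survivors per collection and a threshold of 3, a survivor capacity of 2 sends most objects to old prematurely over three collections, while a capacity of 10 sends none.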
The young generation guarantee states that there should always be enough space in old to accommodate all survivors from young. Executive summary: if I don't believe there is enough free space in old, I must trigger a full GC. Hint: don't let short-lived objects "leak" into old space. The flip side is: don't keep long-lived objects in young space. The dilemma is that whatever keeps short-lived objects in young will also keep long-lived objects out of old. Did I say this was going to get easier?
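In its simplest, worst-case form, the guarantee is just a comparison. The sketch below is a hypothetical helper that encodes the reading given above. Actual collectors may be less pessimistic, for instance by estimating how much is likely to be promoted rather than assuming everything in young survives.

```java
final class YoungGenGuarantee {
    // Worst case: every object currently in young (Eden plus the occupied
    // survivor space) might survive and need a home in old.
    static boolean mustCollectOldFirst(long oldFreeBytes, long youngUsedBytes) {
        return oldFreeBytes < youngUsedBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long youngUsed = 90 * mib;
        for (long oldFree : new long[] {200 * mib, 60 * mib}) {
            System.out.println("old free " + oldFree / mib + " MiB, young used "
                    + youngUsed / mib + " MiB -> full GC first? "
                    + mustCollectOldFirst(oldFree, youngUsed));
        }
    }
}
```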
The parameters to look at that affect how long an object can stay in young are the survivor ratio and the tenuring threshold.
The survivor ratio is simply a value that tells the JVM how to partition young into Eden and survivor spaces. If the survivor spaces are too small, objects will be prematurely promoted into old to make way for newer objects. If they are too big, Eden shrinks accordingly, so young will have to be GC'ed more often, resulting in objects aging faster than normal and being prematurely tenured. Even worse, object creation may be forced into old space, where it is much more expensive. Again, balance is the key.
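With HotSpot's -XX:SurvivorRatio=N, Eden is N times the size of one survivor space. Since there are two survivor spaces, young divides into N + 2 equal units. The sketch below works through that arithmetic. The young generation size is taken as an input because it is set separately (for example with -Xmn), and the class name is hypothetical. With a 100 MiB young generation and a ratio of 8, each survivor space gets 10 MiB and Eden gets 80 MiB. Dropping the ratio to 4 grows the survivors at Eden's expense, which is exactly the trade-off described above.

```java
final class SurvivorSizing {
    // Eden : one survivor = N : 1, and there are two survivors, so young = N + 2 units.
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Whatever the two survivor spaces don't take belongs to Eden.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long young = 100 * mib;
        for (int ratio : new int[] {4, 8, 16}) {
            System.out.printf("SurvivorRatio=%d: eden=%d MiB, each survivor=%d MiB%n",
                    ratio, edenBytes(young, ratio) / mib, survivorBytes(young, ratio) / mib);
        }
    }
}
```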
So, if we are to make an informed decision, we must have a measurement. In this case, the measurement to take is the age distribution of the objects in the survivor spaces. The default tenuring threshold for the Sun JVM is 31. If you have objects leaking into old and the age distribution doesn't include old objects, the conclusion can only be that you are experiencing premature promotion. If you are experiencing many full GCs yet the age distribution looks normal, then it may be a case where you need to increase the tenuring threshold in an attempt to capture these objects in young space.
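On HotSpot, the age distribution can be printed with -XX:+PrintTenuringDistribution. Once you have pulled the bytes-per-age figures out of that output (the parsing is not shown), the reasoning above can be written down as a small decision function. This is a hypothetical heuristic, not a JVM feature. The inputs oldGenGrowing and frequentFullGcs are assumed to come from your GC log, and treating the last age bucket before the threshold as "old" is my own simplification.

```java
final class AgeDistributionCheck {
    // bytesByAge[a] = bytes of age a in the survivor spaces (index 0 unused).
    // Returns the oldest age that holds any bytes, or 0 if none do.
    static int oldestAge(long[] bytesByAge) {
        for (int age = bytesByAge.length - 1; age >= 1; age--) {
            if (bytesByAge[age] > 0) return age;
        }
        return 0;
    }

    // Hypothetical heuristic following the reasoning in the text.
    static String diagnose(long[] bytesByAge, int tenuringThreshold,
                           boolean oldGenGrowing, boolean frequentFullGcs) {
        // Simplification: "old objects present" means something reached the last age before promotion.
        boolean oldObjectsPresent = oldestAge(bytesByAge) >= tenuringThreshold - 1;
        if (oldGenGrowing && !oldObjectsPresent) {
            return "premature promotion";
        }
        if (frequentFullGcs) {
            return "consider raising the tenuring threshold";
        }
        return "no tenuring problem indicated";
    }

    public static void main(String[] args) {
        long[] histogram = new long[16];
        histogram[1] = 4_000_000;
        histogram[2] = 1_500_000;
        System.out.println(diagnose(histogram, 15, true, false));
    }
}
```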
Whatever GC problems I've faced, I've found that I've rarely needed to set anything other than max memory and the survivor ratio. I've also learned that some problems disguise themselves as GC problems. These problems won't go away by tuning the GC. Though you may be able to mask or hide them for some period of time, they will manifest themselves sooner or later (often viciously, after having been suppressed for so long) unless you address the underlying fault. One example is ineffective use of thread pooling that lets the system run away, leading to high rates of object creation that are beyond the collector's ability to keep up.
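For the thread-pool example, one common remedy is to bound both the number of workers and the backlog, so that producers are slowed down instead of flooding the heap. Here is a minimal sketch using java.util.concurrent; the class name BoundedPool is hypothetical, and the pool and queue sizes are arbitrary placeholders, not recommendations. With an ArrayBlockingQueue and CallerRunsPolicy, a saturated pool makes the submitting thread run the task itself, which naturally throttles the rate of object creation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPool {
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                                 // fixed number of workers
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),  // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());       // back-pressure when saturated
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10_000; i++) {
            pool.execute(() -> {
                byte[] scratch = new byte[1024]; // per-task garbage
                scratch[0] = 1;
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // Counts tasks run by pool threads and by the caller alike.
        System.out.println("tasks completed: " + done.get());
    }
}
```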
The best lesson I've learned is that while you may have to be a rocket scientist to create this stuff, you don't have to be one to use it. What I'm trying to say is that if you don't know what is going on and you're unsure of what to do, then instead of guessing, poke around for a bit to get a better measure, because I've seen a proper measurement make everything clear to even the most technically inept. I also know that a bad guess can cripple GC ergonomics.
Last point: don't forget to measure for effect, and remember that the user experience rules all, even when the means defy what any "expert" tells you.
Good point! Last time I tried to tune and benchmark several JVMs, pretty
much every setting I changed ended up decreasing the performance :-) That
said, I do like to set a maximum heap size (especially if the machine isn't
dedicated to the application), and sometimes you need to override the
maximum perm gen size (e.g. for Tomcat if you need to redeploy a lot
without restarting). Also I think it makes sense to tell the JVM if you
want to optimize for overall throughput or minimize delays, a fundamental
trade-off (JRockit: -Xgcprio:throughput vs pausetime).
Thanks Eric, there is actually a lot more to say but I ran out of time.
Witness the poor proofing. Good catch on Tomcat and the classloading/perm
space problem. That one is a constant source of confusion for those that
face it. I know it tripped me up the first time I ran into it.
Good overview of how collectors work and the tradeoffs they have to face.
I think the "measure, don't guess" mantra is a very good one, with one
caveat - make sure you really measure the behavior you target for the real
world, and not just cover up the behavior long enough to pass a relatively
short test. I know that you, Kirk, actually spend the time to watch and
verify the longevity and stability qualities of the systems you tune, but
most people don't have the tools.
Gil, thanks for the great comments and the insight gained from your
experience building Azul hardware. I always recommend that people test for
as long as they plan to run their application. If the application runs
24/7/365, then of course you will want to make an exception. However, a
short test won't find slow leaks, slow leaks into perm, or other patterns
that can destabilize your application.
Neil from Dallas says hi, and I am coming to Hungary to visit you this
summer. You have been forewarned.
It was you, Kirk, who came up with the "Measure, don't guess" mantra. I've
never been that pithy.