Abstract:
Web applications running in a clustered environment cannot perform to the level of their stand-alone counterparts due to session replication. Session replication is extra work that must be completed before the response can be returned to the client. Terracotta Sessions is a specialized framework that clusters the HTTPSession object. This paper will explore the session replication efficiency of Terracotta Sessions versus a leading commercial application server.
Introduction
The Hyper Text Transfer Protocol (HTTP) is the backbone of Web applications. It enables your browser to interact with a Web server. As powerful as this architecture is, the model is stateless. In other words, the server will have no memory of earlier interactions with a client on each subsequent request unless the server takes special steps to retain this data. In Java this data is collected into a special object known as the HTTPSession. HTTP has the ability to offer credentials to the Web server. The Web server will use those credentials to connect the request to the appropriate HTTPSession object. This allows downstream processing to take advantage of information that has been previously collected. The most common example of this is the shopping cart found on many Web-based stores. The contents of your shopping cart are only there because they have been saved in an HTTPSession object and the Web server is able to get your cart because it can identify you.
The potential downside to this scheme is that the contents of your shopping cart could be lost should the server become inoperable. To combat this problem (and others), systems are engineered so that session state is more durable. This durability is often achieved by replicating the state information to other servers cooperating together in a cluster. To replicate state in a Servlet, one must rebind the objects to the HTTPSession object. Once bound, the Servlet engine will serialize the data and push it out onto the network to every other engine participating in the cluster. Once the replication is complete, the Servlet engine will finish processing the client request and return the resulting HTML page.
The mathematics used to explain the effects of session replication on performance are surprisingly simple. Throughput is inversely proportional to the time it takes to service a request, so increasing the service time decreases throughput. Consider the case where a request takes 10ms to fulfill. From this we can calculate the throughput of the system at 100 requests per second. If we needed to double the throughput to 200 requests per second, we would need to service requests in 5ms. Real life will introduce many influencing factors and complications, but it will not free the server from this fundamental principle. The lesson is that replicating session state increases the time it takes to service a request, which increases response time while reducing throughput.
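The arithmetic above can be sketched in a few lines of Java. This is an illustrative model only: it assumes requests are serviced one at a time, and the 2.5ms replication overhead in `main` is a made-up figure, not a measurement from this benchmark.

```java
// Illustrative model: a single worker services one request at a time.
final class ThroughputModel {
    private ThroughputModel() {}

    /** Requests per second when each request takes serviceTimeMs milliseconds. */
    static double requestsPerSecond(double serviceTimeMs) {
        if (serviceTimeMs <= 0) {
            throw new IllegalArgumentException("service time must be positive");
        }
        return 1000.0 / serviceTimeMs;
    }

    public static void main(String[] args) {
        System.out.println(requestsPerSecond(10.0));       // 100.0
        System.out.println(requestsPerSecond(5.0));        // 200.0
        // Hypothetical: 2.5 ms of replication work added to a 10 ms request.
        System.out.println(requestsPerSecond(10.0 + 2.5)); // 80.0
    }
}
```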
These side effects force developers to plan for session management and replication in the initial design phase if they want to build highly scalable Web applications. One question remains: how can we ensure that our applications’ scalability will not be hindered by session management and replication? The architectural tool of choice for answering this question is the benchmark. Properly structured, a benchmark will help you understand the outer limits of your technology choices. Understanding these limits goes a long way toward understanding how your system will behave under load. The goal of this document is to use one such benchmark to demonstrate the difference in session replication performance between Terracotta Sessions and a leading commercial application server.
The Benchmark
Given the same session states and the same pattern of utilization and change in session state, this benchmark is designed to show which of the two techniques allows the server to function at the highest throughput. Since the benchmark should bottleneck on the network, the question can be restated as: which technique makes more effective use of the available network bandwidth?
The test Servlet shown in the listing below was running in a leading commercial application server throughout the entire exercise.
    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {
        // collect configuration information from the URL
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        String manage = request.getParameter("manage");
        String userId = request.getParameter("userid");
        HttpSession session = request.getSession();
        SessionObject sessionObject = null;
        String parameter = request.getParameter("size");
        if (parameter == null) return;
        int size = Integer.parseInt(parameter);
        parameter = request.getParameter("replace");
        if (parameter == null) return;
        int replace = Integer.parseInt(parameter);
        // create and bind in a session object if the client is new.
        if (session.isNew()) {
            sessionObject = new SessionObject(size, replace);
            session.setAttribute(SessionObject.SESSION_KEY, sessionObject);
        } else {
            sessionObject = (SessionObject) session.getAttribute(
                    SessionObject.SESSION_KEY);
            sessionObject.replaceRegion();
            // Do we need to rebind the session state?
            if ("normal".equals(manage)) {
                session.setAttribute(SessionObject.SESSION_KEY, sessionObject);
            }
        }
        // format the response to the client.
        this.write(out, userId, sessionObject);
    }
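The Servlet code shown above forms the basis of the benchmark. The normal flow is to read the configuration parameters from the request, create and bind a session object if the client is new or otherwise retrieve the existing one and replace a region of its data, rebind the object into the session if requested, and write the response back to the client.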
The exact flow through the code is controlled by parameters that are encoded into the HTTP request. One such parameter tells the server if the object retaining session state should be rebound into the HTTPSession object. This capability was included so performance products (such as Terracotta Sessions) that do not require this action to trigger session replication could be measured.
The HTML page returned to the client contains the current value of the number of server requests. Each client should always see their own monotonically increasing integer value. Failure to see this behavior is an indication that session replication is not functioning as expected.
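A check of this kind could be sketched as follows. The class and method names are hypothetical and are not taken from the original harness; the sketch only assumes that each logged response can be reduced to a user id and the counter value it reported.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: flags responses whose counter did not increase for a given client.
final class CounterChecker {
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Records the counter value a client saw.
     * Returns false if the value is not strictly greater than the highest
     * value previously recorded for that client, which would suggest that
     * stale (unreplicated) session state was served.
     */
    boolean record(String userId, long count) {
        Long previous = lastSeen.get(userId);
        if (previous != null && count <= previous) {
            return false;
        }
        lastSeen.put(userId, count);
        return true;
    }
}
```

Each simulated client is tracked separately, so interleaved requests from different clients do not trigger false alarms.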
The session object is a wrapper over a series of byte arrays contained in an ArrayList. This choice of implementation was made for two reasons. First, it is a simple data structure that should not place any great demands on serialization. Using a more complex object schema may offer advantages to one implementation over another. That said, those advantages would be determined by the shape of the object schema. Determining whether such an advantage exists goes well beyond the scope of this study, and as such using the simplest structure is more relevant. The second consideration is that calculating the size of the session object and controlling the percentage changed is much easier when using byte arrays.
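The original SessionObject class is not shown in this paper. The following is a minimal sketch of what such a wrapper might look like, assuming the session is split into regions of `replace` bytes and that each call to replaceRegion() swaps one region and bumps a request counter. The key value, the field names, and the round-robin replacement policy are all assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Sketch only: the real SessionObject used in the benchmark may differ.
class SessionObject implements Serializable {
    private static final long serialVersionUID = 1L;
    static final String SESSION_KEY = "benchmark.session"; // assumed key value

    private final List<byte[]> regions = new ArrayList<>();
    private final int fullRegions; // number of regions holding exactly `replace` bytes
    private int next;              // index of the region to replace on the next call
    private int requestCount;      // value reported back to the client

    SessionObject(int size, int replace) {
        if (replace <= 0 || size < replace) {
            throw new IllegalArgumentException("need 0 < replace <= size");
        }
        // Split the session into regions of `replace` bytes; the last may be shorter.
        for (int remaining = size; remaining > 0; remaining -= replace) {
            regions.add(new byte[Math.min(replace, remaining)]);
        }
        fullRegions = size / replace;
    }

    /** Swaps one full-size region for a fresh array and counts the request. */
    void replaceRegion() {
        regions.set(next, new byte[regions.get(next).length]);
        next = (next + 1) % fullRegions;
        requestCount++;
    }

    int getRequestCount() { return requestCount; }

    int regionCount() { return regions.size(); }

    int totalSize() {
        int total = 0;
        for (byte[] region : regions) {
            total += region.length;
        }
        return total;
    }
}
```

Because every replaced region holds exactly `replace` bytes, the session size stays constant while the amount of change per request is fixed, which is the property the benchmark relies on.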
Testing Methodology
The tests executed a series of runs that utilized a fixed session state size while varying the number of bytes replaced. The session state sizes tested were 4096, 7168, 10240, and 20480 bytes. For each size, runs were made with 32 and with 2048 bytes replaced. Each combination of session state size and number of bytes replaced was run three times to ensure that the values obtained were stable. In addition to these tests, a more thorough study of the 10240 byte session state size was conducted. In that study, 32, 64, 128, 256, 512, 1024, and 2048 bytes were swapped out in successive runs. All of these tests were run using both Terracotta Sessions and the other replication service. The ratios for all of the tests are reported in Table 1.
| Session size (bytes) | 2048 | 1024 | 512 | 256 | 128 | 64 | 32 |
|---|---|---|---|---|---|---|---|
| 20480 | 2.7712438 | X | X | X | X | X | 8.946691 |
| 10240 | 1.567595096 | 2.604683 | 3.773979 | 4.213513 | 4.1797102 | 4.287619 | 4.268012 |
| 7168 | 1.070885936 | X | X | X | X | X | 3.05277 |
| 4096 | 0.830942967 | X | X | X | X | X | 2.0355 |
The column on the left is the session size. The row across the top indicates how many bytes were replaced on each call. The values represent the throughput when using Terracotta Sessions divided by the throughput achieved when using the other session replication service. An X marks a combination that was not tested.
Each individual test was started by selecting a system configuration (as described above), a session state object size and the number of bytes to swap in each successive call. The next step was to tune the test harness so that an appropriate load could be placed on the server. In these tests the ideal load produced the largest throughput for the slowest implementation at a given session and replacement size. This required several runs in both configurations to determine the optimal throughput for each. It was from these settings that measurements were derived. This technique was chosen so that all throughput measurements would be taken when the system was under a fixed load. The implication of this technique is that larger values for throughput are possible for each configuration tested. For example, when the system was throttled up to allow Terracotta Sessions to reach its peak throughput, it achieved a ratio of 4.89 when using the 10240/32 byte configuration. This is over 14% higher than the throughput ratio of 4.268 reported in Table 1.
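The ratio and percentage quoted above follow from simple arithmetic, sketched here in Java. The helper names are hypothetical; only the figures 4.89 and 4.268 come from the text.

```java
// Hypothetical helpers for the arithmetic behind Table 1 and the 14% figure.
final class ThroughputRatio {
    private ThroughputRatio() {}

    /** Terracotta Sessions throughput divided by the other server's throughput. */
    static double ratio(double terracottaThroughput, double otherThroughput) {
        return terracottaThroughput / otherThroughput;
    }

    /** Percentage by which `to` exceeds `from`. */
    static double percentIncrease(double from, double to) {
        return (to - from) / from * 100.0;
    }

    public static void main(String[] args) {
        // Peak-load ratio (4.89) versus the Table 1 ratio (4.268): roughly 14.6%.
        System.out.printf("%.1f%%%n", percentIncrease(4.268, 4.89));
    }
}
```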
In addition to triggering requests, the client harness was responsible for counting all of the successful HTTP requests that the system was able to complete. The test was repeated three times to ensure that the results remained stable. All software components were restarted between every single test in order to eliminate any interference between tests. Any failure resulted in the test being shut down, as a failure would have the effect of reducing load on the server. Once the reason for the failure had been determined, an appropriate fix was implemented and the testing continued. It should be noted that this action was only necessary during the configuration stage of the exercise.
The test harness functioned by starting up a number of threads. Each thread would control or simulate several clients. The simulation loop started by picking a client and then a server against which to make the request. Once the selections had been made, the URL was formed and the HTTP request was triggered.
There were two tunable parameters in the harness. The first was the number of threads making requests. Depending upon where the performance limits were, the harness used between 50 and 150 threads to maintain load on the server. The second parameter was the number of clients each thread was driving. The results of all queries were logged for the purpose of detecting utilization of session state that had not been properly or completely replicated.
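The harness itself is not listed in this paper. The sketch below shows one way the loop described above might be structured, under several assumptions: the `/bench` path is a hypothetical servlet mapping, the user id scheme is invented, and the actual HTTP call is passed in as a `Predicate` so that the sketch stays independent of any particular HTTP client. The query parameter names match those read by the test Servlet.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Sketch of a load harness: N threads, each simulating several clients.
final class HarnessSketch {
    private HarnessSketch() {}

    /** Forms the request URL; "/bench" is an assumed servlet mapping. */
    static String buildUrl(String server, String userId, int size, int replace, String manage) {
        return "http://" + server + "/bench?userid=" + userId
                + "&size=" + size + "&replace=" + replace + "&manage=" + manage;
    }

    /**
     * Starts threadCount threads; each issues requestsPerThread requests on behalf of
     * clientsPerThread simulated clients. `send` performs the request and returns true
     * on success. Returns the total number of successful requests.
     */
    static long run(List<String> servers, int threadCount, int clientsPerThread,
                    int requestsPerThread, int size, int replace, String manage,
                    Predicate<String> send) {
        AtomicLong successes = new AtomicLong();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int threadId = t;
            threads[t] = new Thread(() -> {
                ThreadLocalRandom random = ThreadLocalRandom.current();
                for (int i = 0; i < requestsPerThread; i++) {
                    // Pick a client, then a server to make the request against.
                    int client = random.nextInt(clientsPerThread);
                    String server = servers.get(random.nextInt(servers.size()));
                    String userId = "t" + threadId + "c" + client;
                    if (send.test(buildUrl(server, userId, size, replace, manage))) {
                        successes.incrementAndGet();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return successes.get();
    }
}
```

The two tuning knobs described in the text map onto `threadCount` and `clientsPerThread`; the returned count corresponds to the successful-request tally the harness was responsible for.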
Discussion of Results
This benchmark is designed to bottleneck on the network. Thus any results derived from it will reflect the efficient (or inefficient) use of the network. It should also be expected that, as the amount of change in the session approaches the size of the session, any remaining differences in efficiency will come from other factors. Those factors may or may not expose themselves, because the overwhelming constraint will be network utilization.
In Figure 1, Terracotta Sessions is able to keep an almost flat performance profile regardless of the overall size of the session. This performance profile is a consequence of Terracotta Sessions’ granular replication of only that data which has changed. Since the other application server is forced to replicate the entire session, its performance profile is significantly degraded. On the other end of the scale, Terracotta Sessions manages only marginally better performance than the other application server. This reduced advantage is a direct consequence of the 64x increase in the amount of data being replicated. Even under this load, it is only the overhead of tracking changes in a larger volume of data that causes Terracotta Sessions to converge with the other application server.
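The difference between the two replication strategies can be illustrated with a deliberately simplified payload model. It assumes that whole-session replication sends the full session on every request and that granular replication sends only the changed bytes, and it ignores serialization and protocol overhead, so the numbers are illustrative rather than measured.

```java
// Simplified payload model; real replication traffic includes overhead not modelled here.
final class ReplicationPayload {
    private ReplicationPayload() {}

    /** Bytes sent per request when the entire session is replicated. */
    static long fullSessionBytes(int sessionSize, int changedBytes) {
        return sessionSize;
    }

    /** Bytes sent per request when only the changed data is replicated. */
    static long deltaBytes(int sessionSize, int changedBytes) {
        return Math.min(changedBytes, sessionSize);
    }

    public static void main(String[] args) {
        // Going from 32 to 2048 changed bytes is the 64x increase mentioned above.
        System.out.println(deltaBytes(10240, 2048) / deltaBytes(10240, 32)); // 64
        // With a 20480 byte session and 32 changed bytes, the full copy is 640x larger.
        System.out.println(fullSessionBytes(20480, 32) / deltaBytes(20480, 32)); // 640
    }
}
```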

Another mitigating factor on the high end of the scale was the effect of garbage collection. Strangely enough, the cost of collecting garbage is determined more by how much is left behind than by how much is collected. A full discussion of the effect is beyond the scope of this paper, but it will suffice to say that stop-the-world garbage collection does factor into this set of results. On the lighter end of the scale we see that the other application server actually out-performs Terracotta Sessions. This result is quite typical of frameworks and applications that have been tuned to handle heavy loads.
Figure 2 demonstrates the ratio of Terracotta Sessions throughput vs. the other application server. These values were collected from the series of benchmarks in which the total session size was kept constant and the number of byte changes ran between 32, 64, 128, 256, 512, 1024, and 2048 as illustrated in Table 1.

In this test the downward sloping graph is a measure of the difference in the amount of data that is being transmitted across the network by Terracotta and the other application server. Once again this result is not unexpected as Terracotta will consume more of the network as the size of the change in the session state increases. However it is clear that as the size of the session grows and the size of the change gets smaller, the difference in performance gets larger with Terracotta coming out on top. This effect is further illustrated in Figure 3.

Figure 3 demonstrates raw throughput for each value for number of bytes changed in the range mentioned above. The convergence in performance of the two products is expected as the benchmark becomes bottlenecked on the network and garbage collection starts to interfere.
There was one other interesting observation to emerge from this exercise. During the setup for the benchmarks, a number of test runs were made in order to determine the levels to which the client harness should be set. This test phase started with Terracotta Sessions. All of these tests ran to completion without issue. However, the equivalent runs using the other application server resulted in the JVM running out of memory. With further testing it was determined that the other session replication required 768 MB of Java heap in order to run. This value was used throughout all of the tests (Terracotta Sessions included). It is quite likely that this larger than necessary memory space contributed to the increased garbage collection times during the Terracotta Sessions test runs.
Conclusion
What has been demonstrated here is the effectiveness of using Terracotta Sessions to replicate HTTP sessions. The main advantage of Terracotta Sessions is that it replicates data while reducing its utilization of expensive networking resources. By transmitting less data than a traditional session replication strategy does, Terracotta Sessions can release HTTP clients sooner. In addition to lessening the impact of session replication on users’ response times, it allows the server to free up resources sooner, which has the knock-on effect of improving throughput. In addition, Terracotta Sessions has the ability to replicate session state even when it hasn’t been rebound into the HTTPSession object. And unlike traditional replication schemes, Terracotta Sessions will replicate data that is not serializable.
That the other application server was more resource intensive than Terracotta Sessions is evidenced by the higher throughput ratios that were achieved beyond those reported in Table 1. For example, a run configured for a 10240 byte session size with a change of 32 bytes running at a higher load resulted in a ratio of 4.79, as opposed to the 4.27 achieved when the run configurations were set by the other server’s limitations. Simply stated, this benchmark clearly demonstrates Terracotta Sessions’ ability to cluster your Web applications more efficiently than a leading commercial application server’s session replication.