Pure Danger Tech



Distributed ehcache with Terracotta

17 Jun 2008

Terracotta provides pre-built integration support for many popular open source libraries and frameworks. One of the most popular supported libraries is the excellent Ehcache caching library.

We’re going to follow these steps:

  1. Write a simple Ehcache example
  2. Download Terracotta
  3. Download the Ehcache integration module
  4. Write some Terracotta config
  5. Run in a cluster

Write some code

Let’s pretend that in our application we are managing a music collection. We’ll further assume that we’re building an app to manage not just our own music collection but the collections of many people in some sort of magical web 2.0 shared application. In this case, our application servers will need to hold lots of album data in memory at any given time. Many people will have the same popular albums so it makes sense to stick these albums in an application cache, rather than load them from a data source every time.

We’ll need some kind of Album class: [source:java]

package albums;

import java.io.Serializable;

public class Album implements Serializable {

  private int id;
  private String title;
  private String artist;
  private int year;

  // all the obvious constructors, getters,
  // equals, hashCode, toString(), etc

}

[/source]
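For readers who want the elided parts spelled out, here is one possible filled-in version. It is only a sketch: the constructor argument order and the toString() format are inferred from the driver program and the sample output later in this post, the getter names other than getId() are assumptions, and title and artist are assumed to be non-null. The package line is left out and the class is declared without public so the snippet can be compiled on its own.

```java
import java.io.Serializable;

// Sketch of a complete Album; declared without "public" so it can sit in any file.
class Album implements Serializable {

  private static final long serialVersionUID = 1L;

  private int id;
  private String title;
  private String artist;
  private int year;

  Album(int id, String title, String artist, int year) {
    this.id = id;
    this.title = title;
    this.artist = artist;
    this.year = year;
  }

  public int getId() { return id; }
  public String getTitle() { return title; }
  public String getArtist() { return artist; }
  public int getYear() { return year; }

  // Two albums are equal when all four fields match (title/artist assumed non-null).
  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof Album)) return false;
    Album other = (Album) obj;
    return id == other.id
        && year == other.year
        && title.equals(other.title)
        && artist.equals(other.artist);
  }

  // Combines the same fields that equals() compares.
  @Override
  public int hashCode() {
    int result = id;
    result = 31 * result + title.hashCode();
    result = 31 * result + artist.hashCode();
    result = 31 * result + year;
    return result;
  }

  // Format chosen to match the sample output, e.g. "Moving Pictures by Rush (1981)".
  @Override
  public String toString() {
    return title + " by " + artist + " (" + year + ")";
  }
}
```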

We then need to create a front-end to our cache (hiding the use of Ehcache from the users of our cache). Let’s assume all we need is the ability to add and find albums. Note that our application has assigned a unique identifier to each album added to the system so we can use that for identification purposes. [source:java]

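package albums;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class AlbumCache {

  // Loads its configuration from ehcache.xml on the classpath
  private CacheManager manager = new CacheManager();

  private Cache getCache() {
    return manager.getCache("albumCache");
  }

  // Adds the album to the cache, keyed by its ID
  public void addAlbum(Album album) {
    Cache albumCache = getCache();
    albumCache.put(new Element(album.getId(), album));
  }

  // Returns the cached album, or null if it is not in the cache
  public Album findAlbum(int id) {
    Cache albumCache = getCache();
    Element element = albumCache.get(id);
    if (element != null) {
      return (Album) element.getValue();
    } else {
      return null;
    }
  }

}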

[/source]

AlbumCache doesn’t really do much – we have defined an Ehcache CacheManager and the getCache() method looks up our particular cache. There are several ways to create CacheManagers and here we are just using a very simple one that loads the cache configuration from a resource named ehcache.xml in the classpath. More on that in a moment. We then just have simple methods that add an album (keyed by album ID) to the cache and look up an album in the cache by ID (if it exists).
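A caller would typically use a facade like this in look-aside fashion: ask the cache first and only go to the data source on a miss. The sketch below shows that flow under some loud assumptions. To stay self-contained it swaps in a plain HashMap for the Ehcache-backed AlbumCache and stores strings instead of Album objects, and the names AlbumLookup, loadFromDataSource and getDataSourceLoads are made up for illustration. It is also not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side helper: check the cache, fall back to the data source.
class AlbumLookup {

  // Stand-in for AlbumCache (assumption: a HashMap instead of Ehcache).
  private final Map<Integer, String> cache = new HashMap<Integer, String>();

  // Counts how often we had to hit the (pretend) data source.
  private int dataSourceLoads = 0;

  // Hypothetical slow lookup, e.g. a database query.
  private String loadFromDataSource(int id) {
    dataSourceLoads++;
    return "album-" + id;
  }

  // Returns the cached value if present; otherwise loads it and caches it.
  String find(int id) {
    String album = cache.get(id);
    if (album == null) {
      album = loadFromDataSource(id);
      cache.put(id, album);
    }
    return album;
  }

  int getDataSourceLoads() {
    return dataSourceLoads;
  }
}
```

Looking up the same ID twice only touches the data source once; the second call is served from the cache.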

Next up we need to look at the ehcache.xml file:

<ehcache 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xsi:noNamespaceSchemaLocation="ehcache.xsd"> 

	<cache 	name="albumCache" 
			maxElementsInMemory="10000" 
			memoryStoreEvictionPolicy="LFU" 
			eternal="false" 
			timeToIdleSeconds="300" 
			timeToLiveSeconds="600" 
			overflowToDisk="false" 
			diskExpiryThreadIntervalSeconds="120" /> 

	<defaultCache 
			maxElementsInMemory="10000" 
			memoryStoreEvictionPolicy="LRU" 
			eternal="false" 
			timeToIdleSeconds="120" 
			timeToLiveSeconds="120" 
			overflowToDisk="true" 
			diskSpoolBufferSizeMB="30" 
			maxElementsOnDisk="10000000" 
			diskPersistent="false" 
			diskExpiryThreadIntervalSeconds="120" /> 
			
</ehcache>

Every Ehcache configuration file requires a defaultCache definition although we aren’t using it in our application. We’ll be using the first named cache “albumCache”. Most of this configuration should be straightforward – the cache allows 10000 items at most and does not overflow to disk. When the cache is full, items are evicted based on a “Least Frequently Used” algorithm. Items also expire once they have gone unused for 300 seconds (timeToIdleSeconds) or 600 seconds after they were added (timeToLiveSeconds), whichever comes first.
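To make the interaction between the two timeouts concrete, here is a small sketch of the expiry decision for the albumCache settings. It is a simplified model for illustration, not Ehcache's actual code: the class name ExpiryRule and its method are invented here, times are plain seconds on an arbitrary clock, and special cases in the real library are not modeled.

```java
// Simplified model of time-based expiry: an entry is expired when it has
// been idle too long OR has lived too long, whichever happens first.
class ExpiryRule {

  private final long timeToIdleSeconds;
  private final long timeToLiveSeconds;

  ExpiryRule(long timeToIdleSeconds, long timeToLiveSeconds) {
    this.timeToIdleSeconds = timeToIdleSeconds;
    this.timeToLiveSeconds = timeToLiveSeconds;
  }

  // All arguments are seconds on the same (arbitrary) clock.
  boolean isExpired(long createdAt, long lastAccessedAt, long now) {
    boolean idleTooLong = now - lastAccessedAt > timeToIdleSeconds;
    boolean livedTooLong = now - createdAt > timeToLiveSeconds;
    return idleTooLong || livedTooLong;
  }
}
```

With new ExpiryRule(300, 600), an album that is read every few minutes stays cached, but it still expires once it is more than 600 seconds old, no matter how recently it was read.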

We can then write a simple driver program to load and retrieve some data from cache:

[source:java]

package albums;

public class Main {

  public static void main(String[] args) throws Exception {
    AlbumCache cache = new AlbumCache();

    findAlbums(cache);
    loadAlbums(cache);
    findAlbums(cache);
  }

  private static void loadAlbums(AlbumCache cache) {
    System.out.println();
    System.out.println("Adding albums.");
    cache.addAlbum(new Album(0, "Moving Pictures", "Rush", 1981));
    cache.addAlbum(new Album(1, "What's Going On", "Marvin Gaye", 1971));
    cache.addAlbum(new Album(2, "The White Album", "The Beatles", 1968));
    cache.addAlbum(new Album(3, "The Dark Side of the Moon", "Pink Floyd", 1973));
  }

  private static void findAlbums(AlbumCache cache) {
    System.out.println();
    System.out.println("Finding albums:");
    for (int i = 0; i < 4; i++) {
      System.out.println(cache.findAlbum(i));
    }
  }

}

[/source]

This simple program looks for some classic albums in the cache, loads them, and then finds them again by ID. To run this program, you’ll need the Ehcache jar (we’ll use 1.3.0) and the commons-logging jar. Execution may look something like this:

$ java -cp bin:lib/ehcache-1.3.0.jar:lib/commons-logging-1.1.1.jar albums.Main

Finding albums:
null
null
null
null

Adding albums.

Finding albums:
Moving Pictures by Rush (1981)
What's Going On by Marvin Gaye (1971)
The White Album by The Beatles (1968)
The Dark Side of the Moon by Pink Floyd (1973)

But what happens when our music collection application hits the front page of Digg? We gotta scale up. That of course is a whole slew of topics in itself, but let’s look at how we can scale our Ehcache instance with Terracotta. Terracotta allows us to mark portions of our Java heap as shared and those portions then become visible to every node in the Terracotta cluster. Not only that, but the clustered data can be modified and the changes will become visible in every node in the cluster as well.

Download Terracotta

First, we’ll download the latest release of Terracotta (currently 2.6.1). You can grab the latest release from the Terracotta download page. You’ll either want to download the Windows installer and run it or download the generic tar file and expand it. In either case, the directories mentioned in the rest of this post are relative to the root of your Terracotta installation.

Download Ehcache integration

Terracotta provides a variety of integrations with other projects. The integration support typically takes the form of a Terracotta Integration Module (TIM), which is just a JAR file that loads into Terracotta as a bundle of configuration information.

Some integration modules are included in the Terracotta kit (in the modules directory under the installation root) and some are on the [Forge](http://forge.terracotta.org). In general, we are slowly moving most of the TIMs to the Forge, as this allows us to update them for bug fixes and enhancements independently of the main kit release.

As of the 2.6.0 release, the Ehcache TIM lives on the Forge and is no longer included in the main Terracotta kit. To see all the available versions of the Ehcache TIM, you can go to the Forge Catalog page or see the Ehcache TIM web site for the most recent release. In either case, the binary download will be a file like tim-ehcache-1.1.1-bin.zip.

The tim-ehcache zip archive contains a few jar files. Unzip it into the **modules** directory of your Terracotta installation to install the integration modules into the Terracotta kit.

Configure Terracotta

We’re now ready to configure our application to run with Terracotta and clustered Ehcache. Terracotta works by supplying external configuration and modifying the Java startup process, rather than by having you call an API. Thus, we will leave our application unchanged.

Instead we must create a tc-config.xml configuration file, which defines what is clustered. In this program, we need to indicate that the Album class will be in the clustered heap and must be instrumented. Also, we must mark the AlbumCache.manager field as being a clustered root. In Terracotta, roots indicate a starting point for clustered state. All object references from the root will be clustered (unless you use transient to mark fields that should not be clustered).

Also, we must include the tim-ehcache-1.3 integration module, which will pull in the configuration defined in the TIM we downloaded.

<?xml version="1.0" encoding="UTF-8"?>
<con:tc-config xmlns:con="http://www.terracotta.org/config">
  <servers>
    <server host="your.ip.goes.here" name="localhost">
      <dso-port>9510</dso-port>
      <jmx-port>9520</jmx-port>
      <data>terracotta/server-data</data>
      <logs>terracotta/server-logs</logs>
      <statistics>terracotta/cluster-statistics</statistics>
    </server>
    <update-check>
      <enabled>true</enabled>
    </update-check>
  </servers>
  <clients>
    <logs>terracotta/client-logs</logs>
    <statistics>terracotta/client-statistics/%D</statistics>
    <modules>
      <module group-id="org.terracotta.modules" name="tim-ehcache-1.3" version="1.1.1"/>
    </modules>
  </clients>
  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>albums.Album</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>albums.AlbumCache.manager</field-name>
        </root>
      </roots>
    </dso>
  </application>
</con:tc-config>

We also need to add one more jar to our classpath: the JSR 107 (JCache) API jar, which the Ehcache TIM currently requires (this dependency should be removed in the future). Ehcache can optionally use this API, and Terracotta provides support for it. You can download the jar from the JSR 107 SourceForge project.

Run it!

Before we run our program again with Terracotta, we first need to start a Terracotta server. There are a variety of ways you can configure the Terracotta server for high availability, persistence, etc. Here we’ll just start a single server with a script provided in the bin directory of the Terracotta installation:

$ start-tc-server.sh

The easiest way to start the application is to use the dso-java.sh script. This script just wraps the java command indicated by your JAVA_HOME environment variable to set a few properties and prepend the dynamically generated Terracotta boot jar to the bootclasspath.

$ dso-java.sh \
-cp bin:lib/ehcache-1.3.0.jar:lib/commons-logging-1.1.1.jar:lib/jsr107cache-1.0.jar \
albums.Main

Finding albums:
null
null
null
null

Adding albums.

Finding albums:
Moving Pictures by Rush (1981)
What's Going On by Marvin Gaye (1971)
The White Album by The Beatles (1968)
The Dark Side of the Moon by Pink Floyd (1973)

Here we get the same results as when running without Terracotta. But if we run the program again, the first find section will actually find all four albums. The state in the album cache has been clustered and stored in the Terracotta server, so it outlives the JVM that created it.

If you start up multiple JVMs in this manner, they will all be sharing the identical clustered version of Ehcache. When an item is added to the cache, it’s seen by every JVM in the cluster.

How does it work?

You might be wondering how you can configure the Terracotta version of Ehcache. For this, we’ll need to look back at the Ehcache configuration file. Terracotta integrates with Ehcache by providing our own custom memory store. You can configure that memory store using the same properties as before: timeToLiveSeconds, timeToIdleSeconds, and diskExpiryThreadIntervalSeconds.

These properties control a time-based eviction policy used in place of the normal LRU/LFU/FIFO eviction policies. This time-based store is designed to maximize concurrency and minimize the need to fault objects into a cluster node for eviction. One key point is that the Terracotta distributed Ehcache is coherent – all nodes are seeing the same picture of the cache.

Resources

Here are some useful links if you want to explore further:

You can find a lot more information about caching and Terracotta in The Definitive Guide to Terracotta from Apress, which should be available next week!