This is a guest post by Chris Schanck, Senior Software Engineer.
Ehcache has a tiered storage model that uses different types of storage (heap, offheap, disk, clustered). Ehcache moves copies of “hot” data between the tiers as needed, so that the hottest data gets the best performance.
What is Ehcache Offheap?
Offheap is a term for Terracotta’s caching tier which stores Java objects in memory that is not managed by the Java garbage collector (GC). To understand why this is useful and important, we need to have a short discussion about Java and memory.
Java is a managed-memory language, like Python, Ruby, etc. Java keeps tabs on objects which are created and used in the course of program execution. Eventually, some objects become unreferenced and those objects can be destroyed, freeing up memory for other tasks. This is done automatically by the JVM, without the developer having to worry about it. Magic! But that magic does come at a price. The JVM spends considerable time keeping track of these objects and their usage, and that overhead can slow a program down, especially if you are dealing with large amounts of data. If the JVM falls too far behind, it will initiate a full GC, which can pause the entire program for several seconds. This problem can start showing itself with relatively small heaps, around 1 gigabyte or so, and is exacerbated by larger data sizes. So if you want to have many gigabytes of data in a cache, normal Java “on heap” memory will not work well.
Java also has provisions for allocating chunks of raw unmanaged memory, which many refer to as Offheap memory. Offheap memory is allocated explicitly by the programmer and is released once the garbage collector reclaims the buffer object that wraps it. The advantage of Offheap is that you can allocate large chunks of such memory and use them piecemeal; the GC sees each Offheap buffer as one monolithic object, making it easier to manage.
If you allocate a 20-gigabyte Offheap buffer and fill it with several thousand smaller objects, you relieve the GC from having to worry about them, allowing you to manipulate large amounts of data. Win-win!
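To make the idea concrete, here is a minimal sketch in plain Java of storing many small records inside one direct buffer. It is not how Ehcache implements its offheap store; the class name `OffheapSlab` and the length-prefixed record layout are assumptions made for illustration. The only JDK facility it relies on is `ByteBuffer.allocateDirect`, which reserves memory outside the Java heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Ehcache.
final class OffheapSlab {
    private final ByteBuffer slab;

    OffheapSlab(int capacityBytes) {
        // allocateDirect reserves native memory outside the Java heap;
        // the GC only has to track this single ByteBuffer object.
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends a length-prefixed UTF-8 record and returns its starting offset. */
    int write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = slab.position();
        slab.putInt(bytes.length);
        slab.put(bytes);
        return offset;
    }

    /** Reads back the record that starts at the given offset. */
    String read(int offset) {
        int length = slab.getInt(offset);      // absolute read, position untouched
        byte[] bytes = new byte[length];
        ByteBuffer view = slab.duplicate();    // independent position, same memory
        view.position(offset + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    boolean isOffheap() {
        return slab.isDirect();
    }

    public static void main(String[] args) {
        OffheapSlab slab = new OffheapSlab(1024);
        int first = slab.write("hello");
        int second = slab.write("offheap");
        System.out.println(slab.read(first) + " " + slab.read(second));
    }
}
```

However many records are written, the collector sees one buffer object. The price is that every value has to be turned into bytes on the way in and back into an object on the way out.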
Ehcache with Offheap storage is a caching tier implementation that stores Java objects in Offheap memory space. This allows for a single cache to store very large amounts of data indeed – think hundreds of gigabytes, even multiple terabytes in a single JVM.
No special coding needs to be learned.
No new API needs to be mastered.
You enable it in your Ehcache config, specify a maximum size, and use your cache like always. Just faster and bigger.
Normal Ehcache heap configuration is done via the ehcache.xml configuration. There are two cache level directives that are relevant for configuring Offheap use, although most are only interested in
From the docs in ehcache.xml:
Sets the amount of off-heap memory this cache can use, and will reserve.
This setting will set overflowToOffHeap to true. This allows us to enable Offheap
Note that it is recommended to set maxEntriesLocalHeap to at least 100 elements when using an off-heap store; otherwise performance will be seriously degraded, and a warning will be logged. (This is the minimum advised size for heap to be used in conjunction with offheap.)
The minimum amount of offheap that can be allocated is 128MB. There is no maximum.
In the <cache>... </cache> stanza in your ehcache.xml, add the attribute maxBytesLocalOffHeap="4g" to reserve and use 4 gigabytes of offheap memory.
The overflowToOffHeap setting is set to true automatically when you set maxBytesLocalOffHeap, so you can safely ignore this one.
A sample entry for a cache to use 4 gigabytes of offheap looks like this:
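A minimal sketch of such an entry, assuming Ehcache 2.x-style attribute syntax in ehcache.xml. The cache name `bigCache` and the heap size of 10000 entries are placeholders chosen for illustration; only the 4g offheap size comes from the text above. The attribute is written here with a capital H, as in the Ehcache 2.x schema; check the ehcache.xml shipped with your version.

```xml
<!-- Hypothetical cache name and heap sizing; adjust to your application. -->
<cache name="bigCache"
       maxEntriesLocalHeap="10000"
       maxBytesLocalOffHeap="4g"/>
```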
That’s almost it; there is one thing left to do. You have to tell the JVM that you need 4 gigabytes of offheap memory available, by passing it a switch at startup. This switch might vary on non-standard Java implementations, but on the standard JVMs the configuration is simple: pass the -XX:MaxDirectMemorySize flag with a sufficient amount of memory. For our 4G cache above, we might pass in:
to allow a little breathing room.
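If you want to double-check at startup that the flag really was passed with enough headroom, something along these lines could work. This is a sketch under stated assumptions: the class `DirectMemoryCheck` and its methods are hypothetical helpers, not part of Ehcache, and the value `-XX:MaxDirectMemorySize=5g` used in the comments is only an illustrative choice that is larger than the 4g cache.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Ehcache.
final class DirectMemoryCheck {
    private static final String FLAG = "-XX:MaxDirectMemorySize=";

    /** Parses sizes such as "4g", "512m", "1024k", or a plain byte count. */
    public static long parseSize(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default:  return Long.parseLong(s);   // no unit suffix: plain bytes
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
    }

    /** Returns the configured limit in bytes, or -1 if the flag is absent. */
    public static long configuredLimit(List<String> jvmArgs) {
        long limit = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                limit = parseSize(arg.substring(FLAG.length())); // keep the last one seen
            }
        }
        return limit;
    }

    /** True if the flag is present and strictly larger than the cache size. */
    public static boolean hasRoomFor(List<String> jvmArgs, String cacheSize) {
        long limit = configuredLimit(jvmArgs);
        return limit >= 0 && limit > parseSize(cacheSize);
    }

    public static void main(String[] args) {
        // e.g. started with: java -XX:MaxDirectMemorySize=5g DirectMemoryCheck
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Room for a 4g offheap cache: " + hasRoomFor(jvmArgs, "4g"));
    }
}
```

The check insists on a limit strictly larger than the cache size, which matches the advice to leave a little breathing room.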
That’s it — you now can use your cache to store hundreds of millions or even billions of entries, at in-memory speed, with no change to your code. You just use Ehcache exactly as is. Snap offheap in, scale Ehcache out.
What to Expect
Offheap memory access is generally very slightly slower than heap object access due to serialization, but it is orders of magnitude faster than disk or network accesses. Additionally, in the Ehcache architecture Offheap is one tier in the cache — you also configure a heap cache tier in front of the offheap to hold frequently accessed objects.
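The interplay between the two tiers can be sketched in a few lines of plain Java. This is a toy model, not Ehcache's implementation: the class `TwoTierCache`, its counters, and the append-only slab layout are all assumptions made for illustration. A small access-ordered map plays the heap tier, and a single direct buffer holding serialized values plays the offheap tier.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of a heap tier in front of an offheap tier.
final class TwoTierCache {
    private final Map<String, String> heapTier;
    private final ByteBuffer slab;                                  // offheap storage
    private final Map<String, Integer> index = new HashMap<>();     // key -> slab offset (kept on heap for brevity)
    int heapHits = 0;
    int offheapHits = 0;

    TwoTierCache(final int maxHeapEntries, int offheapBytes) {
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.heapTier = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxHeapEntries;
            }
        };
        this.slab = ByteBuffer.allocateDirect(offheapBytes);
    }

    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);      // serialize
        index.put(key, slab.position());                            // append-only: old copies are not reclaimed
        slab.putInt(bytes.length);
        slab.put(bytes);
        heapTier.put(key, value);                                   // newest data starts out hot
    }

    String get(String key) {
        String hot = heapTier.get(key);
        if (hot != null) {
            heapHits++;                                             // fast path: no deserialization
            return hot;
        }
        Integer offset = index.get(key);
        if (offset == null) {
            return null;                                            // miss in both tiers
        }
        byte[] bytes = new byte[slab.getInt(offset)];
        ByteBuffer view = slab.duplicate();
        view.position(offset + Integer.BYTES);
        view.get(bytes);                                            // deserialize from offheap
        String value = new String(bytes, StandardCharsets.UTF_8);
        heapTier.put(key, value);                                   // promote to the heap tier
        offheapHits++;
        return value;
    }
}
```

With a heap tier of one entry, putting "a" and then "b" leaves only "b" on heap. A later get("a") is served from the slab, pays the deserialization cost once, and promotes "a" so that the next read is a plain heap hit.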
Remember, caching is what Ehcache does best, so this tiered model is, as you might expect, super efficient.
When using a tiered Ehcache with offheap, you can expect extremely fast access to your data, even with hundreds of millions of entries in a cache. Furthermore, the access will be extremely predictable — no big spikes in access latency.
Using the newly open-sourced Offheap for Ehcache has some big advantages beyond the technical ones we’ve discussed. For example, literally trillions of cache put() and get() operations have been run through the Ehcache Offheap implementation by commercial customers. These are customers with money and time on the line, who expect Ehcache with offheap to deliver predictable, fast, low-overhead results.
With so many large commercial installations in active use, you can imagine that the Offheap code is very well tested for its intended purpose, and is highly optimized for use in Ehcache.